Overview of HTML

Really Bare Bones HTML

This isn't really a tutorial, per se. But if you are going to be doing web design at all, you do need to know a little bit about HTML. So, what follows is a quick overview of what HTML really is.

Everyone who has surfed the web has seen these letters before. And usually, they even know that HTML is what lets them see the web in all its glory. But what is it, really?

HTML stands for HyperText Markup Language. Ok, now that I've told you what the letters mean, that explains everything, right? So maybe I don't have to write the rest of the article… Oh, wait, you mean, it doesn't? Oh…ok, let's break it down.

Hypertext is a term coined by Ted Nelson in the mid-1960s. It describes a collection of documents containing cross-references (also known as hypertext links, or just links) whose interconnections would be difficult to present on paper.

A lot of writers - especially web developers - tend to go into a long discussion of the history of HTML at this point. From what I understand, that history is not what most people want to know. Still, it does give a good basis for understanding what HTML is all about. So, let me sum it up quickly.

HTML is an outgrowth of Standard Generalized Markup Language (SGML), a system for defining markup languages that became an international standard (ISO 8879) in 1986 and was used to help prepare large, serious documents. It provides a way to identify nearly any kind of text. For example, the group of words this sentence belongs to is easily defined by most people as a paragraph - and a language built with SGML can identify it as exactly that. SGML does this by way of a Document Type Definition (DTD). A DTD is simply a document that spells out the rules for each identifier within the language.

However, SGML was impractical for everyday use, primarily because it was so complete and so general. Therefore, HTML was developed as a simpler way of identifying (also known as "marking up" - see where I'm going with this?) each piece of a document so that it could be seen on a variety of devices - some of which were not compatible with each other. HTML was defined as an application of SGML: it follows SGML's rules, and has its own, much smaller DTD.
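To make the DTD idea a little more concrete, here is a rough sketch of how a page written to the HTML 4.01 standard announces which DTD it follows. The first line is the standard declaration for HTML 4.01 Strict; the title and paragraph text are placeholders made up for this illustration.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<!-- The declaration above names the DTD (HTML 4.01 Strict) this page follows. -->
<html>
  <head>
    <title>A placeholder title</title>
  </head>
  <body>
    <p>A placeholder paragraph.</p>
  </body>
</html>
```

A validator can check a page like this against the named DTD and report any tags that break its rules.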

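And finally, Language. Well, this should be pretty simple - HTML is a set of rules that a computer can read, and therefore it is called a "language." Keep in mind that it is a markup language, not a programming language: it describes information rather than giving the computer instructions to carry out.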

So, simply put, HTML is a language that identifies each piece of information in a manner easily linked to other pieces of information.

Put that way, it becomes less intimidating, doesn't it?

But, why do we need to define each piece of information? Well, with every business and every individual who has a computer, you have different choices being made. Not everyone has a copy of the same exact word processing program or spreadsheet program. Also there were (and continue to be) so many major differences between computer systems and platforms that most programs could not be understood across them all.

HTML stepped into this gap, and with the help of a few browsers, became a language that could be understood across multiple systems.

When you surf to an HTML page, your browser looks at the identifiers (called tags) for each piece of information and, because it knows the rules of HTML, knows how to show all of the information found on that page in a relatively consistent manner. It can do this because every browser has the rules of HTML built in. Browsers do not actually read the DTD itself; their own code carries their version of the rules - and all of that code is part of why browsers are so large, and require so much of a computer's processing capabilities.

Ok, now that we understand that, how does it actually work?

Well, I'm glad you asked. Each piece of information in a document is called an 'element'. Most elements consist of an opening tag, content, and a closing tag. Opening tags are fairly easy to understand: they are the actual identifier for the element. So, using our paragraph example from before, each paragraph in an HTML page would start with a <p> tag. Closing tags are not much harder to understand. They are the same as the opening tag, but have a slash right after the opening bracket to mark them as closing tags. Therefore, at the end of each paragraph on our page, we would have a </p> tag. (A few elements, such as the line break <br>, have no content and so no closing tag.)

In many ways, this is like the WordPerfect word processing program. If you have ever used that program - or if you have a copy of it - there is a window that allows you to see the codes behind what you are typing. In this window, for example, when you bold a section of the text, it places a "start bold" and "end bold" note for itself. This is much the same thing you are doing with HTML.
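Here is a small sketch of what that looks like in practice. The sentences are placeholder text made up for this example; the tags themselves (<p>, <b>, and <br>) are standard HTML.

```html
<!-- One paragraph element: opening tag, content, closing tag. -->
<p>This is the content of the first paragraph.</p>

<!-- Elements can nest: the <b> (bold) element sits inside the <p> element,
     much like the "start bold" and "end bold" codes in a word processor. -->
<p>This paragraph has <b>a few bold words</b> in it.</p>

<!-- A line break has no content, so it is written as a single tag. -->
<p>First line of an address<br>Second line of an address</p>
```

Notice that the inner element is closed before the outer one. Tags are closed in the reverse of the order in which they were opened.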

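Each element can also carry one or more attributes, which are written inside its opening tag. Attributes are simply additional descriptors that say more about the element, such as how it should be displayed. For example, our <p> tag has an align attribute that allows us to display our paragraph as left justified, right justified, centered, or fully justified. (HTML 4 deprecated align in favor of style sheets, but browsers still understand it.)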
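As a quick illustration, here is roughly how attributes are written. The text and the example.com address are placeholders; align, href, title, and style are real HTML attributes. The last paragraph shows the style sheet approach that current standards prefer over align.

```html
<!-- align is the attribute name; "center" is its value. -->
<p align="center">This paragraph is centered.</p>

<!-- An element can carry several attributes, separated by spaces. -->
<a href="https://example.com/" title="An example site">Visit the example site</a>

<!-- The same centering, done with a style rule instead of align. -->
<p style="text-align: center">This paragraph is centered with CSS.</p>
```

Attributes always go in the opening tag, never in the closing tag.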

HTML defines document presentation as well as document structure. There are three kinds of markup in HTML: structural, stylistic, and descriptive. Structural markup lays out the structure of the document; it includes tags like paragraph (<p>), heading (<h1> through <h6>), and table (<table>). Stylistic markup tells the browser how to display the information; it includes tags such as bold (<b>), italic (<i>), and underline (<u>). Descriptive markup describes the nature of its content; it includes tags such as address (<address>), abbreviation (<abbr>), and acronym (<acronym>).
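The three kinds of markup can sit side by side on the same page. The snippet below is only a sketch: the wording, name, and street address are invented for illustration, while the tags are the ones named above.

```html
<!-- Structural markup: lays out the shape of the document. -->
<h1>Contact Us</h1>
<p>We would love to hear from you.</p>

<!-- Stylistic markup: says how the text should look. -->
<p>Please reply <b>soon</b>, <i>if you can</i>, and <u>in writing</u>.</p>

<!-- Descriptive markup: says what the content is. -->
<p>Our pages are written in
  <abbr title="HyperText Markup Language">HTML</abbr>.</p>
<address>Jane Doe, 123 Example Street, Anytown</address>
```

A browser is free to decide how descriptive elements look, which is exactly the point: the tag says what the content is, not how it must appear.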

However, because HTML is so simple, it has lent itself to customization by the browser makers, since each browser carries its own built-in version of HTML's rules. That has allowed the different browsers to make their own changes and additions to HTML - and is why most web designers get very frustrated by the differences in how browsers read the same page. Most HTML you see today isn't 'true' HTML, because in many cases it doesn't conform to all of HTML's rules. The browser wars allowed for this slackness: browsers ignore much of this rule breaking. In fact, the browser makers have encouraged it with their own customizations of HTML.
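To give one hedged example of the kind of rule breaking that browsers quietly forgive, compare the two paragraphs below. The text is made up; the point is the order in which the tags are closed.

```html
<!-- Sloppy: the <b> and <i> elements overlap instead of nesting.
     The rules of HTML do not allow this, but browsers guess at
     what was meant and display the page anyway. -->
<p><b>Bold <i>bold and italic</b> just italic</i></p>

<!-- Conforming: each element is closed before the one that contains it. -->
<p><b>Bold <i>bold and italic</i></b> <i>just italic</i></p>
```

Both versions will probably look the same in a desktop browser, which is why so much sloppy markup goes unnoticed.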

However, the Web is expanding out beyond the browsers now, and HTML as it is commonly written just can't keep up. Many of the new web-enabled devices, such as cell phones, PDAs, and even some cars, just don't have the processing power to read these rule-breaking pages.

I know you are probably asking: why do we still need to learn about HTML, then? Well, having a good basic knowledge of HTML will help you to understand how to deal with some of the newer technologies, such as XHTML, XML, and CSS.

HTML has been the foundation that we needed for a truly global Internet. However, as we shape the Internet of the future, we can't allow it to limit us. We need to build on it, expand on it, and use it as our springboard into the bright future we imagine.