HTML Basics

The Hypertext Markup Language (HTML), developed in the 1990s, is a language for publishing hypertext documents on the World Wide Web. HTML is a "markup language": it defines the structure of a document using tags and text. A web browser reads this information and displays a webpage. This article begins with a brief introduction to HTML, then focuses on the structure of an HTML document, mandatory and optional tags, and attributes. As you read these sections, pay attention to the rules for writing HTML. For example, an HTML5 document must contain four basic tags, each enclosed in an opening "<" and a closing ">" angle bracket.

HTML Text, Tags, and Attributes

HTML was derived from the Standard Generalized Markup Language (SGML). SGML was designed as a markup language, to allow a writer to mark up (or annotate) a document. Markup languages have been around since at least the 1970s, when the author used one on a DECSYSTEM-10 to format school papers. Perhaps the most popular pure markup language still in use is LaTeX, which is used for mathematical, scientific, and engineering documents.

The idea behind a markup language is that a document can be marked up with tags that tell the program processing the input how to render the text. For example, the following HTML code:

<center>The <i>quick</i> brown fox jumped over the <strong>lazy</strong> dog</center>

would be rendered as a centered line, with the word quick in italics and the word lazy in bold:

The quick brown fox jumped over the lazy dog

Markup languages were the precursors of the word processing programs that became popular in the 1980s with the PC revolution. Word processing programs used a What-You-See-Is-What-You-Get (WYSIWYG) interface, which is much easier for a novice computer user to work with than a markup language. WYSIWYG editors eventually took over the market for word processing, with a few exceptions, such as LaTeX, as mentioned earlier.

SGML was a traditional markup language, and HTML was created by adding hyperlinks between documents. In the beginning, the purpose was to link physics papers together in a web of documents. HTML started with a browser introduced at CERN in 1990. HTML has expanded far beyond the wildest vision of its creators, but it still maintains its markup character.

HTML still consists of text and tags. Over time, however, HTML has evolved from its roots and is no longer seen simply as a way to format a document. HTML is now used to define the content (or contextual meaning) of items on a web page, and the tags have evolved to reflect this new role. Most of the original tags that specify how to format text, such as bolding (<b>), centering (<center>), or italicizing (<i>), are now discouraged for that purpose: <center> is obsolete in HTML5, while <b> and <i> remain valid but are no longer meant as general formatting tags. Text that should stand out as important is now marked with the content tag strong (<strong>), and emphasized text with the content tag emphasis (<em>); browsers display these in bold and italics by default. Formatting is applied to the content tags using CSS, and interactivity is added with JavaScript embedded using the <script> tag. HTML has become much more than a simple markup language, but to understand HTML, it is important to understand its roots as a markup language.
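
As a rough sketch of this shift, the page below marks up the earlier sentence with content tags and leaves the presentation to CSS. The class name pangram, the style rule, the page title, and the small click handler are all illustrative assumptions chosen for this example.

```html
<!DOCTYPE html>
<html>
<head>
  <title>Content tags with CSS</title>
  <style>
    /* Presentation lives in CSS; this rule takes the place of <center> */
    .pangram { text-align: center; }
  </style>
</head>
<body>
  <!-- <em> and <strong> describe meaning; browsers show them
       in italics and bold by default -->
  <p class="pangram">The <em>quick</em> brown fox jumped over the <strong>lazy</strong> dog</p>

  <script>
    // Interactivity is added with JavaScript inside a <script> tag
    document.querySelector(".pangram").addEventListener("click", function () {
      alert("You clicked the sentence.");
    });
  </script>
</body>
</html>
```

Keeping meaning in the tags and appearance in the style rule means the look of the sentence can be changed later without touching the marked-up text.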

The tags in HTML are keywords placed between a less-than sign (<) and a greater-than sign (>), which in HTML are more commonly called angle brackets. Between the angle brackets are the tag name and any attributes. For example, the HTML tag name for bolding text is the word strong, so the <strong> tag is used to bold text. A tag represents an action to be taken by the program that processes the marked-up text.

HTML tags are not case sensitive, so the tags <i> and <I> are equivalent.
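
As a minimal illustration, the two lines below differ only in the case of their tag names and should render identically in a browser. The sample text is arbitrary.

```html
<!-- Lowercase tag names (the usual convention) -->
<p>The <i>quick</i> brown fox</p>

<!-- Uppercase tag names; the browser treats these the same way -->
<P>The <I>quick</I> brown fox</P>
```

Although either form works, lowercase is the convention most HTML authors follow.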

Most tags apply to the block of text they enclose. Such a tag is closed by repeating the tag name preceded by a slash (</tag>), as in the example above, where <strong>lazy</strong> caused the word lazy to be bolded.

The tags shown above are called block tags in that the first tag (e.g. <i>) specifies where to begin italicizing the text, and the closing tag (e.g. </i>) specifies where to stop italicizing the text. The text or other information between the two tags is called a block.
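
The fragment below is a small sketch of how opening tags, closing tags, and blocks fit together, including one block placed inside another. The paragraph tag (<p>) and the sample sentence are chosen only for illustration.

```html
<!-- <p> opens the outer block and </p> closes it -->
<p>
  <!-- The block of the <i> tag is the single word "quick" -->
  The <i>quick</i> brown fox
  <!-- The block of the <strong> tag is the single word "lazy" -->
  jumped over the <strong>lazy</strong> dog
</p>
```

Each inner tag is closed before the outer tag that contains it, so every block sits entirely inside its enclosing block.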

Sometimes a tag, such as the break tag (<br>) or the image tag (<img>), is empty, in the sense that it simply runs a command and does not apply to any enclosed text. The <br> tag simply starts a new line, so it does not affect any text or any other element. Conceptually it opens and immediately closes, as in <br></br>, but that form is not valid HTML, and browsers do not treat </br> as a closing tag. Instead, an empty tag is written on its own as <br>, or, as a shortcut, it can be closed within the angle brackets that opened it: the <br> tag can be written as <br/>.
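
As a small illustration, the paragraph below uses the break tag in both its plain form and its self-closed form. Each one should start a new line when the page is displayed. The sample text is made up for this example.

```html
<p>
  Roses are red,<br>
  Violets are blue,<br/>
  Both forms of the break tag start a new line.
</p>
```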

As the br tag shows, tags can do many things in HTML other than just mark up text. For example, a tag can tell the HTML processor to include a picture. If a picture exists in the same directory as the web page, the image can be included on the page by adding the following tag to the HTML for the page:

<img src="dog.jpg" />

Program 1 = Image Tag

This line of code includes the picture from the file dog.jpg in the web page. The tag is the image tag (img), but it needs an attribute to indicate where to find the picture. For the image tag, that attribute is src.

Attributes are data that fill in details needed to implement the desired behavior for the tag. All tags can have some attributes, and these will be looked at in more detail later.
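
To make the idea concrete, the sketch below shows a few attributes written as name="value" pairs inside the opening tag. It uses the standard img tag name; the alt text, the width value, and the link address are assumptions made up for this example.

```html
<!-- src says where the picture is; alt gives replacement text if the
     picture cannot be shown; width sets the display width in pixels -->
<img src="dog.jpg" alt="A dog" width="200" />

<!-- href is the attribute that tells a link where to go -->
<a href="https://example.com/">An example link</a>
```

Several attributes can appear in the same tag, separated by spaces, and their order does not matter.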