HTML Basics

Site: Saylor Academy
Course: PRDV401: Introduction to JavaScript I
Book: HTML Basics

Description

The Hypertext Markup Language (HTML), developed in the 1990s, is a language for publishing hypertext documents on the World Wide Web. HTML is a "markup language" that defines the structure of a document using tags and text. A web browser reads this information and displays a webpage. This article begins with a brief introduction to HTML and then focuses on the structure of an HTML document, mandatory and optional tags, and attributes. As you read these sections, pay attention to the rules for writing HTML. For example, an HTML5 document should contain four basic tags, each enclosed in an opening "<" and a closing ">" angle bracket.

HTML

This chapter is designed as a brief overview of HTML. HTML is the language used to mark up (or lay out) web pages. It consists of tags that are embedded in strings of text. These tags are instructions in a web page that control things such as formatting. For example, the emphasis (em) tag is used to emphasize a string, and the strong (strong) tag is used to bold text.

HTML has evolved into much more than a language for formatting documents. It can also carry information written in other languages. For example, the script tag can be used to include JavaScript source code within the current document, the style tag can be used to embed Cascading Style Sheets (CSS) rules, and the link tag can be used to include an external file containing CSS. Integrated with these other languages, HTML serves as an infrastructure for writing complex programs, such as form-based systems, mapping systems, and other useful programs that can be run from a browser.
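
As a minimal sketch of how HTML carries other languages, the fragment below embeds a CSS rule with the style tag, refers to an external style sheet with the link tag, and embeds JavaScript with the script tag. The file name styles.css, the color, and the message text are assumptions made up for illustration.

<style>
  /* CSS written directly in the page */
  p { color: navy; }
</style>
<!-- CSS kept in a separate file (hypothetical file name) -->
<link rel="stylesheet" href="styles.css">
<script>
  // JavaScript written directly in the page
  console.log("Hello from JavaScript");
</script>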


Source: Charles W. Kann III, https://www.oercommons.org/courses/programming-for-the-web-from-soup-to-nuts-implementing-a-complete-gis-web-page-using-html5-css-javascript-node-js-mongodb-and-open-layers/view
This work is licensed under a Creative Commons Attribution 4.0 License.

HTML Text, Tags, and Attributes

HTML was derived from Standard Generalized Markup Language (SGML). SGML was designed as a markup language, to allow a writer to mark up (or annotate) a document. Markup languages have been around since at least the 1970s, when the author used one on a DECSYSTEM-10 to format school papers. Perhaps the most popular pure markup language still in use is LaTeX, which is used for mathematical, scientific, and engineering documents.

The idea behind a markup language is that a document can be marked up with tags that tell a program processing the input how to render the text. For example, the following HTML code:

<center>The <i>quick</i> brown fox jumped over the <strong>lazy</strong> dog</center>

would be rendered as a centered line with "quick" in italics and "lazy" in bold:

The quick brown fox jumped over the lazy dog

Markup languages were the precursors of the word processing programs that became popular in the 1980s with the PC revolution. The word processing programs used a What-You-See-Is-What-You-Get (WYSIWYG) interface, which is much easier for a novice computer user to work with than a markup language. WYSIWYG editors eventually took over the market for word processing, with a few exceptions, such as LaTeX, as mentioned earlier.

SGML was originally a traditional markup language, and hyperlinks between documents were added to create HTML. In the beginning, the purpose was to link physics papers together in a web of documents. HTML started with a browser introduced at CERN in 1990. HTML has expanded far beyond the wildest vision of its creators, but still maintains its markup character.

HTML still consists of text and tags. Over time, HTML has evolved from its roots and is no longer seen simply as a way to format a document. The HTML language is now used to define the content (or contextual meaning) of items on a web page, and the tags have evolved to represent this new role. Most of the original tags that specify how to format text, such as bolding (<b>), centering (<center>), or italicizing (<i>), are now discouraged for purely presentational use, and <center> is obsolete in HTML5. Text that should stand out is now marked with the content tag strong (<strong>), which browsers bold by default, and emphasized text is marked with the content tag emphasis (<em>), which browsers italicize by default. Formatting is applied to the content tags using CSS, and interactivity is defined with JavaScript included through the <script> tag. HTML has become much more than a simple markup language, but to understand HTML, it is important to understand its roots as a markup language.
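
To make the split between content and formatting concrete, here is a minimal sketch in which the content tags say what the words mean and CSS rules decide how they look. The specific rules (a dark red color and an underline) are assumptions chosen only for illustration.

<style>
  /* Formatting lives in CSS and is attached to the content tags */
  strong { color: darkred; }
  em { text-decoration: underline; }
</style>
<p>The <em>quick</em> brown fox jumped over the <strong>lazy</strong> dog</p>

Changing how every emphasized word on the page looks then means editing one CSS rule rather than every tag.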

The tags in HTML are keywords written between a less-than sign (<) and a greater-than sign (>), though when using HTML, it is more common to call them angle brackets. Between the angle brackets are the tag name and any attributes. For example, the HTML tag to bold text is the word strong, so to bold text the <strong> tag would be used. A tag represents an action to be taken by the program that processes the marked-up text.

HTML tags are not case sensitive, so the tags <i> and <I> are equivalent.

Most tags apply to the block of text they enclose. Such a tag is closed by a matching tag that begins with a slash (</tag>), as in the example above, where <strong>lazy</strong> caused the word lazy to be bolded.

The tags shown above are called block tags in that the first tag (e.g. <i>) specifies where to begin italicizing the text, and the closing tag (e.g. </i>) specifies where to stop italicizing the text. The text or other information between the two tags is called a block.

Sometimes a tag, such as the break tag (<br>) or the image tag (<img>), is empty, in the sense that it simply runs a command and does not apply to any enclosed text. The <br> tag simply starts a new line, so it does not affect any text or any other element. Because there is nothing to enclose, an empty tag takes no separate closing tag. HTML also allows the tag to be closed between the angle brackets that opened it, so the <br> tag can be written as <br/>.
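
As a small sketch of an empty tag in use, the paragraph below breaks its text into three lines. Both spellings of the break tag appear, and an HTML5 browser treats them the same way. The address text is made up for illustration.

<p>
  First line of the address<br>
  Second line of the address<br/>
  Third line of the address
</p>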

As the br tag shows, tags can be used to do many things in HTML other than just mark up text. For example, tags can be used to tell the HTML processor to include a picture. If a picture exists in the same directory as the web page, the image can be included on the page by adding the following tag into the HTML for the page:

<img src="dog.jpg" />

Program 1 – Image Tag

This line of code includes the picture from the file dog.jpg in the web page. The tag is the image tag (img), but the image needs an attribute to indicate where to find the picture. For the image tag, the attribute used to find the picture is the src attribute.

Attributes are data that fill in details needed to implement the desired behavior for the tag. All tags can have some attributes, and these will be looked at in more detail later.
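
As a hedged illustration, the two tags below each carry attributes written as name="value" pairs inside the opening tag. The alt text, the width value, and the link text are assumptions added for this sketch. The src, alt, width, and href attribute names are standard HTML.

<!-- src says where the picture is; alt gives replacement text; width sets the display width in pixels -->
<img src="dog.jpg" alt="A dog" width="200" />
<!-- href gives the address the link points to -->
<a href="https://www.w3schools.com/tags/">List of HTML tags</a>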

Standard HTML Tags

There are four HTML tags that are considered standard for all web pages. These tags are the <html>, <head>, <title>, and <body> tags. A well-formed HTML5 web page should include all four (a browser will infer some of them when they are missing, but relying on that hides the structure of the document), and many IDEs will automatically insert them into a page for you when you start an HTML page. Since these four tags are always recommended for every web page, I personally keep a template, shown below, that I copy when I begin all web pages.

<html>
  <head>
    <title>Please change this to the title of your page</title>
  </head>
  <body>
  </body>
</html>

Program 2 – HTML template

This template code can be represented as a top-down tree, as shown below. In this tree, the html tag is used to contain 2 elements, the head and the body. Likewise, the head section contains the title, and as we will see shortly, the head and the body sections will contain many other HTML elements. Thus, we will call these 3 tags container tags. The title only contains a block of text, so it is a block tag.


Figure 1 - Tree layout of an html document

These four tags (html, head, title, and body) are special in that they define the structure of an HTML document and are called Document Structure tags. This will be covered more fully in the next section. But first, there are some points to be made about how to structure HTML files.

In the file in Program 2, note that each container tag (html, head, and body) is indented to show the hierarchical structure of the document represented by the tree in Figure 1. This is not required by the HTML processor, which just looks at strings of instructions and text and ignores how the source is laid out. However, indenting makes it easier for the developers and maintainers of web pages to understand what is going on in the program.
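
To illustrate that the indentation is only for human readers, here is a sketch of the same template collapsed onto a single line. A browser should build the same document tree from it as from the indented version. The title text is a placeholder.

<html><head><title>Same page, one line</title></head><body></body></html>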

The second thing I always recommend when writing HTML code is to end every container tag as soon as its beginning tag is entered. This means that when <head> is entered, </head> is immediately entered. This is the automatic behavior of many IDEs. The reason to enter a close tag when opening a container tag is to enforce boundaries on the ideas and concepts that are being expanded in the container. This does not make sense to many novices, who seem to see ideas as unstructured information that starts at the top of the document and just streams to the end. Novice ideas often appear (to me) to be a jumble of thoughts. Novices do not see a purpose in creating boundaries or structure to express an idea. This is true in all areas of academia, and it results in unreadable papers and documentation. This is why indenting and container boundaries are so important: they enforce a structured way of presenting ideas. It is also why a basic course in CS, which teaches this structuring, can be important for students of any major.

But since this concept of structuring ideas is such an enigma to students, I give a practical reason for entering the closing tag when the opening tag is entered. If the closing tag is not immediately entered, it is likely to be completely forgotten and lead to other problems. For students, though, the most persuasive reason seems to be that they will not lose points on a test.

Document Structure Tags and A Simple Web Page

An HTML document is divided into two main sections, the head and the body. The reason for this division is that the head is to contain metadata, and the body is to contain the information to be displayed on the web page.

To understand this difference, it is important to understand the meaning of the term metadata. According to Dictionary.com, meta is "a prefix added to the name of something that consciously references or comments upon its own subject or features". Hence a meta-analysis is a study of studies, a metalanguage is a language used to describe languages, etc.

Metadata is what its name implies: data about the data on a web page. It defines how the page is to interpret the data that it will process. For example, functions that are used in a web page are defined in the head. How to handle events and how to interpret the CSS rules are also defined in the head. Anything that is used to define the behavior of the page is in the head of the document.

As important as what is in the head is what is not in the head of the HTML document. The head should not output any information (or data) to be placed on the web page. Functions and other structures defined in the head should return strings for the body to display, rather than write to the page themselves. If a statement defines something to be rendered on the page itself, it does not belong in the head.

This implies (correctly) that the body of an HTML document should contain anything that is rendered and placed on the web page. Any text to be displayed, images to be rendered, or forms to be processed belong in the body of the document. And again, the body should not contain any metadata such as functions, CSS, or code to handle events.

Nothing in HTML enforces this policy, but there are not many good reasons to violate it. And when the data in the head and body are mixed, it generally shows that the programmer did not have a clear concept of what the page is to do.
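
The division described above can be sketched as follows. This is one possible arrangement rather than a required one, and the function name greeting, the id message, and the CSS rule are all hypothetical names chosen for this illustration. The head defines a style rule and a function that only returns a string; the body holds the text that is displayed and decides where the returned string goes.

<html>
  <head>
    <title>Head and body</title>
    <style>
      p { color: navy; }
    </style>
    <script>
      // Defined in the head: returns a string and writes nothing to the page
      function greeting(name) {
        return "Hello, " + name;
      }
    </script>
  </head>
  <body>
    <p id="message"></p>
    <script>
      // The body displays the string that the function returns
      document.getElementById("message").textContent = greeting("reader");
    </script>
  </body>
</html>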

In the Document Structure, the <title> tag is shown as metadata. This is because the title is what appears on the tab in the browser and is not rendered on the page.

Program 3 is a simple HTML web page to illustrate the concepts covered so far. Note that the program uses the large heading (<h1>) and paragraph (<p>) block tags, which have not been covered, along with the <img> tag. As has been stated before, this text is not intended to be a text on learning HTML, CSS, JavaScript, or any other language or program. It is intended to give a motivated intermediate programmer, specifically students doing research with me, enough background to start that research. A complete list of HTML tags can be found at: https://www.w3schools.com/tags/, and many tutorials exist on how to use them in web pages. Readers interested in more functionality of the tags can easily look them up on the WWW. But it is expected that the readers of this text are sufficiently advanced that they can research and learn the implementation details of this type of material.

Enter Program 3 into a file with a ".html" extension. Note that the file must have some form of an HTML extension (e.g. .html or .htm) for the browser to recognize it as an HTML file. Place a JPEG picture (any picture) into a file named dog.jpg in the same directory, and open the HTML file in a browser such as Chrome, Firefox, Safari, IE, or MS Edge. You should get a page similar to Figure 2.

<html>
  <!--
    Author: Charles Kann
    Date: 5/17/2017
    Purpose: A first example of an HTML program
  -->
  <head>
    <title>First HTML Web Page</title>
  </head>
  <body>
    <h1>First page</h1>
    <p>
      This is a first page of text, and shows how to
      insert a picture of a dog
      into a page.
    </p>
    <img src="dog.jpg" />
    <p>
      This page also shows how to handle text using the
      paragraph (&lt;p&gt;) symbol, as well as how to show
      the &lt; and &gt; symbols in html text.
    </p>
  </body>
</html>

Program 3 – First HTML Web Page

Comments in HTML begin with <!-- and continue until -->. There is a comment at the start of this document to provide a preamble comment for the file. The need for file preamble comments, and for commenting code correctly, is stressed in every introductory programming course I have ever encountered. However, it seems as though students believe such commenting is not useful and only applies to introductory classes and/or the first language they learned. They throw out these lessons as soon as they think they can safely get away with it. That is why at every level, student programs need to be graded on commenting, and a poorly commented program by a senior should be given an F, even if it works. Commenting is not something to be avoided. It is always good practice, and it is good practice in web development as well.


Figure 2 – Output from the first Web Page

Quick Check

  1. What symbol is used to start an HTML tag? What symbol is used to end an HTML tag?

  2. What four tags should you use in all HTML documents? 

  3. How do you close an HTML block tag? How do you close an empty tag? 

  4. What is a tag? What is an attribute? 

  5. What are document structure tags?

  6. What tags should be present in all web pages? What are they used for? 

  7. Give some examples of tags that have attributes. What are the attributes? 

  8. What happens to text that spans multiple lines in the HTML source file? 

  9. What do you think the &lt; and &gt; symbols do? What other symbols do you think are often specified this way in HTML?