3.6 HTML/XHTML Document Elements
Every HTML document should conform to the
HTML SGML DTD, the formal Document Type Definition that defines the
HTML standard. The DTD defines the tags and syntax that are used to
create an HTML document. You can inform the browser which DTD your
document complies with by placing a special SGML (Standard Generalized Markup
Language) command in the first line of the document:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
This cryptic message indicates that your document is intended to be
compliant with the HTML 4.01 final DTD defined by the World Wide Web
Consortium (W3C). Other versions of the DTD define more restricted
versions of the HTML standard, and not all browsers support all
versions of the HTML DTD. In fact, specifying any other doctype may
cause the browser to misinterpret your document when displaying it
for the user. It's also unclear what doctype to use
when the document includes tags that are not part of any standard
but are very popular features of a popular browser, such as the
Netscape extensions, or tags from the deprecated HTML 3.0 standard,
for which a DTD was never released.
Almost no one precedes their HTML documents with the SGML doctype
command. Because of the confusion of versions and standards, we
don't recommend that you include the prefix with
your HTML documents either.
On the other hand, we do strongly recommend that you include the
proper doctype statement in your XHTML documents, in conformance with
XML standards. Read Chapter 15 and Chapter 16 for more about DTDs and the XML and XHTML
standards.
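As a rough illustration of that recommendation, the following sketch shows an XHTML document that begins with a doctype statement. It assumes the XHTML 1.0 Strict DTD; the title and paragraph text are placeholders, and your own documents may call for the transitional or frameset DTD instead.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<!-- Assumed for illustration: the XHTML 1.0 Strict DTD named above -->
<head>
<title>Placeholder Title</title>
</head>
<body>
<!-- Placeholder content; every tag is lowercase and explicitly closed -->
<p>Placeholder content.</p>
</body>
</html>
```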
3.6.1 The <html> Tag
As we saw earlier, the
<html> and </html>
tags serve to delimit the beginning and end of a document. Since the
typical browser can easily infer from the enclosed source that it is
an HTML or XHTML document, you don't really need to
include the tag in your source HTML document.
<html>
Function: Delimits a complete HTML or XHTML document
Attributes: dir, lang, version
End tag: </html>; may be omitted in HTML
Contains: head_tag, body_tag, frames
Nonetheless, it is good practice to include the
tag so that other tools, particularly more mundane text-processing
ones, can recognize your document as an HTML document. At the very
least, the presence of the beginning and ending
<html> tags ensures that the beginning or
the end of the document has not inadvertently been deleted. Besides,
XHTML requires the <html> tag.
Inside the <html> tag and its end tag are
the document's head and body. Within the head,
you'll find tags that identify the document and
define its place within a document collection. Within the body is the
actual document content, defined by tags that determine the layout
and appearance of the document text. As you might expect, the
document head is contained within a
<head> tag and
the body is within a <body> tag, both of
which are defined later.
The <body> tag may be replaced by a
<frameset>
tag defining one or more display frames that, in turn, contain actual
document content. See Chapter 11 for more
information. By far, the most common form of the
<html> tag is simply:
<html>
document head and body content
</html>
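For a more concrete picture, here is a minimal sketch of that form with the head and body filled in. The title and paragraph text are invented for illustration only.

```html
<html>
<head>
<!-- The head identifies the document -->
<title>A Placeholder Title</title>
</head>
<body>
<!-- The body holds the actual document content -->
<p>A placeholder paragraph of document content.</p>
</body>
</html>
```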
When the <html> tag appears without the
version attribute, the server and browser assume that the server
tells the browser which version of HTML the document uses.
3.6.1.1 The dir attribute
The dir
attribute specifies in which direction the browser should render text
within the containing element. When used within the
<html> tag, it determines how text will be
presented within the entire document. When used within another tag,
it controls the text's direction for just the
content of that tag.
By default, the value of this attribute is ltr,
indicating that text is presented to the user left to right. Use the
other value, rtl, to display text right to left,
for languages like Arabic or Hebrew.
Of course, the results depend on your content and the
browser's support of HTML 4 or XHTML. Netscape and
Internet Explorer Versions 4 and
earlier ignore the dir attribute. The HTML
4-compliant Internet Explorer Version 5 simply right-justifies
dir=rtl text, although if you look in Figure 3-1, you'll notice the browser
moves the punctuation (the period) to the other side of the sentence.
Internet Explorer 6 does the same thing. Netscape 6 right-justifies
everything, including the ending period.
<html dir=rtl>
<head>
<title>Display Directions</title>
</head>
<body>
This is how IE 5 renders right-to-left directed text.
</body>
</html>
Figure 3-1. Internet Explorer 5 implements the dir attribute
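The dir attribute can also be set on a single tag rather than on the whole document, as described earlier. The following is a minimal sketch of that use; the title and paragraph text are made up for illustration, and, as noted above, how a given browser actually renders the result varies.

```html
<html>
<head>
<title>Mixed Display Directions</title>
</head>
<body>
<!-- No dir attribute: this paragraph uses the default, ltr -->
<p>This paragraph is presented left to right.</p>
<!-- dir applies only to the content of this one tag -->
<p dir="rtl">This paragraph alone is directed right to left.</p>
</body>
</html>
```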

3.6.1.2 The lang attribute
When
included within the <html> tag, the
lang attribute specifies the language
you've generally used within the document. When used
within other tags, the lang attribute specifies
the language you used within that tag's content.
Ideally, the browser will use lang to better
render the text for the user.
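A short sketch of both uses follows. It assumes the language codes "en-US" (U.S. English) and "fr" (French); the title and sentences are placeholders chosen for illustration.

```html
<html lang="en-US">
<head>
<title>Language Codes</title>
</head>
<body>
<!-- Inherits the document-wide language set on the html tag -->
<p>This paragraph is written in U.S. English.</p>
<!-- Overrides the language for this tag's content only -->
<p lang="fr">Ce paragraphe est en fran&ccedil;ais.</p>
</body>
</html>
```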
Set the value of the lang attribute to an ISO 639
standard two-character language code. You may also indicate a dialect
by following the ISO language code with a hyphen and a subcode name.
For example, "en" is the ISO
language code for English; "en-US"
is the complete code for U.S. English. Other common language codes
include "fr" (French),
"de" (German),
"it" (Italian),
"nl" (Dutch),
"el" (Greek),
"es" (Spanish),
"pt" (Portuguese),
"ar" (Arabic),
"he" (Hebrew),
"ru" (Russian),
"zh" (Chinese),
"ja" (Japanese), and
"hi" (Hindi).
3.6.1.3 The version attribute
The version attribute defines the HTML
standard version used to compose the document. Its value, for HTML
Version 4.01, should read exactly:
version="-//W3C//DTD HTML 4.01//EN"
In general, version information within the
<html> tag is more trouble than it is worth,
and this attribute has been deprecated in HTML 4. Serious authors
should instead use an SGML <!doctype> tag at
the beginning of their documents, like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">