16.2 Creating XHTML Documents
For the most part, creating an XHTML
document is no different from creating an HTML document. Using your
favorite text editor, simply add the markup elements to your
document's contents in the right order, and display
it using your favorite browser. To be strictly correct
("valid," as they say at the W3C),
your XHTML document needs a boilerplate declaration up front that
specifies the DTD you used to create the document and defines a
namespace for the document.
16.2.1 Declaring Document Types
For an XHTML browser to correctly parse and display your XHTML document, you
should tell it which version of XML is being used to create the
document. You must also state which XHTML DTD defines the elements in
your document.
The XML version declaration uses a special
XML processing directive. In
general, these XML directives begin with <? and
end with ?>, but otherwise they look like
typical tags in your document.[3]
[3] <! was already taken.
To declare that you are using XML Version 1.0, place this directive
in the first line in your document:
<?xml version="1.0" encoding="UTF-8"?>
This tells the browser that you are using XML 1.0 along with UTF-8,
the 8-bit encoding of the Unicode character set and the one most
commonly used today. The encoding attribute's value should
reflect the character encoding in which your document is actually
saved. Refer to the appropriate ISO
standards for other encoding names.
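To see why the declared encoding has to match the bytes actually stored in the file, consider the following minimal sketch. It uses Python's standard xml.etree.ElementTree parser as a stand-in for a browser; the sample text and the wrap helper are our own illustrative inventions, not part of any standard.

```python
import xml.etree.ElementTree as ET

text = "caf\u00e9"  # "cafe" with an accented e: one non-ASCII character

# The same text occupies different bytes in different encodings.
utf8_bytes = text.encode("utf-8")         # the accented e becomes two bytes
latin1_bytes = text.encode("iso-8859-1")  # the accented e becomes one byte


def wrap(encoding_name, payload):
    """Build a tiny XML document whose declaration names encoding_name.

    Hypothetical helper, for illustration only.
    """
    header = f'<?xml version="1.0" encoding="{encoding_name}"?>'.encode("ascii")
    return header + b"<p>" + payload + b"</p>"


# Declaration and bytes agree: both documents decode to the same text.
utf8_text = ET.fromstring(wrap("UTF-8", utf8_bytes)).text
latin1_text = ET.fromstring(wrap("ISO-8859-1", latin1_bytes)).text

# Declaration and bytes disagree: the parser rejects the document.
try:
    ET.fromstring(wrap("UTF-8", latin1_bytes))
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True

print(len(utf8_bytes), len(latin1_bytes), utf8_text == latin1_text, mismatch_rejected)
```

A real browser may be more forgiving than this parser, but the lesson holds: declare the encoding your editor actually uses when it saves the file.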
Once you've gotten the important issue of the XML
version squared away, you should then declare the markup
language's DTD:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
With this statement, you declare that your
document's root element is html,
as defined in the DTD whose public identifier is defined as
"-//W3C//DTD XHTML
1.0 Strict//EN". The browser
may know how to find the DTD matching this public identifier. If it
does not, it can use the URL following the public identifier as an
alternative location for the DTD.
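As a rough illustration of what a processing agent sees in that statement, the sketch below reads the three parts of the declaration back out of a document. It assumes Python's standard xml.dom.minidom module as the parser; the sample document is ours, and it uses the namespace name given in the XHTML 1.0 recommendation, http://www.w3.org/1999/xhtml.

```python
from xml.dom import minidom

# A small strict XHTML document, supplied as bytes so that the XML
# declaration's encoding is honored by the parser.
doc = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Doctype demo</title></head>
  <body><p>Hello</p></body>
</html>"""

dom = minidom.parseString(doc)
doctype = dom.doctype

root_name = doctype.name      # the declared root element
public_id = doctype.publicId  # the DTD's public identifier
system_id = doctype.systemId  # the fallback URL for the DTD

print(root_name)
print(public_id)
print(system_id)
```

Note that this parser only records the identifiers; it does not retrieve the DTD or validate the document against it.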
As you may have noticed, the above
<!DOCTYPE> directive told the browser to use
the strict XHTML DTD. Here's the one
you'll probably use for your transitional XHTML
documents:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
And, as you might expect, the <!DOCTYPE>
directive for the frame-based XHTML DTD is:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
16.2.2 Understanding Namespaces
As described in the last chapter, an XML
DTD defines any number of element and attribute names as part of the
markup language. These elements and attributes are stored in a
namespace that is unique to the DTD. As you
reference elements and attributes in your document, the browser looks
them up in the namespace to find out how they should be used.
For instance, the <a> tag's
name ("a") and attributes (e.g.,
"href" and
"style") are defined in the XHTML
DTD, and their names are placed in the DTD's
namespace. Any "processing
agent" (usually a browser, but your eyes and
brain can serve the same function) can look up the name in the
appropriate DTD to figure out what the markup means and what it
should do.
With XML, your document actually can use more than one DTD and
therefore require more than one namespace. For example, you might
create a transitional XHTML document but also include special markup
for some math expressions according to an XML math language. What
happens when both the XHTML DTD and the math DTD use the same name to
define different elements, such as <a> for
XHTML hypertext and <a> for an absolute
value in math? How does the browser choose which namespace to use?
The answer is the xmlns[4] attribute. Use it to define one or more
alternative namespaces within your document. It can be placed within
the start tag of any element within your document, and its
URL-like[5] value defines the namespace that the
browser should use for all content within that element.
[4] XML namespace, xmlns: get it? This
is why XML doesn't let you begin any element or
attribute with the three-letter prefix
"xml": it's
reserved for special XML attributes and elements.
[5] It looks like a URL, and you might think
that it references a document that contains the namespace, but alas,
it doesn't. It is simply a unique name that
identifies the namespace. Display agents use that placeholder to
refer to their own resources for how to treat the named element or
attribute.
With XHTML, according to XML conventions, you should at the very
least include within your document's
<html> tag an xmlns
attribute that identifies the primary namespace used throughout the
document:
<html xmlns="http://www.w3.org/1999/xhtml">
If and when you need to include math markup, use the
xmlns attribute again to define the math
namespace. So, for instance, you could use the
xmlns attribute within some math-specific tag of
your otherwise common XHTML document (assuming the
MATH element exists, of course):
<div xmlns="http://www.w3.org/1998/Math/MathML">x2/x</div>
In this case, the XML-compliant browser would use the http://www.w3.org/1998/Math/MathML namespace
to divine that this is the MATH, not the XHTML, version of the
<div> tag, and should therefore be displayed
as a division equation.
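To make the lookup concrete, here is a minimal sketch of how a namespace-aware parser tells the two <div> elements apart. It assumes Python's standard xml.etree.ElementTree module, which reports each element as its namespace name plus its local name. The math <div> remains the hypothetical element of the example above, and the XHTML namespace name used is the one given in the XHTML 1.0 recommendation.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATH_NS = "http://www.w3.org/1998/Math/MathML"

# Two <div> elements with the same local name but different namespaces.
# The math <div> is hypothetical, as in the surrounding text.
doc = f"""<html xmlns="{XHTML_NS}">
  <body>
    <div>an ordinary XHTML division</div>
    <div xmlns="{MATH_NS}">x2/x</div>
  </body>
</html>"""

root = ET.fromstring(doc)

# ElementTree writes each name as {namespace}localname.
div_tags = [el.tag for el in root.iter() if el.tag.endswith("}div")]

for tag in div_tags:
    print(tag)
```

The two names differ only in the part inside the braces, which is all a processing agent needs to choose the right definition.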
It would quickly become tedious if you had to embed the
xmlns attribute into each and every
<div> tag any time you wanted to show a
division equation in your document. A better way, particularly
if you plan to apply it to many different elements in your
document, is to identify and label the namespace at the
beginning of your document, and then refer to it by that label as a
prefix to the affected element in your document. For example:
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:math="http://www.w3.org/1998/Math/MathML">
The math namespace can now be abbreviated to
"math" later in your document. So
the streamlined:
<math:div>x2/x</math:div>
now has the same effect as the lengthy earlier example of the math
<div> tag containing its own
xmlns attribute.
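The equivalence of the two forms can be checked with a short sketch, again assuming Python's standard xml.etree.ElementTree module and the hypothetical math <div> element. The variable names are ours.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATH_NS = "http://www.w3.org/1998/Math/MathML"

# Form 1: the element carries its own xmlns attribute.
inline = (
    f'<html xmlns="{XHTML_NS}">'
    f'<div xmlns="{MATH_NS}">x2/x</div>'
    "</html>"
)

# Form 2: the namespace is labeled once on <html> and used as a prefix.
prefixed = (
    f'<html xmlns="{XHTML_NS}" xmlns:math="{MATH_NS}">'
    "<math:div>x2/x</math:div>"
    "</html>"
)

inline_tag = ET.fromstring(inline)[0].tag
prefixed_tag = ET.fromstring(prefixed)[0].tag

# Both spellings resolve to the same namespace-qualified name.
print(inline_tag)
print(prefixed_tag)
print(inline_tag == prefixed_tag)
```

The prefix itself ("math") is only a local label; what matters to the parser is the namespace name it stands for.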
The vast majority of XHTML authors will never need to define multiple
namespaces and so will never have to use fully qualified names
containing the namespace prefix. Even so, you should understand that
multiple namespaces exist and that you will need to manage them if
you choose to embed content based on one DTD within content
defined by another DTD.
16.2.3 A Minimal XHTML Document
As a courtesy to all fledgling XHTML authors, we now present the
minimal and correct XHTML document, including all the
appropriate XML, XHTML, and namespace declarations. With this most
difficult part out of the way, you need only supply content to create
a complete XHTML document.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Every document must have a title</title>
</head>
<body>
...your content goes here...
</body>
</html>
Working through the minimal document one element at a time, we begin
by declaring that we are basing the document on the XML 1.0 standard
and using the UTF-8 encoding of Unicode to express its contents and
markup. We then announce, in the familiar HTML-like
<!DOCTYPE> statement, that we are following
the markup rules defined in the transitional XHTML 1.0 DTD, which
allow us free rein to use nearly any HTML 4.01 element in our
document.
Our document content actually begins with the
<html> tag, whose
xmlns attribute declares that the XHTML namespace
is the default namespace for the entire document. Also note the
language attributes, xml:lang from XML and lang from XHTML,
which together declare that the document language is English.
Finally, we include the familiar document
<head> and <body>
tags, along with the required
<title> tag.
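If you would like a quick sanity check of your own boilerplate, the following sketch parses a minimal document and confirms the pieces just described. It assumes Python's standard xml.etree.ElementTree module and the namespace name from the XHTML 1.0 recommendation; the checks dictionary and its keys are our own invention. It tests only that the document is well formed and has the expected skeleton; it does not validate against the DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of the xml: prefix

minimal = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Every document must have a title</title>
</head>
<body>
...your content goes here...
</body>
</html>"""

root = ET.fromstring(minimal)
ns = {"x": XHTML_NS}  # a local prefix for searching; any label would do

title = root.find("x:head/x:title", ns)
body = root.find("x:body", ns)

checks = {
    "root_is_xhtml_html": root.tag == f"{{{XHTML_NS}}}html",
    "has_title": title is not None and bool(title.text),
    "has_body": body is not None,
    "xml_lang": root.get(f"{{{XML_NS}}}lang"),
    "lang": root.get("lang"),
}

for name, value in checks.items():
    print(name, value)
```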