15.2 Documents and DTDs
To be perfectly correct, we must explain that
"XML" has come to mean many subtly
different things. An "XML document"
is a document containing content that conforms to a markup language
defined according to the XML standard. An "XML Document Type
Definition" (XML DTD) is a set of rules more
formally known as "entity
and element declarations" that define an XML
markup language; i.e., how the tags are arranged in a correct
("valid") XML document. To make
things even more confusing, entity and element declarations may
appear in an XML document itself, as well as within an XML DTD.
An XML document contains character data, which consists of plain
content and markup in the form of tags and XML declarations. Thus:
<blah>harrumph</blah>
is a line in a
well-formed XML
document. Well-formed XML documents follow certain rules, such as the
requirement that every opening tag have a matching closing tag. These
rules are presented in the context of XHTML in Chapter 16.
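As a rough illustration of well-formedness, the sketch below shows a
small document in which every opening tag is closed and the tags nest
properly. The <inner> and <empty> element names are invented for this
example and are not used elsewhere in this chapter.

```xml
<?xml version="1.0"?>
<!-- Well-formed: one root element, every tag closed, tags properly nested -->
<blah>
  <!-- <inner> and <empty> are hypothetical names used only for illustration -->
  <inner>harrumph</inner>
  <empty/>
</blah>
```

Dropping the </inner> closing tag, or closing </blah> before </inner>,
would leave a document that is no longer well-formed.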
To be considered valid (a valid XML document is one that conforms to
a DTD), every XML document must have a
corresponding set of XML declarations that define how the tags and
content should be arranged within it. These declarations may be
included directly in the XML document, or they may be stored
separately in an XML DTD. If an XML DTD exists that defines the
<blah> tag, our well-formed XML document is
valid, provided you preface it with a
<!DOCTYPE> declaration that explains where to find
the appropriate DTD:
<?xml version="1.0"?>
<!DOCTYPE blah SYSTEM "blah.dtd">
<blah>harrumph</blah>
The example document begins with the optional
<?xml> declaration, which states the version of
XML it uses. It then uses the <!DOCTYPE>
declaration to identify the DTD to be used to process the content of
the document. In this case, a DTD named blah.dtd
should be accessible to the browser[4] so the browser can determine whether
the <blah> tag is valid within the document.
[4] We use
"browser" here because
that's what most people will use to process and view
XML documents. The XML specification uses the more generic phrase
"processing application," since in
some cases the XML document will be processed not by a traditional
browser but by some other tool that knows how to interpret XML
documents.
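For the <!DOCTYPE> example to validate, blah.dtd would need to declare
the <blah> element. The contents of that file are not shown here, so
the following is only a minimal sketch of what it might contain,
assuming <blah> holds nothing but text:

```xml
<!-- blah.dtd: hypothetical contents, assumed for illustration only -->
<!-- Declares that a blah element contains parsed character data (plain text) -->
<!ELEMENT blah (#PCDATA)>
```

With a declaration like this in place, <blah>harrumph</blah> conforms
to the DTD, whereas a <blah> element containing other tags would not.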
XML DTDs contain only XML entity and element declarations. XML
documents, on the other hand, may contain both XML entity and element
declarations and conventional content that uses those elements to
create a document. This intermingling of content and declarations is
perfectly acceptable to a computer processing an XML document, but it
can get confusing for humans trying to learn about XML. For this
reason, we focus our attention in this chapter on the XML entity and
element declaration features that you can use to define new tags and
document types. In other words, we are addressing only the DTD
features of XML; the content features mirror the rules and
requirements you already know and use in order to create HTML
documents.
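To make the intermingling of declarations and content concrete, here is
a small sketch of a document that carries its own declarations inside
the <!DOCTYPE> declaration instead of pointing to a separate DTD file.
The grumble entity is a made-up name used only for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE blah [
  <!-- Declarations placed directly in the document -->
  <!ELEMENT blah (#PCDATA)>
  <!-- grumble is a hypothetical entity; it expands to the text "harrumph" -->
  <!ENTITY grumble "harrumph">
]>
<!-- Content that uses the element and entity declared above -->
<blah>&grumble;</blah>
```

Everything between the square brackets is declaration; everything
after the closing ]> is ordinary content.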