HTML & XHTML: The Definitive Guide, 5th Edition [Electronic resources] - text version


Chuck Musciano, Bill Kennedy

15.3 Understanding XML DTDs


To use a markup language defined with XML, you should be able to read
and understand the elements and entities found in its XML DTD. But
don't be put off: while XML DTDs are verbose, filled
with obscure punctuation, and designed primarily for computer
consumption, they are actually easy to understand once you get past
all the syntactic sugar. Remember, your brain is better at languages
than any computer is.

As we said previously, an XML DTD
is a collection of XML entity and element declarations and comments.
Entities are name/value pairs that make the DTD easier to read and
understand, while elements are the actual markup tags defined by the
DTD, like HTML's <p> or
<h1> tags. The DTD also describes the
content and grammar for each tag in the language. Along with the
element declarations, you'll also find
attribute declarations that define the
attributes authors may use with the tags defined by the element
declarations.

There is no required order, although the careful DTD author arranges
declarations in such a way that humans can easily find and understand
them, computers notwithstanding. The beloved DTD author includes lots
of comments, too,
that explain the declarations and how they can be used to create a
document. Throughout this chapter, we use examples taken from the
XHTML 1.0 DTD, which can be found in its entirety at the W3C web
site. Although lengthy, you'll find this DTD to be
well-written, complete, and, with a little practice, easy to
understand.

XML also provides for conditional sections within a DTD, allowing
groups of declarations to be optionally included or excluded by the
DTD parser. This is useful when a DTD actually defines several
versions of a markup language; the desired version can be derived by
including or excluding appropriate sections. The XHTML 1.0 DTD, for
example, defines both the "regular"
version of HTML and a version that supports frames. By allowing the
parser to include only the appropriate sections of the DTD, the rules
for the <html> tag can change to support
either a <body> tag or a
<frameset> tag, as needed.
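
As a rough sketch of how such a conditional section might look, the
fragment below uses a parameter entity as a switch. The names
Frames.switch and html.content are hypothetical and are not taken from
the XHTML DTD. Note that conditional sections are permitted only in an
external DTD file, not in a document's internal subset.

```xml
<!-- Hypothetical switch: change "IGNORE" to "INCLUDE" for the frames version -->
<!ENTITY % Frames.switch "IGNORE">

<!-- This section is processed only when the switch expands to INCLUDE -->
<![%Frames.switch;[
<!ENTITY % html.content "(head, frameset)">
]]>

<!-- When an entity is declared twice, the first declaration wins,
     so this default applies only if the section above was ignored -->
<!ENTITY % html.content "(head, body)">

<!ELEMENT html %html.content;>
```

With the switch set to IGNORE, the <html> tag requires a
<body>; set to INCLUDE, it requires a <frameset> instead.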


15.3.1 Comments


The syntax for comments within an XML DTD is like
that for HTML comments: comments begin with
<!-- and end with -->.
Everything between these two delimiters is ignored by the XML
processor. Comments may not be nested, and the sequence -- may not
appear inside a comment.
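
For illustration, here is a comment sitting above a declaration. The
element declared is only a placeholder for this sketch:

```xml
<!-- Block-level elements. A comment may span
     several lines, but may not contain a double hyphen. -->
<!ELEMENT p (#PCDATA)>
```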


15.3.2 Entities


An entity is a
fancy term for a constant. Entities are crucial to creating modular,
easily understood DTDs. Although they may differ in many ways, all
entities associate a name with a string of characters. When you use
the entity name elsewhere within a DTD, or in an XML document,
language parsers replace the name with the corresponding characters.
Drawing an example from HTML, the &lt; entity
is replaced by the < character wherever it
appears in an HTML document.

Entities come in two flavors:
parsed and
unparsed. Parsed entities are processed by an
XML processor; unparsed ones are ignored. The vast majority of
entities are parsed. An unparsed entity is reserved for use within
attribute lists of certain tags; it is nothing more than a
replacement string used as a value for a tag attribute.

You can further divide the group of parsed entities into
general entities and parameter
entities. General entities are used in the XML document, while
parameter entities are used in the XML DTD.

You may not realize that you've been using general
entities within your HTML documents all along. For example, the
entity for the copyright (©) symbol
(&copy;) is a general entity defined in the
HTML DTD. Like all general entities, it is referenced by preceding
its name with the ampersand character. All of the other general
entities you know and love are listed in Appendix F.

To make life easier, XML predefines the five most common general
entities, which can be used in any XML document. While it is still
preferred that they be explicitly defined in any DTD that uses them,
these five entities are always available to any XML author:

&amp;    &
&apos;   '
&gt;     >
&lt;     <
&quot;   "
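
A small sketch of the five predefined entities at work in a document
that has no DTD at all. The element name tip is hypothetical:

```xml
<?xml version="1.0"?>
<!-- The element name "tip" is hypothetical -->
<tip>Write &lt;br /&gt; for a line break, &amp;copy; for &quot;copyright&quot;; it&apos;s that simple.</tip>
```

After entity replacement, the content of the tip element reads:
Write <br /> for a line break, &copy; for "copyright"; it's that simple.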

You'll find parameter entities littered throughout
any well-written DTD, including the HTML DTD. Parameter entities have
a percent sign (%) preceding their names. The
percent sign tells the XML processor to look up the entity name in
the DTD's list of parameter entities, insert the
value of the entity into the DTD in place of the entity reference,
and process the value of the entity as part of the DTD.

That last bit is important. By processing the contents of the
parameter entity as part of the DTD, the XML processor allows you to
place any valid XML content in a parameter entity. Many parameter
entities contain lengthy XML definitions and may even contain other
entity definitions. Parameter entities are the workhorses of the XML
DTD; creating DTDs without them would be extremely
difficult.[5]

[5] C and C++ programmers may recognize that
the entity mechanism in XML is similar to the
#define macro mechanism in C and C++. The XML
entities provide only simple character-string substitution and do not
employ C's more elaborate macro parameter
mechanism.
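
To make the substitution concrete, here is a minimal sketch of one
parameter entity built on top of another. The entity names coreattrs
and attrs and the element note are assumptions made for this example,
loosely in the style of the XHTML DTD. The fragment is meant for an
external DTD file, because parameter entity references inside
declarations are not allowed in a document's internal subset.

```xml
<!-- Hypothetical attribute groups, loosely in the style of the XHTML DTD -->
<!ENTITY % coreattrs
  "id     ID     #IMPLIED
   class  CDATA  #IMPLIED">

<!-- A parameter entity may reference another one in its value -->
<!ENTITY % attrs
  "%coreattrs;
   title  CDATA  #IMPLIED">

<!ELEMENT note (#PCDATA)>
<!-- After substitution, note has the id, class, and title attributes -->
<!ATTLIST note %attrs;>
```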



15.3.3 Entity Declarations


Let's define an entity with the
<!ENTITY> tag in an XML DTD. Inside the tag,
supply the entity name and value; a percent sign placed before the
name marks the entity as a parameter entity rather than a general one:

<!ENTITY name value>
<!ENTITY % name value>

The first version creates a general entity; the second, because of
the percent sign, creates a parameter entity.

For both entity types, the name is simply a sequence of characters
beginning with a letter, colon, or underscore and followed by any
combination of letters, numbers, periods, hyphens, underscores, or
colons. The only restriction is that names may not begin with the
sequence "xml" (either upper- or
lowercase).
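
A few hypothetical declarations show the naming rules in practice.
The names and values are invented for this sketch:

```xml
<!-- Acceptable names: each begins with a letter, colon, or underscore -->
<!ENTITY _draft     "not yet final">
<!ENTITY doc.title  "Kumquat Growers' Handbook">
<!ENTITY rev-2      "second revision">

<!-- Names to avoid, shown here only inside a comment:
     2ndEdition  (begins with a digit)
     .hidden     (begins with a period)
     xmlNote     (begins with the reserved sequence "xml") -->
```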

The entity value is either a character string within quotes (unlike
HTML markup, you must use quotes even if it is a string of contiguous
letters) or a reference to another document containing the value of
the entity. For these external entity values, you'll
find either the keyword SYSTEM, followed by the
URL of the document containing the entity value, or the keyword
PUBLIC, followed by the formal name of the
document and its URL.

A few examples will make this clear. Here is a simple general entity
declaration:

<!ENTITY fruit "kumquat or other similar citrus fruit">

In this declaration, the entity "&fruit;"
within the document is replaced with the phrase
"kumquat or other similar citrus
fruit" wherever it appears.

Similarly, here is a parameter entity declaration:

<!ENTITY % ContentType "CDATA">

Anywhere the reference %ContentType; appears in
your DTD, it is replaced with the word
"CDATA". This is the
typical way to use parameter entities: to create a more descriptive
term for a generic parameter that will be used many times in a DTD.
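
As a reduced sketch of that usage, the fragment below applies
%ContentType; in an attribute declaration. The element name
attachment is hypothetical, and the fragment belongs in an external
DTD file:

```xml
<!ENTITY % ContentType "CDATA">

<!-- Reduced, hypothetical element with a single attribute -->
<!ELEMENT attachment EMPTY>
<!ATTLIST attachment
  type  %ContentType;  #IMPLIED>
<!-- The processor sees: type CDATA #IMPLIED -->
```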

Here is an external general entity declaration:

<!ENTITY boilerplate SYSTEM "http://server.com/boilerplate.txt">

It tells the XML processor to retrieve the contents of the file
boilerplate.txt from
server.com and use it as the value of the
boilerplate entity. Anywhere you use
&boilerplate; in your document, the contents
of the file are inserted as part of your document content.
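
Putting the two general entities together, a complete document might
look like the following sketch. The memo element is hypothetical, and
the declarations are placed in the document's internal DTD subset so
that the example is self-contained:

```xml
<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
  <!-- Internal entity: the value is given in place -->
  <!ENTITY fruit "kumquat or other similar citrus fruit">
  <!-- External entity: the value is fetched from the URL -->
  <!ENTITY boilerplate SYSTEM "http://server.com/boilerplate.txt">
]>
<memo>Today's special: &fruit;. &boilerplate;</memo>
```

Since memo is declared to hold only character data, this sketch
assumes that boilerplate.txt contains plain text and no markup.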

Here is an external parameter entity declaration, lifted from the
XHTML DTD, that references a public external document:

<!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" 
"xhtml-lat1.ent">

It defines an entity named HTMLlat1 whose contents
are to be taken from the public document identified as
-//W3C//ENTITIES Latin 1 for XHTML//EN. If the
processor does not have a copy of this document available, it can use
the URL xhtml-lat1.ent to find it. This
particular public document is actually quite lengthy, containing all
of the general entity declarations for the Latin 1 characters
for HTML.[6] Accordingly, simply writing this in the XHTML DTD:

%HTMLlat1;

causes all of those general entities to be defined as part of the
language.

[6] You can enjoy this document for
yourself at http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent.

A DTD author can use the PUBLIC and
SYSTEM external values with general and parameter
entity declarations. You should structure your external definitions
to make your DTDs and documents easy to read and understand.

You'll recall that we began the section on entities
with a mention of unparsed entities whose only purpose is to be used
as values to certain attributes. You declare an unparsed entity by
appending the keyword NDATA to an external general
entity declaration, followed by the name of a notation that identifies
the format of the entity's data. The notation must itself be declared
in the DTD with <!NOTATION>. If
we wanted to convert our general boilerplate entity to an unparsed
general entity for use as an attribute value, we could say:

<!ENTITY boilerplate SYSTEM "http://server.com/boilerplate.txt" NDATA text>

With this declaration, attributes defined as type
ENTITY (as described in Section 15.5.1) could use boilerplate as
one of their values.
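
A fuller sketch shows the pieces an unparsed entity needs around it.
The name after NDATA refers to a notation, which must be declared.
The notation's identifier "text/plain", the element legal, and its
source attribute are all assumptions made for this example:

```xml
<!-- Hypothetical notation naming the data format -->
<!NOTATION text SYSTEM "text/plain">

<!ENTITY boilerplate SYSTEM "http://server.com/boilerplate.txt" NDATA text>

<!-- Hypothetical element whose attribute takes an entity name -->
<!ELEMENT legal EMPTY>
<!ATTLIST legal source ENTITY #REQUIRED>

<!-- In a document: <legal source="boilerplate"/> -->
```

Note that the attribute value is the bare entity name, without the
ampersand and semicolon used for parsed general entities.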


15.3.4 Elements


Elements are definitions of the tags that can be used
in documents based on your XML markup language. In some ways, element
declarations are easier than entity declarations, since all you need
to do is specify the name of the tag and what sort of content that
tag may contain:

<!ELEMENT name contents>

The name follows the same rules as names for
entity definitions. The contents section may be
one of four types described here:


The keyword EMPTY defines a tag with no content,
like <hr> or <br>
in HTML. Empty elements in XML get a bit of special handling, as
described in Section 15.4.5.


The keyword ANY indicates that the tag can have
any content, without restriction or further processing by the XML
processor.


The content may be a set of grammar rules that defines the order and
nesting of tags within the defined element. This content type is used
when the tag being defined contains only other tags, without
conventional content allowed directly within the tag. In HTML, the
<ul> tag is such a tag, as it can contain
only <li> tags.


Mixed content, denoted by the keyword #PCDATA followed by a list of
element names separated by vertical bars (|), is enclosed in
parentheses and, when element names are present, followed by an
asterisk. This content
type allows tags to have user-defined content, along with other
markup elements. The <li> tag, for example,
may contain user-defined content as well as other tags.
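
The four content types can be sketched side by side. These
declarations are simplified for illustration and are not the actual
XHTML definitions; the element name anything is hypothetical. In XML,
the names in a mixed-content declaration are separated by vertical
bars, with #PCDATA first and an asterisk after the closing
parenthesis:

```xml
<!-- 1. EMPTY: the tag has no content -->
<!ELEMENT br EMPTY>

<!-- 2. ANY: any declared elements and text are allowed -->
<!ELEMENT anything ANY>

<!-- 3. Element content: one or more li tags, and nothing else -->
<!ELEMENT ul (li)+>

<!-- 4. Mixed content: text freely interleaved with ul and em tags -->
<!ELEMENT li (#PCDATA | ul | em)*>
<!ELEMENT em (#PCDATA)>
```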



The last two content types form the meat of most DTD element
declarations. This is where the fun begins.

