3.5 Document Content
Nearly everything else you put into
your HTML or XHTML document that isn't a tag is by
definition content, and the majority of that is text. Like tags,
document content is encoded using a specific character set: by
default, the ISO-8859-1 Latin character set. This character set is a
superset of conventional ASCII, adding the necessary characters to
support the Western European languages. If your keyboard does not
allow you to directly enter the characters you need, you can use
character entities to insert the desired characters.
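Here is what "superset of ASCII" means in practice. This small Python sketch is our own illustration; it uses only Python's built-in `latin-1` and `ascii` codecs, and it is not something a browser runs. It checks that the first 128 code values decode to the same characters in both sets. It also shows that a Western European letter such as é occupies a single Latin-1 position, 233, which is the number you would use in a numeric character entity.

```python
# Sketch: ISO-8859-1 (Latin-1) agrees with ASCII for byte values 0-127
# and adds Western European characters in the upper range.

def latin1_extends_ascii() -> bool:
    """True if every ASCII byte decodes to the same character under Latin-1."""
    return all(
        bytes([b]).decode("ascii") == bytes([b]).decode("latin-1")
        for b in range(128)
    )

def latin1_code(ch: str) -> int:
    """Return the single-byte Latin-1 code for ch (raises UnicodeEncodeError if absent)."""
    return ch.encode("latin-1")[0]

if __name__ == "__main__":
    print(latin1_extends_ascii())    # True
    print(latin1_code("A"))          # 65, same as ASCII
    print(latin1_code("é"))          # 233, outside ASCII
    print(f"&#{latin1_code('é')};")  # &#233;, a numeric entity for é
```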
3.5.1 Advice Versus Control
Perhaps the hardest rule to remember when marking up an HTML or XHTML
document is that all the tags you insert regarding text display and
formatting are only advice for the browser: they do not explicitly
control how the browser will display the document. In fact, the
browser can choose to ignore all of your tags and do what it pleases
with the document content. What's worse, the user
(of all people!) has control over the text-display characteristics of
his or her own browser.
Get used to this lack of control. The best way to use markup to
control the appearance of your documents is to concentrate on the
content of the document, not on its final appearance. If you find
yourself worrying excessively about spacing, alignment, text breaks,
and character positioning, you'll surely end up with
ulcers. You will have gone beyond the intent of HTML. If you focus on
delivering information to users in an attractive manner, using the
tags to advise the browser as to how best to display that
information, you are using HTML or XHTML effectively, and your
documents will render well on a wide range of browsers.
3.5.2 Character Entities
Besides
common text, HTML and XHTML give you a way to display special text
characters that you might not normally be able to include in your
source document or that have other purposes. A good example is the
less-than symbol, or opening angle bracket (<). In
HTML, it normally signifies the start of a tag, so if you insert it
simply as part of your text, the browser will get confused and
probably misinterpret your document.
For both HTML and XHTML, the ampersand character
(&) instructs the browser to use a special
character, formally known as a character entity.
For example, the command &lt; inserts that
pesky less-than symbol into the rendered text. Similarly,
&gt; inserts the greater-than symbol, and
&amp; inserts an ampersand. There can be no
spaces between the ampersand, the entity name, and the required,
trailing semicolon. (Semicolons aren't special
characters; you don't need to use an ampersand
sequence to display a semicolon normally.) [Section 16.3.7]
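The escaping rule for these three characters is easy to automate. The following Python sketch replaces &, <, and > with their entities. The function name `escape_text` is our own. The ampersand must be handled first; otherwise, the ampersands in the entities just inserted would be escaped a second time. Python's standard `html.escape` function with `quote=False` performs the same three substitutions, so the sketch checks itself against it. Semicolons pass through untouched, as the text above says they should.

```python
import html

def escape_text(text: str) -> str:
    """Replace &, <, and > with &amp;, &lt;, and &gt; so text isn't read as markup."""
    # Escape & first; doing it last would turn "&lt;" into "&amp;lt;".
    escaped = text.replace("&", "&amp;")
    escaped = escaped.replace("<", "&lt;").replace(">", "&gt;")
    return escaped

if __name__ == "__main__":
    sample = "if a < b && b > c;"
    print(escape_text(sample))  # if a &lt; b &amp;&amp; b &gt; c;
    # The standard library agrees when quote escaping is turned off.
    print(escape_text(sample) == html.escape(sample, quote=False))  # True
```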
You also may replace the entity name after the ampersand with a pound
symbol (#) and a decimal value corresponding to
the entity's position in the character set. Hence,
the sequence &#60; does the same thing as
&lt; and represents the less-than symbol. In
fact, you could substitute all the normal characters within an HTML
document with ampersand special characters, such as
&#65; for a capital
"A" or &#97;
for its lowercase version, but that would be silly. A complete
listing of all characters and their names and numerical equivalents
can be found in Appendix F.
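A numeric entity is just the character's code written in decimal, so you can compute it directly. In this Python sketch, `numeric_entity` is a hypothetical helper. Python's `ord` returns a character's Unicode code point, which for every ISO-8859-1 character is the same number as its Latin-1 position. The standard library's `html.unescape` then confirms that the named and numeric forms decode to the same character.

```python
import html

def numeric_entity(ch: str) -> str:
    """Return the decimal character reference for one character, e.g. '<' -> '&#60;'."""
    return f"&#{ord(ch)};"

if __name__ == "__main__":
    print(numeric_entity("<"))  # &#60;
    print(numeric_entity("A"))  # &#65;
    print(numeric_entity("a"))  # &#97;
    # The named entity and its numeric form decode to the same character.
    print(html.unescape("&lt;") == html.unescape("&#60;"))  # True
```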
Keep in mind that not all
special characters can be rendered by all browsers. Some browsers
just ignore many of the special characters; others can't display
characters missing from the character sets available on a specific
platform. Be sure to test your documents on a range of
browsers before electing to use some of the more obscure character
entities.
3.5.3 Comments
Comments
are another type of textual content that appears in the source HTML
document but is not rendered by the user's browser.
Comments fall between the special <!-- and
--> delimiters. Browsers ignore the text
between the comment character sequences. Here are some sample
comments:
<!-- This is a comment -->
<!-- This is a
multiple-line comment
that ends on this line -->
Although the standards don't strictly require it, leave a space after
the initial <!-- and before the final -->. Beyond
that, you can put nearly anything inside the comment. The biggest
exception to this rule is that the HTML standard doesn't let you
nest comments.[3]
[3] Early versions of Netscape did let you
nest comments, but no longer. The practice is tricky, so just say
no.
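You can watch a parser separate comments from renderable text with Python's standard `html.parser` module. In this sketch, the `CommentSplitter` class is our own. Its `handle_data` method receives the text a browser would display. Its `handle_comment` method receives whatever lies between <!-- and -->, including surrounding spaces and line breaks. It is a rough model of how browsers treat comments, not a description of any particular browser's internals.

```python
from html.parser import HTMLParser

class CommentSplitter(HTMLParser):
    """Collects displayable text and comment bodies into separate lists."""

    def __init__(self) -> None:
        super().__init__()
        self.text: list[str] = []
        self.comments: list[str] = []

    def handle_data(self, data: str) -> None:
        # Text outside tags and comments: what the reader would see.
        self.text.append(data)

    def handle_comment(self, data: str) -> None:
        # Everything between <!-- and -->, spaces and newlines included.
        self.comments.append(data)

if __name__ == "__main__":
    p = CommentSplitter()
    p.feed("<p>Visible <!-- This is a comment -->text</p>")
    p.close()
    print("".join(p.text))  # Visible text
    print(p.comments)       # [' This is a comment ']
```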
Internet Explorer also lets you place
comments within a special
<comment>
tag. Everything between the <comment> and
</comment> tags is ignored by Internet
Explorer. Other browsers ignore the unrecognized tags but display
the enclosed comment text to the user.
Obviously, because of this undesirable behavior, we do not recommend
using the <comment> tag. Instead, always use
the <!-- and -->
sequences to delimit comments.
Besides the obvious use of comments for source documentation, many
web servers use comments to take advantage of features specific to
the document server software. These servers scan the document for
specific character sequences within conventional HTML/XHTML comments
and then perform some action based upon the commands embedded in the
comments. The action might be as simple as including text from
another file (known as a server-side include) or
as complex as executing other commands on the server to generate the
document contents dynamically.
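The following Python sketch shows the general idea of a server-side include. It assumes a directive modeled on Apache's server-side include syntax, <!--#include file="name" -->. The `expand_includes` function, its regular expression, and the dictionary standing in for the server's file system are all hypothetical. Ordinary comments don't match the pattern and pass through unchanged. Because the directive is itself a comment, a browser that received the page unprocessed would simply ignore it.

```python
import re

# Hypothetical directive syntax, modeled on Apache-style SSI:
#   <!--#include file="name" -->
INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')

def expand_includes(page: str, files: dict[str, str]) -> str:
    """Replace each include directive with the named file's text.

    `files` is a hypothetical stand-in for the server's file system.
    """
    def substitute(match: re.Match) -> str:
        # Unknown file: this sketch drops the directive; a real server
        # would typically report an error instead.
        return files.get(match.group(1), "")

    return INCLUDE.sub(substitute, page)

if __name__ == "__main__":
    site = {"footer.html": "<p>Copyright notice</p>"}
    page = ('<body>\n<!-- ordinary comment -->\n'
            '<!--#include file="footer.html" -->\n</body>')
    print(expand_includes(page, site))
```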