4.9 Special Character Encoding
For the most part, characters within
documents that are not part of a tag are rendered as is by the
browser. However, some characters have special meaning and are not
directly rendered, while other characters can't be
typed into the source document from a conventional keyboard. Special
characters need either a special name or a numeric character encoding
for inclusion in a document.
4.9.1 Special Characters
As has become obvious in the discussion and examples leading up to
this section, three characters in source documents have very special
meaning: the less-than sign (<), the
greater-than sign (>), and the ampersand
(&). These characters delimit tags and special
character references. They'll confuse a browser if
left dangling on their own or used with improper tag syntax, so you
have to go out of your way to include their actual, literal characters
in your documents.[6]
[6] The only exception is that these
characters may appear literally within the <listing>
and <xmp> tags, but this is a moot
point, since the tags are obsolete.
Similarly, you have to use a special encoding to include a double
quotation mark within a quoted string, or to include a special
character that doesn't appear on your keyboard but is part of the
ISO Latin-1 character set implemented and supported by most browsers.
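As a rough sketch of the cases described in this section, the fragment below writes each troublesome character as a character reference: an ampersand, a name, and a semicolon. The sentence text and the filename sign.gif are hypothetical, made up purely for illustration.

```html
<!-- Literal less-than, greater-than, and ampersand characters in text -->
<p>Use &lt;br&gt; for a line break, and write AT&amp;T with an encoded ampersand.</p>

<!-- A double quotation mark inside a double-quoted attribute value -->
<!-- (sign.gif is a hypothetical filename) -->
<img src="sign.gif" alt="The sign read &quot;Closed&quot;">

<!-- A Latin-1 character that many keyboards cannot type directly -->
<p>Meet me at the caf&eacute;.</p>
```

A browser should display the first paragraph with a literal <br> in the text rather than breaking the line, because the encoded characters are never treated as tag delimiters.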
4.9.2 Inserting Special Characters
To include a special character in your document, enclose either its
standard entity name or a pound sign (#) and its
numeric position in the Latin-1 standard character set[7]
inside a leading ampersand and an ending semicolon, without any
spaces in between. Whew. That's a long explanation
for what is really a simple thing to do, as the following examples
illustrate. The first example shows how to include a greater-than
sign in a snippet of code by using the character's
entity name. The second demonstrates how to include a greater-than
sign in your text by referencing its Latin-1 numeric value:
[7] The popular ASCII character set is a subset of the more
comprehensive Latin-1 character set. Composed by the well-respected
International Organization for Standardization (ISO), the Latin-1 set
is a list of all letters, numbers, punctuation marks, and so on
commonly used by Western language writers, organized by number and
encoded with special names. Appendix F contains the
complete Latin-1 character set and encoding.
if a &gt; b, then t = 0
if a &#62; b, then t = 0
Both examples cause the text to be rendered as:
if a > b, then t = 0
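The same equivalence holds for other characters. As a minimal sketch, each paragraph below pairs an entity name with the numeric reference for the same Latin-1 position, so both halves of each line should render as the same character. The surrounding paragraph tags are only there to make the fragment easy to try in a browser.

```html
<!-- Entity name first, numeric Latin-1 position second -->
<p>&lt; and &#60;</p>       <!-- less-than sign, position 60 -->
<p>&gt; and &#62;</p>       <!-- greater-than sign, position 62 -->
<p>&amp; and &#38;</p>      <!-- ampersand, position 38 -->
<p>&quot; and &#34;</p>     <!-- double quotation mark, position 34 -->
<p>&eacute; and &#233;</p>  <!-- lowercase e with acute accent, position 233 -->
```

The named form is usually easier to read in the source, while the numeric form works for any position in the character set, whether or not you remember its name.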
The complete set of character entity values and names is given in
Appendix F. You could write an entire document
using character encodings, but that would be silly.