HTML..XHTML.The.Definitive.Guide..5th.Ed.1002002 [Electronic resources] text version

2.6 Text

Text-related HTML/XHTML markup tags make up the richest set of all
in the standard languages. That's because the
original language, HTML, emerged as a way to enrich the
structure and organization of text.

HTML came out of academia. What was, and still is, important to those
early developers is that their mostly academic, text-oriented
documents can be scanned and read easily, without sacrificing the
ability to distribute them over the Internet to a wide
diversity of computer display platforms. (ASCII text is the only
universal format on the global Internet.) Multimedia integration is
something of an appendage to HTML and XHTML, albeit an important one.

Also, page layout is secondary to structure. We
humans visually scan text and decide its relationships and structure
based on how it looks; machines can only read encoded markings.
Because documents have encoded tags that relate meaning, they lend
themselves very well to computer-automated searches and also to the
recompilation of content, features very important to
researchers. It's not so much
how something is said as
what is being said.

Accordingly, neither HTML nor XHTML is a page-layout language. In
fact, given the diversity of user-customizable browsers, as well as
the diversity of computer platforms for retrieval and display of
electronic documents, all these markup languages strive to accomplish
is to advise, not dictate, how the document
might look when rendered by the browser. You cannot force the browser
to display your document in any certain way. You'll
hurt your brain if you insist otherwise.

2.6.1 Appearance of Text

For instance, you cannot predict what
font and what absolute size (8- or 40-point Helvetica, Geneva,
Subway, or whatever) will be used for a particular
user's text display. Okay, so the latest browsers
now support standard Cascading Style Sheets and other desktop
publishing-like features that let you control the layout and
appearance of your documents. But users may change their
browser's display characteristics and override your
carefully laid plans at will; quite a few of the older browsers out
there don't support these new layout features; and
some browsers are text-only, with no nice fonts at all. What to do?
Concentrate on content. Cool pages are a flash in the pan. Deep
content will bring people back for more and more.

Nonetheless, style does matter for
readability, and it is good to include it where you can, as long as
it doesn't interfere with content presentation. You
can attach common style attributes to your text with
physical style tags, like the italic
<i> tag in our simple example. More
importantly and truer to the language's original
purpose, HTML and XHTML have content-based style
tags that attach meaning to various text
passages. And you can alter text display characteristics, such as
font style, size, color, and so on, with Cascading Style Sheets
(CSS).

Today's graphical browsers recognize the physical
and content-related text style tags and change the appearance of
their related text passages to visually convey meaning or structure.
You can't predict exactly what that change will look
like.

The HTML 4 standard (and even more so, the XHTML 1.0 standard)
stresses that future browsers will not be so visually bound. Text
contents may be heard or even felt, for example, rather than read by
viewers. Context clues surely are better in those cases than physical
styles.

2.6.1.1 Content-based text styles

Content-based
style tags indicate to the browser that a portion of your HTML/XHTML
text has a specific usage or meaning. The
<cite> tag in our simple
example, for instance, means the enclosed text is some sort of
citation (the document's author, in this case).
Browsers commonly, although not universally, display the citation
text in italic, not as regular text. [Content-Based Style Tags]

While it may or may not be obvious to the current reader that the
text is a citation, someday someone might create a computer program
that searches a vast collection of documents for embedded
<cite> tags and compiles a special list of
citations from the enclosed text. Similar software agents already
scour the Internet for embedded information to compile listings, such
as the infamous Google database of web sites.

The most common content-based style used today is that of emphasis,
indicated with the <em> tag. And if
you're feeling really emphatic, you might use the
<strong> content style. Other content-based
styles include <code>, for
snippets of programming code; <kbd>, to
denote text entered by the user via a keyboard;
<samp>, to mark sample text;
<dfn>, for definitions; and
<var>, to delimit variable names within
programming code samples. All of these tags have corresponding end
tags.
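
As a rough illustration of the content-based tags listed above, the following minimal document marks up a few phrases by meaning rather than by appearance. The sentences, the title, and the variable name total are invented for this sketch, and how each tag is rendered depends on the browser.

```html
<html>
<head>
<title>Content-based styles (illustrative)</title>
</head>
<body>
<!-- Each tag names what the text is, not how it should look -->
<p>
This step is <em>important</em>; this one is <strong>critical</strong>.
</p>
<p>
A <dfn>character entity</dfn> is a named or numbered reference to a
character.
</p>
<p>
Type <kbd>help</kbd> at the prompt, and the program answers
<samp>Available commands: list, quit</samp>.
</p>
<p>
The statement <code>total = total + 1</code> increments the variable
<var>total</var>.
</p>
<p>
See the <cite>HTML 4.01 Specification</cite> for the details.
</p>
</body>
</html>
```

Load it in a couple of different browsers if you can; the differences in rendering make the point that these tags carry meaning, not a fixed look.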

2.6.1.2 Physical styles

Even the
barest of barebones text processors supports a few traditional text
styles, such as italic and bold characters. While not word-processing
tools in the traditional sense, HTML and XHTML provide tags that
explicitly tell the browser to display (if it can) a character, word,
or phrase in a particular physical style.

Although you should use related content-based tags, for the reasons
we argued earlier, sometimes form is more important than function.
Use the <i> tag to italicize text
without imposing any specific meaning, the
<b> tag to display text in boldface, or the
<tt> tag so that the browser, if it can,
displays the text in a teletype-style monospaced typeface. [Section 4.5]
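
A minimal sketch of the three physical style tags just described might look like the following. The sample sentences and the filename are made up for illustration; the tags ask only for a look and say nothing about what the text means.

```html
<html>
<head>
<title>Physical styles (illustrative)</title>
</head>
<body>
<p>
<!-- Italic, bold, and monospaced text, with no meaning attached -->
This word is <i>italic</i>, this one is <b>bold</b>, and
this filename is in a teletype face: <tt>index.html</tt>.
</p>
</body>
</html>
```

A text-only browser may ignore some or all of these requests, which is one more reason to prefer a content-based tag when one fits.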

It's easy to fall into the trap of using physical
styles when you should really be using a content-based style instead.
Discipline yourself now to use the content-based styles, because, as
we argued earlier, they convey meaning as well as style, thereby
making your documents easier to automate and manage.

2.6.1.3 Special text characters

Not all text characters available
to you for display by a browser can be typed from the keyboard. And
some characters have special meanings, such as the brackets around
tags, which, if not somehow differentiated when used for plain
text (the less-than sign, <, in a math
equation, for example), will confuse the browser and trash your
document. HTML and XHTML give you a way to include any of the many
different characters in the document character set, not just those
in ASCII, anywhere in your text through a special encoding, its
character
entity.

Like the copyright symbol in our simple example, a character entity
starts with an ampersand
(&), followed by its name, and is terminated with
a semicolon
(;). Alternatively, you may use the
character's position number in the character set (which matches the
ASCII table for the first 128 characters), preceded by the pound or
sharp sign
(#), in lieu of its
name in the character-entity sequence. When rendering the document,
the browser displays the proper character, if it exists in the
user's font. [Section 3.5.2]

For obvious reasons, the most commonly used character entities are
those for the greater-than (&gt;), less-than
(&lt;), and ampersand
(&amp;) characters. Check Appendix F to find out what symbol the character entity
&brvbar; represents.
You'll be pleasantly surprised!
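
To make the entity syntax concrete, here is a small sketch that uses both the named and the numbered forms. The sentences and the year are invented for illustration; the entity names and the number 169 for the copyright symbol are standard.

```html
<html>
<head>
<title>Character entities (illustrative)</title>
</head>
<body>
<!-- &lt; and &gt; keep the browser from mistaking text for a tag -->
<p>If a &lt; b and b &lt; c, then c &gt; a.</p>
<!-- &amp; lets you show an entity's own source form -->
<p>Type &amp;lt; in your source to display a less-than sign.</p>
<!-- The same character, first by name and then by number -->
<p>Copyright &copy; 2002 and Copyright &#169; 2002</p>
</body>
</html>
```

Whether the last line shows the symbol at all still depends on the user's font, as noted above.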

2.6.2 Text Structures

It's not obvious in our
simple example, but the common carriage returns we use to separate
paragraphs in our source document have no meaning in HTML or XHTML,
except in special circumstances. You could have typed the document
onto a single line in your text editor, and it would still appear the
same in Figure 2-1.[3]

[3] We use a
computer programming-like style of indentation so that our source
HTML/XHTML documents are more readable. It's not
obligatory, nor are there any formal style guidelines for source
HTML/XHTML document text formats. We do, however, highly recommend
that you adopt a consistent style, so that you and others can easily
follow your source documents.

You'd soon discover, too, if you
hadn't read it here first, that except in special
cases, browsers typically ignore leading and trailing spaces, and
collapse runs of several spaces in between into one. (If you look
closely at the source example, the line "Greetings
from" looks like it should be indented by leading
spaces, but it isn't in Figure 2-1.)
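
You can see this for yourself with a sketch like the one below. The wording after "Greetings from" is invented for this illustration; the two paragraphs differ only in the spaces and carriage returns of the source, and a typical browser should display them the same way.

```html
<html>
<head>
<title>White space in the source (illustrative)</title>
</head>
<body>
<!-- Typed on one line -->
<p>Greetings from a sample document.</p>
<!-- Same words, with extra spaces and carriage returns -->
<p>
      Greetings
  from      a
      sample         document.
</p>
</body>
</html>
```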

2.6.2.1 Divisions, paragraphs, and line breaks

A
browser takes the text in the body of your document and
"flows" it onto the computer
screen, disregarding any common carriage-return or line-feed
characters in the source. The browser fills as much of each line of
the display window as possible, beginning flush against the left
margin, before stopping after the rightmost word and moving on to the
next line. Resize the browser window, and the text reflows to fill
the new space, indicating HTML's inherent
flexibility.

Of course, readers would
rebel if your text just ran on and on, so HTML and XHTML provide both
explicit and implicit ways to control the basic structure of your
document. The most rudimentary and common ways are with the division
(<div>), paragraph
(<p>), and line-break
(<br>) tags. All break the text flow, which
consequently restarts on a new line. The differences are that the
<div> and <p> tags
define an elemental region of the document and text, respectively,
the contents of which you may specially align within the browser
window, apply text styles to, and alter with other block-related
features.

Without special alignment attributes, the
<div> and <br> tags
simply break a line of text and place subsequent characters on the
next line. The <p> tag adds more vertical
space after the line break than either the
<div> or <br> tags.
[Section 4.1.1] [Section 4.1.2] [Section 4.6.1]
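
The following sketch puts the three tags side by side. The text is invented, and the align attribute values are just one possible choice; browsers that don't support alignment simply break the lines and leave the text flush left.

```html
<html>
<head>
<title>Divisions, paragraphs, and line breaks (illustrative)</title>
</head>
<body>
<!-- A division: a region of the document you can align as a whole -->
<div align="center">
This division is centered in the window.
</div>
<!-- A paragraph: typically followed by extra vertical space -->
<p>
First paragraph.
</p>
<p align="right">
Second paragraph, aligned right.<br>
This sentence starts on a new line but stays in the same paragraph.
</p>
</body>
</html>
```

In an XHTML document, the line break would be written as <br /> instead.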

By the way, the HTML standard includes end tags for the paragraph and
division tags, but not for the line-break tag.[4] Few authors ever include
the paragraph end tag in their documents; the browser usually can
figure out where one paragraph ends and another begins.[5] Give yourself a star if you knew that
</p> even exists.

[4] With
XHTML, <br>'s start and end
are between the same brackets: <br />.
Browsers tend to be very forgiving and
often ignore extraneous things, such as the forward slash in this
case, so it's perfectly okay to get into the habit
of adding that end-mark.

[5] The paragraph end tag is being used more commonly now that the
popular browsers support the paragraph-alignment attribute.

2.6.2.2 Headings

Besides
breaking your text into divisions and paragraphs, you can also
organize your documents into sections with headings. Just as they do
on this and other pages in this printed book, headings not only
divide and entitle discrete passages of text, they also convey
meaning visually. And headings readily lend themselves to
machine-automated processing of your documents.

There are six heading tags, <h1> through
<h6>, with corresponding end tags.
Typically, the browser displays their contents in, respectively, very
large to very small font sizes, and usually in boldface. The text
inside the <h4> tag typically is the same
size as the regular text. [Section 4.2.1]

The heading tags also break the current text flow, standing alone on
lines and separated from surrounding text, even though there
aren't any explicit paragraph or line-break tags
before or after a heading.
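
As a brief sketch, assuming nothing more than invented titles and filler text, the following document uses three heading levels. Note that the text right after the first heading has no paragraph tag of its own, yet it still starts on a new line.

```html
<html>
<head>
<title>Headings (illustrative)</title>
</head>
<body>
<h1>Chapter title</h1>
This text follows the heading directly but appears on its own line.
<h2>Section title</h2>
<p>Section text.</p>
<h3>Subsection title</h3>
<p>Subsection text.</p>
</body>
</html>
```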

2.6.2.3 Horizontal rules

Besides
headings, HTML and XHTML provide horizontal rule lines that help
delineate and separate the sections of your document.

When the browser encounters an <hr> tag in
your document, it breaks the flow of text and draws a line across the
display window on a new line. The flow of text resumes immediately
below the rule.[6] [Section 5.1.1]

[6] Similar to
<br>, with XHTML the formal horizontal rule
tag is <hr />.
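
A minimal sketch of a rule separating two sections, with invented text, could be as simple as this:

```html
<html>
<head>
<title>Horizontal rule (illustrative)</title>
</head>
<body>
<p>End of the first section.</p>
<!-- The rule breaks the text flow and sits on its own line -->
<hr>
<p>Start of the second section.</p>
</body>
</html>
```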

2.6.2.4 Preformatted text

Occasionally, you'll want
the browser to display a block of text as-is: for example, with
indented lines and vertically aligned letters or numbers that
don't change even though the browser window might
get resized. The <pre> tag rises to those
occasions. All text up to the closing </pre>
end tag appears in the browser window exactly as you type it,
including carriage returns, line feeds, and leading, trailing, and
intervening spaces. Although very useful for tables and forms,
<pre> text looks pretty dull; the popular
browsers render the block in a monospace typeface. [Section 4.6.5]
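
Here is a small sketch of the kind of column-aligned text that <pre> is good for. The items and numbers are made up for illustration; the point is that the spacing you type in the source is the spacing the reader sees.

```html
<html>
<head>
<title>Preformatted text (illustrative)</title>
</head>
<body>
<!-- Spaces and line breaks inside pre are kept exactly as typed -->
<pre>
Item        Qty    Price
Widget        2     4.50
Gadget       10    12.00
</pre>
</body>
</html>
```

Remove the <pre> tags and reload the page, and the three lines should flow together into one, with the columns collapsed.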
