17.2 Cleaning Up After Your HTML Editor
Although you can create and edit
HTML/XHTML documents with a text editor, such as vi or Notepad, most
HTML authors use an application designed for creating web
pages. Several are free of charge, many offer a free evaluation
period, and most are available for download over the Web. Be
forewarned, though; in our experience, you will rarely (if ever) be
able to create a web document from one of these editors without
having to inspect, add to, edit, and sometimes even repair the source
HTML that the editor generates. The following sections discuss a few
things that you should know about and watch out for.
17.2.1 Where Did My Document Go?
One of the first things you will notice
is that many of the HTML editors automatically introduce into your
document markup that you did not explicitly select or write. Remember
this very simple HTML document that we started with in Chapter 2?
<html>
<head>
<title>My first HTML document</title>
</head>
<body>
<h2>My first HTML document</h2>
Hello, <i>World Wide Web!</i>
<!-- No "Hello, World" for us -->
<p>
Greetings from<br>
<a href="http://www.ora.com">O'Reilly &amp; Associates</a>
<p>
Composed with care by:
<cite>(insert your name here)</cite>
<br>©2000 and beyond
</body>
</html>
Here is what the source looks like after you load it into
Microsoft Word 2000:
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 9">
<meta name=Originator content="Microsoft Word 9">
<link rel=File-List href=">
<title>My first HTML document</title>
<!--[if gte mso 9]><xml>
<o:DocumentProperties>
<o:Author>William Kennedy</o:Author>
<o:LastAuthor>William Kennedy</o:LastAuthor>
<o:Revision>2</o:Revision>
<o:TotalTime>7</o:TotalTime>
<o:Created>2002-06-19T18:58:00Z</o:Created>
<o:LastSaved>2002-06-19T18:58:00Z</o:LastSaved>
<o:Pages>1</o:Pages>
<o:Words>26</o:Words>
<o:Characters>152</o:Characters>
<o:Company>ActivMedia Robotics</o:Company>
<o:Lines>1</o:Lines>
<o:Paragraphs>1</o:Paragraphs>
<o:CharactersWithSpaces>186</o:CharactersWithSpaces>
<o:Version>9.3821</o:Version>
</o:DocumentProperties>
</xml><![endif]-->
<style>
<!--
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{mso-style-parent:"";
margin:0in;
margin-bottom:.0001pt;
mso-pagination:widow-orphan;
font-size:12.0pt;
font-family:"Times New Roman";
mso-fareast-font-family:"Times New Roman";}
p
{font-size:12.0pt;
font-family:"Times New Roman";
mso-fareast-font-family:"Times New Roman";}
@page Section1
{size:8.5in 11.0in;
margin:1.0in 1.25in 1.0in 1.25in;
mso-header-margin:.5in;
mso-footer-margin:.5in;
mso-paper-source:0;}
div.Section1
{page:Section1;}
-->
</style>
<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026"/>
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1"/>
</o:shapelayout></xml><![endif]-->
</head>
<body lang=EN-US link=blue vlink=blue style='tab-interval:.5in'>
<div class=Section1>
<h2>My first HTML document</h2>
<p class=MsoNormal>Hello, <i>World Wide Web</i> </p>
<!-- No "Hello, World" for us -->
<p>Greetings from<br>
<a href="http://www.ora.com">O'Reilly &amp; Associates</a> </p>
<p>Composed with care by: <cite>(insert your name here)</cite> <br>
©2002 and beyond </p>
</div>
</body>
</html>
Yeow! Where did the document go? Excessive markup makes the source
document almost impossible for a human to read. What infuriates document
purists like us, beyond the fact that lots of stuff that we neither
wanted nor asked for was added, is that Word 2000 automatically
treats any text document containing HTML markup as fodder for its
mill. You can remove the .html or .htm
suffix from the filename or delete
<html> and <head>
from the document, to no avail: Word will still get you.
Microsoft isn't alone in cluttering the source. Most
HTML editors add at least a <meta> tag that
contains their product information. Many go through and
"fix" your document to comply with
current standards and practices, too; for example, by adding all
those paragraph and list-item end tags that HTML allows you to omit.
(From an XHTML standpoint, we admit that this meddling is probably
valid.)
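To illustrate the kind of "fix" we mean, here is a small hypothetical
fragment (our own, not the output of any particular editor), first as
an author might type it and then as an editor might rewrite it:

```html
<!-- As typed by hand: HTML lets you omit these end tags -->
<ul>
  <li>First item
  <li>Second item
</ul>
<p>A closing paragraph

<!-- As an editor might rewrite it, with every end tag supplied -->
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
<p>A closing paragraph</p>
```

Both versions are acceptable HTML; only the second would also pass
muster as XHTML, which requires every element to be explicitly closed.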
To its credit, Word runs well, unlike other tools that routinely
crashed without warning as we fought with their treatment of the
markup. Microsoft even offers a Word plug-in that removes the
additional markup, so that you can recover a reasonable facsimile of
the original document.[2]
[2] You can find this plug-in at
http://office.microsoft.com/downloads/2000/Msohtmf2.aspx.
17.2.2 When and Why to Edit the Editor
No matter how good the HTML editor is, you'll
inevitably have to edit the (albeit cluttered) source it generates.
We've had to do it a lot ourselves, and so have all
the web developers we've talked with over the last
few years.
Not all HTML editors provide an easy means to add JavaScript to your
documents, and many are not up-to-date with the HTML/XHTML and CSS2
standards. Remember, too, that the popular browsers
don't always agree on how they render a tag, and
even different versions of the same browser may differ. Furthermore,
even the best HTML editors don't necessarily support
extensions to the language.
So into the source you'll have to go, whether to
include some HTML feature not yet supported by the editor (such as a
new CSS2 property), to insert an attribute value or keyword, or to
modify ones that the editor added.
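As a rough sketch of what such hand edits might look like, here is a
trimmed-down version of our simple document after a trip through a
text editor. The particular style rule, script, and attribute value
are our own illustrations, chosen only to show the three kinds of
changes, and are not taken from any specific editor's output:

```html
<html>
<head>
<title>My first HTML document</title>
<!-- Added by hand: a CSS2 property (cursor) that an editor's
     style dialogs may not offer -->
<style type="text/css">
cite {cursor: help}
</style>
</head>
<!-- Modified by hand: suppose the editor wrote vlink=blue -->
<body link="blue" vlink="purple">
<h2>My first HTML document</h2>
<p>
Composed with care by:
<cite>(insert your name here)</cite>
</p>
<!-- Added by hand: a small script, for editors that offer
     no easy way to include JavaScript -->
<script type="text/javascript">
document.write("Last modified: " + document.lastModified);
</script>
</body>
</html>
```

Commenting each hand edit, as we have done here, makes it easier to
find and restore your changes if the editor later rewrites the source.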
The tip is this: compose first. Try to start with a clean, finished
document. Concentrate on content from the outset, and add the special
effects later. Use a good HTML editor from the start, or prepare your
documents in two steps with two different tools (a good content
editor followed by a good HTML editor), particularly if you plan
to distribute the document in a format other than HTML.
17.2.3 Use the Best
If you
compose web pages, we can't imagine you not using an
HTML editor of some sort. The convenience is just too compelling. But
choose carefully: some HTML editors are abysmal, and
you'll spend more time hunting down misplaced tags
and errant attributes than you'll spend actually
creating the document. Top tip: you get what you pay for.
It's no surprise that HTML editors vary greatly in
their features. Many editors let you switch the display from source
text to what may appear when rendered by a browser. Some simply let
you add tags and modify attribute values through pull-down menus and
hot-key options. Others are WYSIWYG layout tools that make it easy to
include graphics and other multimedia content. Other advanced
features include embedding and testing applets and scripts.
In general, HTML editors fall into one of two categories: either they
are good layout tools, including advanced styling features and tools
for dynamic content, or they excel at content creation and
management. Obviously, if you are producing flashy, commercial web
pages that rely on advanced layout techniques and include lots of
different styles and dynamic content, use a good layout tool. If you
are producing a content-rich document, use a tool that provides good
editorial assistance.
No matter which type you use, there are some common considerations to
keep in mind when selecting an HTML editor:
Whether it is up-to-date
No HTML editor is yet entirely up-to-date with the current standards,
particularly CSS2. Read the product specifications and update often.
Whether it includes a source editor
Although you may load an HTML editor-generated document into a
different text editor to change the source, it's
much more convenient if the editor itself lets you view and edit the
HTML source. Also, make sure that your HTML editor
doesn't automatically
"fix" your source edits.
Whether it is modifiable
Ideally, the HTML editor should let you customize its behavior to fit
your specifications. For example, at minimum you should be allowed to
choose your own font colors, styles, and backgrounds, if those are
automatically included in the editor's boilerplate
document.
Cost and reliability
We can't stress enough that you get what you pay
for. If creating web pages is more than just a passing fancy, get the
best editor you can find. Don't use or even trust an
HTML composition tool just because it came with the browser. Find one
that is well supported and well reviewed by other HTML authors. Ask
around, and perhaps join an HTML authors' newsgroup
to get the latest scoop on products.