Hack 90 Get a Command-Line XML Processor


Here's a rundown of the tools you'll need to work with the Word XML shown throughout this chapter.
When running these hacks, you'll need a command-line XSLT processor, that is, an XSLT processor that runs from a DOS command prompt.
You can read about and download Microsoft's own command-line XSLT processor, msxsl.exe, at this URL:
http://www.microsoft.com/downloads/details.aspx?FamilyId=2FB55371-C94E-4373-B0E9-DB4816552E41&displaylang=en
The libxml project (hosted at http://www.xmlsoft.org) houses
some quite useful command-line utilities for XML processing. Native
Windows binaries for each of the libxml tools are available at
http://www.zlatkovic.com/libxml.enl. One
particularly convenient tool in the libxml suite is the
xmllint command. Its --format
option, which reads an XML document and outputs a pretty-printed version of
it (adding line breaks and indentation), is an excellent tool for
learning WordprocessingML and for helping to author stylesheets that create
Word documents.
Figure 10-1 shows how a WordprocessingML
document looks when opened in Notepad after just saving it from Word.
The entire document is jammed onto four extremely long lines of text,
making it a tad difficult to inspect.
Figure 10-1. Word's "raw" XML output

Figure 10-2 shows a portion of the same document,
after using the command xmllint --format.
The indenting and line breaks make for a
much more readable XML file.
Figure 10-2. An easier-to-read version, created with xmllint
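
If you'd rather script the reformatting step than type it each time, something like the following rough Python sketch could do it. It assumes xmllint is on your PATH, and the file names (raw.xml, formatted.xml) and function names are hypothetical, chosen only for illustration.

```python
import subprocess


def xmllint_format_command(source, output):
    """Build the argument list that reformats SOURCE into OUTPUT."""
    # --format reindents the document; --output names the file to write.
    return ["xmllint", "--format", "--output", output, source]


def pretty_print(source, output):
    """Run xmllint on SOURCE and write the indented result to OUTPUT.

    Raises FileNotFoundError if xmllint is not installed, and
    CalledProcessError if xmllint exits with an error (for example,
    when the input is not well-formed XML).
    """
    subprocess.run(xmllint_format_command(source, output), check=True)
```

Calling pretty_print("raw.xml", "formatted.xml") should be equivalent to typing xmllint --format --output formatted.xml raw.xml at the command prompt.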

The libxml project also contains its own XSLT processor, with a
command-line tool called xsltproc. Other freely
available XSLT processors you may want to try out include Saxon
(http://saxon.sourceforge.net)
and Xalan (http://xml.apache.org/xalan-j/), both of
which are Java-based processors.
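
As a rough illustration of what running a transformation looks like, here is a small Python sketch that builds the command line for either msxsl or xsltproc. One detail worth noticing is that the two tools take their arguments in opposite order: msxsl expects the source document first, while xsltproc expects the stylesheet first. The function names and file names are hypothetical, and the sketch assumes the chosen processor is on your PATH.

```python
import subprocess


def transform_command(processor, source, stylesheet, output):
    """Return the argument list that runs one XSLT transformation."""
    if processor == "msxsl":
        # msxsl: source document, then stylesheet; -o names the output file.
        return ["msxsl", source, stylesheet, "-o", output]
    if processor == "xsltproc":
        # xsltproc: options first, then stylesheet, then source document.
        return ["xsltproc", "-o", output, stylesheet, source]
    raise ValueError("unsupported processor: " + processor)


def transform(processor, source, stylesheet, output):
    """Run the transformation; raises CalledProcessError if the tool fails."""
    subprocess.run(
        transform_command(processor, source, stylesheet, output),
        check=True,
    )
```

For example, transform("xsltproc", "doc.xml", "style.xsl", "out.xml") should have the same effect as typing xsltproc -o out.xml style.xsl doc.xml at the prompt.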