Word Hacks [Electronic resources]

Andrew Savikas

Hack 90 Get a Command-Line XML Processor

Here's a rundown of the tools you'll need to work with the Word XML shown throughout this chapter.

To run these hacks, you'll need a command-line XSLT processor: one that you can invoke from a DOS command prompt.

You can read about and download Microsoft's own command-line XSLT processor, msxsl.exe, at this URL:

http://www.microsoft.com/downloads/details.aspx?FamilyId=2FB55371-C94E-4373-B0E9-DB4816552E41&displaylang=en

After you download msxsl.exe, move it to a folder that is on your system's PATH, such as C:\Windows, so you can run it from a DOS command prompt within any folder on your system.

The libxml project (hosted at http://www.xmlsoft.org) houses some quite useful command-line utilities for XML processing. Native Windows binaries for each of the libxml tools are available at http://www.zlatkovic.com/libxml.en.html. One particularly convenient tool in the libxml suite is the xmllint command. Its --format option, which takes an XML document as input and outputs a pretty-printed version of it (adding line breaks and indentation), is an excellent tool for learning WordprocessingML and for helping to author stylesheets that create Word documents.

Figure 10-1 shows how a WordprocessingML document looks when opened in Notepad after just saving it from Word. The entire document is jammed onto four extremely long lines of text, making it a tad difficult to inspect.

Figure 10-1. Word's "raw" XML output

Figure 10-2 shows a portion of the same document, after using the command xmllint --format. The indenting and line breaks make for a much more readable XML file.

Figure 10-2. An easier-to-read version, created with xmllint
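
To get a feel for the kind of change shown in these two figures, here is a minimal sketch in Python that performs a comparable transformation. It uses the standard library's xml.dom.minidom as a stand-in, not xmllint itself, so the exact output will differ in detail. The sample markup is an invented, heavily trimmed imitation of a WordprocessingML file, and the function name pretty_print is hypothetical.

```python
from xml.dom import minidom

# An invented, heavily trimmed stand-in for Word's single-line XML output.
RAW = (
    '<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">'
    '<w:body><w:p><w:r><w:t>Hello, Word</w:t></w:r></w:p></w:body>'
    '</w:wordDocument>'
)


def pretty_print(xml_text):
    """Return xml_text re-serialized with line breaks and two-space indentation."""
    return minidom.parseString(xml_text).toprettyxml(indent="  ")


if __name__ == "__main__":
    # Each nested element now starts on its own indented line.
    print(pretty_print(RAW))
```

With xmllint installed, the corresponding command is xmllint --format followed by the name of the file; the formatted document is written to standard output, so redirect it to a new file if you want to keep it.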

The libxml project also contains its own XSLT processor, with a command-line tool called xsltproc. Other freely available XSLT processors you may want to try out include Saxon (http://saxon.sourceforge.net) and Xalan (http://xml.apache.org/xalan-j/), both of which are Java-based processors.
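
One thing to watch when switching between processors is that they take their arguments in different orders: msxsl expects the source document first and the stylesheet second, while xsltproc expects the stylesheet first. The sketch below, written in Python purely for illustration, builds the argument list for either tool. The helper names xslt_command and run_xslt and any filenames you pass in are hypothetical, and the sketch assumes the chosen tool is already on your PATH.

```python
import shutil
import subprocess


def xslt_command(tool, source, stylesheet, output):
    """Build the argument list for a command-line XSLT run.

    msxsl expects:    msxsl source stylesheet -o output
    xsltproc expects: xsltproc -o output stylesheet source
    """
    if tool == "msxsl":
        return ["msxsl", source, stylesheet, "-o", output]
    if tool == "xsltproc":
        return ["xsltproc", "-o", output, stylesheet, source]
    raise ValueError("unsupported tool: " + tool)


def run_xslt(tool, source, stylesheet, output):
    """Run the transformation with the named tool.

    Returns False if the tool is not found on PATH, True if it ran
    without error; raises CalledProcessError if the tool reports failure.
    """
    if shutil.which(tool) is None:
        return False
    subprocess.run(xslt_command(tool, source, stylesheet, output), check=True)
    return True
```

Saxon and Xalan are left out of this sketch because they are started through the Java launcher and their options vary by version; check the documentation that ships with the release you download.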