Hack 96 Standardize Documents with XSLT

Before you print or distribute a document, you'll often want to put it into a consistent format without any extraneous items, such as comments left over from editing. This hack shows you how to use XSLT to scrub a document clean.

The previous hacks showed you how to generate Word documents [Hack #94] and extract information from Word documents [Hack #95]. This hack shows you how to use XSLT to modify Word documents. Strictly speaking, XSLT never modifies anything; it only creates new documents. But if a new document varies only slightly from the original, and if you overwrite the original with the new one, then for all practical purposes you've modified the document. That is the approach taken here.

The XSLT stylesheet in this hack strips out a number of different pieces of information: Author and Title document properties, custom document properties, comments, spelling- and grammar-error markers, deletions, formatting changes, and insertion marks. It even resets the document's view and zoom percentage (to Normal at 100%).

10.8.1 The Code

Enter the following code in a standard text editor such as Notepad and save it as cleanup.xsl:

<xsl:stylesheet version="1.0"

The stylesheet uses a process known as identity transformation. The very first template rule in the stylesheet is the most important one:

<xsl:template match="@*|node()">

It may seem cryptic, but it is powerful. An identity transformation recursively copies all nodes through to the output, unchanged. At least, that is the default behavior. If you didn't include any other template rules in the stylesheet, the resulting document would be identical to the source document. However, because they have higher priority (a technical term in XSLT), the other template rules override the default copying behavior for certain nodes in the source document.
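As a minimal sketch of the identity-transformation pattern described above (an illustration only, not the complete cleanup.xsl), a stylesheet with the identity rule and a single empty override might look like the following. The namespace URI bound to the o prefix is an assumption about Word 2003's XML format; check it against the root element of your own document before relying on it.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:o="urn:schemas-microsoft-com:office:office">

  <!-- Identity rule: copy every attribute and node, then recurse
       into its attributes and children -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty rule: this more specific match takes precedence over the
       identity rule, so the element and its contents are not copied.
       (The o: namespace URI declared above is an assumption.) -->
  <xsl:template match="o:CustomDocumentProperties"/>

</xsl:stylesheet>
```

In XSLT 1.0, a pattern that names an element, such as o:CustomDocumentProperties, has a default priority of 0, while node() and @* have a default priority of -0.5. That difference is why the empty rule is chosen over the identity rule without an explicit priority attribute.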
If such a template rule is empty, the node that triggers it is effectively stripped from the result. (Technically, it is merely excluded from being copied to the result, but since everything else gets copied through, it appears to have been stripped.) For example, the following template rule matches an o:CustomDocumentProperties element:

<!-- Remove all custom document properties -->

Rather than copying the element to the result, this template rule does nothing, thereby stripping the element from the document (if it was there in the first place).

10.8.2 Running the Hack

To run this hack, create a simple Word document in Web Layout view that contains some comments and spelling or grammatical errors (see Figure 10-11). Save the file as dirty.xml in the same folder as the cleanup.xsl file. Then type the following at a DOS command prompt in the same folder:

> msxsl dirty.xml cleanup.xsl -o clean.xml

Figure 10-11. Document with lots of editing cruft (dirty.xml)

After you apply the stylesheet, you'll easily be able to see the changes in the new file, clean.xml (shown in Figure 10-12).

Figure 10-12. The same document with all the cruft removed (clean.xml)

All of the tracked changes and comments have been removed, and the document view has been set to Normal view at 100% zoom. The file still contains a misspelled word, but it is no longer annotated as such. Likewise, the squiggly line for the grammar error has been stripped out.

10.8.3 Hacking the Hack

Using XSLT to modify Word documents is not just a hare-brained idea we thought up. If you have Office 2003 Professional or the standalone version of Word 2003, you can invoke this cleanup process right from within Word when you save your document. Open dirty.xml in Word, select File > Save As, and choose XML Document from the "Save as type" drop-down menu. Next, check the "Apply transform" box, click the Transform button, and then select cleanup.xsl (see Figure 10-13).
Word applies the XSLT transformation, which always creates a new document, and then immediately overwrites the original file (dirty.xml, in this case) with the new document.

Figure 10-13. The "Apply transform" option for invoking an XSLT stylesheet on save
Evan Lenz