Hack 95 Batch-Process Word Documents with XSLT

This hack shows you how to use XSLT to compile
a report containing information from several different
WordprocessingML documents.
Thanks to WordprocessingML and the powers of XSLT, it is now
straightforward to perform bulk processing on multiple Word
documents. This particular hack is less about modifying the documents
themselves (that's covered in [Hack #96])
than about generating a report that aggregates information from
multiple Word documents. In this case, you want to extract and total
all of the Word comments from a variable number of input documents.
The resulting report format is just another Word document.
Say you have five Word documents in WordprocessingML format in a
folder called C:\Word Documents. The files are
named as follows:
word1.xml
word2.xml
word3.xml
word4.xml
word5.xml
Each file contains multiple comments from multiple reviewers, and
you'd like a list of all the comments from all the
files (see Figure 10-10).
Figure 10-10. Aggregating comments from multiple Word documents via XSLT

To get started, enter the following code in a standard text editor
such as Notepad, save it in the same folder as the files with the
comments, and name it file-list.xml:
<input-files>
<file>word1.xml</file>
<file>word2.xml</file>
<file>word3.xml</file>
<file>word4.xml</file>
<file>word5.xml</file>
</input-files>
To change the names of the files to be processed, just add to,
modify, or delete the file elements.
10.7.1 The Code
To create the report, enter this code in a standard text editor such
as Notepad, save it in the same folder as the other files, and name
it bulk-report.xsl:
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
  xmlns:aml="http://schemas.microsoft.com/aml/2001/core">

  <!-- Load every document named by a file element in file-list.xml -->
  <xsl:variable name="input-docs" select="document(/input-files/file)"/>
  <!-- Collect all Word comments across those documents -->
  <xsl:variable name="all-comments"
    select="$input-docs//aml:annotation[@w:type='Word.Comment']"/>

  <xsl:template match="/">
    <!-- Associates the result with Word so that it opens there -->
    <xsl:processing-instruction name="mso-application">
      <xsl:text>progid="Word.Document"</xsl:text>
    </xsl:processing-instruction>
    <w:wordDocument>
      <xsl:attribute name="xml:space">preserve</xsl:attribute>
      <w:body>
        <!-- Summary lines: number of files and number of comments -->
        <w:p>
          <w:r>
            <w:rPr>
              <w:sz w:val="32"/>
            </w:rPr>
            <w:t>Total # of files processed: </w:t>
            <w:t>
              <xsl:value-of select="count(input-files/file)"/>
            </w:t>
            <w:br/>
            <w:t>Total # of comments: </w:t>
            <w:t>
              <xsl:value-of select="count($all-comments)"/>
            </w:t>
          </w:r>
        </w:p>
        <w:p/>
        <!-- One group per input file: a heading, then that file's comments -->
        <xsl:for-each select="input-files/file">
          <w:p>
            <w:r>
              <w:rPr>
                <w:sz w:val="28"/>
              </w:rPr>
              <w:t>File: <xsl:value-of select="."/></w:t>
            </w:r>
          </w:p>
          <xsl:apply-templates
            select="document(.)//aml:annotation[@w:type='Word.Comment']"/>
        </xsl:for-each>
      </w:body>
    </w:wordDocument>
  </xsl:template>

  <!-- For each comment: the author's name, then the comment's own paragraphs -->
  <xsl:template match="aml:annotation">
    <w:p>
      <w:r>
        <w:t>From <xsl:value-of select="@aml:author"/>:</w:t>
      </w:r>
    </w:p>
    <xsl:copy-of select="aml:content/*"/>
    <w:p/>
  </xsl:template>
</xsl:stylesheet>
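First, you create a few lines that look like headers, containing the
total number of files processed and the total number of comments
found (counted from the top-level $all-comments variable, which
gathers the comments from every input document):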
<w:t>Total # of files processed: </w:t>
<w:t>
  <xsl:value-of select="count(input-files/file)"/>
</w:t>
<w:br/>
<w:t>Total # of comments: </w:t>
<w:t>
  <xsl:value-of select="count($all-comments)"/>
</w:t>
Next, you iterate through each of the file
elements in the source document, outputting a pseudoheading to group
the results by filename:
<xsl:for-each select="input-files/file">
  <w:p>
    <w:r>
      <w:rPr>
        <w:sz w:val="28"/>
      </w:rPr>
      <w:t>File: <xsl:value-of select="."/></w:t>
    </w:r>
  </w:p>
  ...
</xsl:for-each>
With the help of XSLT's document
function, you then grab all the aml:annotation
elements of the type Word.Comment from each input
document:
<xsl:apply-templates select="document(.)//aml:annotation
[@w:type='Word.Comment']"/>
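To see what this expression matches, here is a heavily simplified sketch of how a comment might appear inside one of the input documents. The w:type and aml:author attributes and the aml:content wrapper are the ones the stylesheet relies on. The aml:id value, the author name, the comment text, and the omission of everything else Word writes are assumptions made for illustration.

```xml
<!-- Simplified, hypothetical excerpt from word1.xml; real files contain far more markup -->
<w:wordDocument
  xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
  xmlns:aml="http://schemas.microsoft.com/aml/2001/core">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Text that a reviewer commented on.</w:t>
      </w:r>
      <w:r>
        <!-- Matched by aml:annotation[@w:type='Word.Comment'] -->
        <aml:annotation aml:id="0" aml:author="Reviewer One" w:type="Word.Comment">
          <!-- The children of aml:content hold the comment text -->
          <aml:content>
            <w:p>
              <w:r>
                <w:t>Please check this figure.</w:t>
              </w:r>
            </w:p>
          </aml:content>
        </aml:annotation>
      </w:r>
    </w:p>
  </w:body>
</w:wordDocument>
```

Because the expression begins with //, it finds the annotation no matter how deeply it is nested. For this input, the aml:annotation template in bulk-report.xsl would emit a paragraph reading "From Reviewer One:" followed by a copy of the w:p inside aml:content.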
And for each comment, you display who authored the comment, followed
by the text of the comment itself in a subsequent paragraph:
<xsl:template match="aml:annotation">
  <w:p>
    <w:r>
      <w:t>From <xsl:value-of select="@aml:author"/>:</w:t>
    </w:r>
  </w:p>
  <xsl:copy-of select="aml:content/*"/>
  <w:p/>
</xsl:template>
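If you also want a per-file tally, one possible variation is to store each file's comments in a variable so that you can count them and note when a file has none. The following is only a sketch of a replacement for the xsl:for-each in bulk-report.xsl. The variable name comments and the message wording are illustrative choices, and the fragment relies on the namespace declarations already present in that stylesheet.

```xml
<!-- Sketch only: a possible replacement for the xsl:for-each in bulk-report.xsl.
     The variable name "comments" and the message text are illustrative. -->
<xsl:for-each select="input-files/file">
  <!-- Select this file's comments once and reuse the result -->
  <xsl:variable name="comments"
    select="document(.)//aml:annotation[@w:type='Word.Comment']"/>
  <w:p>
    <w:r>
      <w:rPr>
        <w:sz w:val="28"/>
      </w:rPr>
      <w:t>File: <xsl:value-of select="."/> (<xsl:value-of select="count($comments)"/> comments)</w:t>
    </w:r>
  </w:p>
  <!-- An empty node-set converts to false, so not() is true when there are no comments -->
  <xsl:if test="not($comments)">
    <w:p>
      <w:r>
        <w:t>No comments found.</w:t>
      </w:r>
    </w:p>
  </xsl:if>
  <xsl:apply-templates select="$comments"/>
</xsl:for-each>
```

Selecting the comments once keeps the count and the apply-templates call working from the same node-set, so the number in the heading always agrees with the comments listed beneath it.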
10.7.2 Running the Hack
To run this hack, enter the following at a DOS command prompt within
the same folder as the files:
> msxsl file-list.xml bulk-report.xsl -o comment-report.xml
If you double-click the newly created file,
comment-report.xml, you'll see
a document like the one shown in Figure 10-10.
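If the report comes out empty, a quick way to check that the input files are being found is a second, much smaller stylesheet that only lists each file with its comment count as plain text. This is a minimal sketch rather than part of the original hack, and the filename count-check.xsl is a hypothetical choice.

```xml
<!-- count-check.xsl (hypothetical name): lists each input file with its comment count -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
  xmlns:aml="http://schemas.microsoft.com/aml/2001/core">

  <!-- Plain text output, so the result can be read in any editor -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="input-files/file">
      <!-- The filename as written in file-list.xml -->
      <xsl:value-of select="."/>
      <xsl:text>: </xsl:text>
      <!-- Number of Word comments found in that file -->
      <xsl:value-of
        select="count(document(.)//aml:annotation[@w:type='Word.Comment'])"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Run it the same way as bulk-report.xsl, substituting the stylesheet name and an output filename of your choice. The result should contain one line per file element in file-list.xml.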
Evan Lenz