Word Hacks [Electronic resources]

Andrew Savikas


Hack 97 Remove Direct Formatting with XSLT

Strip out non-style-based formatting from Word documents.

A common "cleanup" task in Word is to remove from a document any formatting that wasn't applied with a style. It's a bit of a chore within Word, but it turns out to be remarkably concise in XSLT.

10.9.1 The Code

Enter the following code in a standard text editor such as Notepad and save it as removeDirectFormatting.xsl:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">

  <!-- By default, recursively copy everything through -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Remove all direct paragraph formatting -->
  <xsl:template match="w:p/w:pPr/*[not(self::w:pStyle)]"/>

  <!-- Remove all direct run formatting -->
  <xsl:template match="w:r/w:rPr/*[not(self::w:rStyle)]"/>

</xsl:stylesheet>

As in [Hack #96], this hack uses an XSLT identity transformation. The first template rule copies every node that doesn't trigger one of the other two template rules. Those two rules are both empty, so any node that matches them is stripped from the document. There are two contexts in which you want to exclude elements: inside the w:pPr and w:rPr elements, particularly where they occur as children of w:p and w:r elements, respectively.

Child elements of w:pPr and w:rPr set various formatting properties. There is one special child of each, however: the w:pStyle and w:rStyle elements are used not to apply direct formatting, but rather to associate the current paragraph or run with a paragraph or character style, respectively. Thus, these template rules are careful to avoid stripping out the w:pStyle and w:rStyle elements.
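
To make the effect concrete, here is a small hand-written fragment. It is a sketch rather than output from a real Word file: the style names Heading1 and Emphasis and the text "Hello" are illustrative assumptions. The paragraph carries a paragraph style plus direct centering (w:jc), and the run carries a character style plus direct bold (w:b):

```xml
<w:p xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:pPr>
    <w:pStyle w:val="Heading1"/>  <!-- style reference: kept -->
    <w:jc w:val="center"/>        <!-- direct formatting: stripped -->
  </w:pPr>
  <w:r>
    <w:rPr>
      <w:rStyle w:val="Emphasis"/>  <!-- style reference: kept -->
      <w:b/>                        <!-- direct formatting: stripped -->
    </w:rPr>
    <w:t>Hello</w:t>
  </w:r>
</w:p>
```

After the transformation, the same fragment should come out roughly as follows. The comments disappear along with the elements they sit beside only if you remove them yourself; they are omitted here for brevity, and the exact whitespace depends on the XSLT processor and its options:

```xml
<w:p xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:pPr>
    <w:pStyle w:val="Heading1"/>
  </w:pPr>
  <w:r>
    <w:rPr>
      <w:rStyle w:val="Emphasis"/>
    </w:rPr>
    <w:t>Hello</w:t>
  </w:r>
</w:p>
```

Only w:jc and w:b match the two empty template rules; everything else falls through to the identity rule and is copied unchanged.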

10.9.2 Running the Hack

To run this hack on a document named formatted.xml, put it in the same folder as the removeDirectFormatting.xsl stylesheet, open a DOS command prompt in that folder, and type the following:

>msxsl formatted.xml removeDirectFormatting.xsl -o no-direct-formatting.xml

Evan Lenz