Word Hacks [Electronic resources] (text version)


Andrew Savikas









Hack 96 Standardize Documents with XSLT




Before you print or distribute a document,
you'll often want to put it into a consistent format
without any extraneous items, such as comments left over from
editing. This hack shows you how to use XSLT to scrub a document
clean.


The previous examples
showed you how to generate Word
documents [Hack #94]
and extract information from Word documents [Hack #95] .
This hack shows you how to use XSLT to modify
Word documents. Strictly speaking, XSLT never modifies anything;
it only creates new documents. But if a new document varies only
slightly from the original, and if you overwrite the original
with the new one, then for all practical purposes
you've effectively modified the document, right?
That is the approach taken here with XSLT.


The XSLT stylesheet in this hack strips out a number of different
pieces of information: all standard document properties except Author
and Title, custom document properties, comments, spelling and grammar
error markers, deletions, formatting changes, and insertion marks (the
inserted text itself is kept). It even resets the document's view and
zoom percentage (to Normal at 100%).



10.8.1 The Code




Enter the following code in a standard text editor such as Notepad
and save it as cleanup.xsl:


<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:aml="http://schemas.microsoft.com/aml/2001/core">

  <!-- By default, recursively copy everything through -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Normalize document's view and zoom percentage (Normal at 100%) -->
  <xsl:template match="w:docPr">
    <xsl:copy>
      <w:view w:val="normal"/>
      <w:zoom w:percent="100"/>
      <!-- Copy the remaining settings, skipping any existing view/zoom -->
      <xsl:apply-templates select="*[not(self::w:view or self::w:zoom)]"/>
    </xsl:copy>
  </xsl:template>

  <!-- Remove all but the Author and Title document properties -->
  <xsl:template match="o:DocumentProperties">
    <xsl:copy>
      <xsl:copy-of select="o:Author|o:Title"/>
    </xsl:copy>
  </xsl:template>

  <!-- Remove all custom document properties -->
  <xsl:template match="o:CustomDocumentProperties"/>

  <!-- Remove all comments and comment references -->
  <xsl:template match="aml:annotation[starts-with(@w:type, 'Word.Comment')]"/>

  <!-- Remove all spelling and grammar error markers -->
  <xsl:template match="w:proofErr"/>

  <!-- Remove all deletions -->
  <xsl:template match="aml:annotation[@w:type='Word.Deletion']"/>

  <!-- Remove all formatting changes -->
  <xsl:template match="aml:annotation[@w:type='Word.Formatting']"/>

  <!-- Remove all insertion marks -->
  <xsl:template match="aml:annotation[@w:type='Word.Insertion']">
    <!-- Process content, but do not copy the wrapper -->
    <xsl:apply-templates select="aml:content/*"/>
  </xsl:template>

</xsl:stylesheet>
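
To see what the annotation rules match, it can help to look at a small input. The fragment below is a hand-written, heavily simplified sketch and not real Word output: the author name, ids, and text are invented, and a file saved by Word carries many more elements and attributes. It assumes the usual WordprocessingML 2003 shape, in which tracked changes wrap their runs in aml:annotation and aml:content.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical, hand-simplified input for illustration only -->
<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:aml="http://schemas.microsoft.com/aml/2001/core">
  <w:body>
    <w:p>
      <w:r><w:t>Kept text. </w:t></w:r>

      <!-- Tracked insertion: the wrapper is dropped, the run inside survives -->
      <aml:annotation aml:id="0" w:type="Word.Insertion" aml:author="Example Author">
        <aml:content>
          <w:r><w:t>Inserted text. </w:t></w:r>
        </aml:content>
      </aml:annotation>

      <!-- Tracked deletion: the wrapper and the deleted run are both dropped -->
      <aml:annotation aml:id="1" w:type="Word.Deletion" aml:author="Example Author">
        <aml:content>
          <w:r><w:delText>Deleted text. </w:delText></w:r>
        </aml:content>
      </aml:annotation>

      <!-- Proofing markers: dropped, while the misspelled word itself stays -->
      <w:proofErr w:type="spellStart"/>
      <w:r><w:t>mispeled</w:t></w:r>
      <w:proofErr w:type="spellEnd"/>
    </w:p>
  </w:body>
</w:wordDocument>
```

The insertion rule is the only annotation rule with a body, which is why its content is processed and kept while the deletion's content disappears along with its wrapper.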


The stylesheet uses a process known as identity
transformation. The very first template rule in the
stylesheet is the most important one:


<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>


It may seem cryptic, but it is powerful. An identity transformation
recursively copies all nodes through to the output, unchanged. At
least, that is the default behavior. If you
didn't include any other template rules in the
stylesheet, the resulting document would be identical to the source
document. However, because they have higher
priority (a technical term in XSLT), the other
template rules override the default copying behavior for certain
nodes in the source document. If such a template rule is empty, the
node that triggers that template rule effectively gets stripped out
from the result. (Technically, it is merely excluded from being
copied to the result, but since everything else gets copied through,
it has the appearance of being stripped.) For example, the following
template rule matches an
o:CustomDocumentProperties element:


  <!-- Remove all custom document properties -->
<xsl:template match="o:CustomDocumentProperties"/>


Rather than copying the element to the result, this template rule
does nothing, thereby effectively stripping the
element from the document (if it was there in the first place).
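
The priority mechanism can be seen in isolation with a cut-down stylesheet. The sketch below is not part of the original hack; it keeps only the identity rule and one override. In XSLT 1.0 the patterns @* and node() have a default priority of -0.5, while a plain element name such as w:proofErr has a default priority of 0, so the empty rule wins whenever both match the same node.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">

  <!-- Identity rule: @* and node() each have default priority -0.5 -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A name pattern has default priority 0, so it overrides the
       identity rule for w:proofErr; the empty body emits nothing -->
  <xsl:template match="w:proofErr"/>

</xsl:stylesheet>
```

Run against a WordprocessingML file, this version should copy everything through except the w:proofErr markers, which makes it a convenient starting point for adding the other rules one at a time.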



10.8.2 Running the Hack




To run this hack, create a simple Word document that contains some
comments, tracked changes, and spelling or grammatical errors, and
switch it to Web Layout view (see Figure 10-11). Save the file in
Word's XML format as dirty.xml in the same folder as the
cleanup.xsl file. Then type the following at a
DOS command prompt in the same folder:


> msxsl dirty.xml cleanup.xsl -o clean.xml




Figure 10-11. Document with lots of editing cruft (dirty.xml)


After you apply the stylesheet, you'll easily be
able to see the changes in the new file,
clean.xml (shown in Figure 10-12).




Figure 10-12. The same document with all the cruft removed (clean.xml)


All of the tracked changes and comments have been removed, and the
document view has been set to Normal view at 100% zoom. The file
still contains a misspelled word, but it is no longer annotated as
such. Likewise, the squiggly line for the grammar error has been
stripped out.



10.8.3 Hacking the Hack




Using XSLT to modify Word documents is not just a hare-brained idea
we thought up. If you have Office 2003 Professional or the standalone
version of Word 2003, you can invoke this cleanup process right from
within Word when you save your document.


Open dirty.xml in Word, select
File→Save As, and choose XML Document from the
"Save as type" drop-down menu.
Next, check the "Apply transform"
box, click the Transform button, and then select
cleanup.xsl (see Figure 10-13).
Word applies the XSLT transformation, which always creates a new
document, and then immediately overwrites the original file
(dirty.xml, in this case) with the new document.




Figure 10-13. The "Apply transform" option for invoking an XSLT stylesheet on save




If you run a different edition of Word 2003, such as the version
included with Office 2003 Basic, you won't see the
extra checkbox options in the "Save
As" dialog, as shown earlier.



Evan Lenz


