Python Cookbook 2nd Edition Jun 1002005 [Electronic resources]

David Ascher, Alex Martelli, Anna Ravenscroft


Recipe 12.1. Checking XML Well-Formedness


Credit: Paul Prescod, Farhad Fouladi


Problem


You need to check
whether an XML document is well formed (not
whether it conforms to a given DTD or schema), and you need to do
this check quickly.


Solution


SAX (presumably using a fast parser such as Expat underneath) offers
a fast, simple way to perform this task. Here is a script that checks
the well-formedness of every file named on the script's command line
(wildcard patterns are expanded with glob):

import sys
from glob import glob
from xml.sax import make_parser
from xml.sax.handler import ContentHandler

def parsefile(filename):
    # The base ContentHandler ignores every event, so parsing does no
    # work beyond the syntax check; any error raises an exception.
    parser = make_parser()
    parser.setContentHandler(ContentHandler())
    parser.parse(filename)

for arg in sys.argv[1:]:
    # Expand wildcards ourselves, for shells that don't do it
    for filename in glob(arg):
        try:
            parsefile(filename)
            print("%s is well-formed" % filename)
        except Exception as e:
            print("%s is NOT well-formed! %s" % (filename, e))


Discussion


A text is a well-formed XML document if it adheres to all the basic
syntax rules for XML documents. In other words, any XML declaration it
has is correct, it has a single root element, all tags are properly
nested, tag attributes are quoted, and so on.
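As an illustration of those rules, here is a minimal sketch that runs a few tiny, made-up documents through xml.sax.parseString with a do-nothing handler. The sample names and the well_formed helper are hypothetical. The last sample declares its root EMPTY in an internal DTD yet contains a child; a non-validating parser such as Expat should still accept it, since that is a validity problem, not a well-formedness one.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical sample documents, each paired with the verdict we expect
SAMPLES = {
    "nested": (b"<a><b/></a>", True),
    "misnested": (b"<a><b></a></b>", False),    # tags cross each other
    "unquoted_attr": (b"<a x=1/>", False),      # attribute value not quoted
    "two_roots": (b"<a/><b/>", False),          # more than one root element
    # Breaks its own DTD (<a> is declared EMPTY) but is still well-formed:
    "invalid_but_wf": (b"<!DOCTYPE a [<!ELEMENT a EMPTY>]><a><b/></a>", True),
}

def well_formed(data):
    """Return True if a parse with a do-nothing handler succeeds."""
    try:
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True

for name, (data, expected) in SAMPLES.items():
    print(name, well_formed(data), "expected:", expected)
```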

This recipe uses the SAX API with a dummy
ContentHandler that does nothing. Generally, when
we parse an XML document with SAX, we use a
ContentHandler instance to process the
document's contents. But in this case, we only want
to know whether the document meets the most fundamental syntax
constraints of XML; therefore, we need not do any processing, and the
do-nothing handler suffices.
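To make the contrast concrete, here is a small sketch of a handler that does process contents, next to the do-nothing base ContentHandler. ElementCounter is a hypothetical class written for this example; both parses go through xml.sax.parseString, which accepts bytes.

```python
import xml.sax
from xml.sax.handler import ContentHandler

class ElementCounter(ContentHandler):
    """A handler that does real work: it counts start tags by name."""
    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

doc = b"<root><item/><item/><note>hi</note></root>"

# Processing: the parser calls our startElement for every start tag.
counter = ElementCounter()
xml.sax.parseString(doc, counter)
print(counter.counts)  # {'root': 1, 'item': 2, 'note': 1}

# Checking only: the base class callbacks do nothing, so this call
# either returns quietly or raises SAXParseException.
xml.sax.parseString(doc, ContentHandler())
```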

The parsefile function parses the document and raises an exception
at the first error it finds. The
recipe's main code catches any such exception and
prints it out like this:

$ python wellformed.py test.xml
test.xml is NOT well-formed! test.xml:1002:2: mismatched tag

This means that the parser found a mismatched tag at line 1,002, column 2.
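If you want the position as data rather than as part of a message, a SAXParseException exposes it through getLineNumber, getColumnNumber, and getMessage. The sketch below uses a hypothetical check helper on an in-memory document. With the Expat-backed reader, as far as we know, the column is an offset from the start of the line counting from 0, so treat it accordingly.

```python
import xml.sax
from xml.sax.handler import ContentHandler

def check(data):
    """Return None if data is well-formed, else (line, column, message)."""
    try:
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException as e:
        # Lines count from 1; Expat columns are offsets within the line
        return (e.getLineNumber(), e.getColumnNumber(), e.getMessage())
    return None

print(check(b"<a/>"))               # None: nothing to report
print(check(b"<a>\n  <b>\n</a>"))   # </a> on line 3 closes the wrong tag
```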

This recipe
does not check adherence to a DTD or schema, which is a separate
procedure called validation. The performance
of the script should be quite good, precisely because it focuses on
performing a minimal irreducible core task. However, sometimes you
need to squeeze out the last drop of performance because
you're checking the well-formedness of truly huge
files. If you know for sure that Expat specifically is installed on
your system, you can use Expat directly instead of going through SAX.
To try this approach, change the parsefile function to the following
code:

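import xml.parsers.expat

def parsefile(filename):
    # With no handlers set, Expat only checks syntax as it parses.
    parser = xml.parsers.expat.ParserCreate()
    # ParseFile needs a binary file; Expat does its own decoding.
    with open(filename, "rb") as f:
        parser.ParseFile(f)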
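When using Expat directly, errors arrive as xml.parsers.expat.ExpatError, whose code, lineno, and offset attributes locate the problem. This sketch, with a hypothetical check_expat helper, checks an in-memory bytes document with Parse instead of reading a file. A fresh parser is created per call because an Expat parser object is meant to handle a single document.

```python
import xml.parsers.expat

def check_expat(data):
    """Return None if data is well-formed, else (line, column, message)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # isfinal=True tells Expat this is the whole document
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as e:
        return (e.lineno, e.offset, xml.parsers.expat.ErrorString(e.code))
    return None

print(check_expat(b"<a><b/></a>"))   # None
print(check_expat(b"<a><b></a>"))    # reports a mismatched tag
```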

Don't expect all that much of an improvement in
performance when using Expat directly instead of SAX. However, you
might gain a little bit.
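To see how much you actually gain on your own machine, a rough timing sketch like the following compares the two approaches. The synthetic DOC and the function names are assumptions made for this example; real files and Python versions will behave differently, so no numbers are claimed here. One plausible source of the small difference is that the SAX reader still calls a Python-level handler method for every event, even when those methods do nothing, while bare Expat with no handlers set does not.

```python
import io
import timeit
import xml.parsers.expat
import xml.sax
from xml.sax.handler import ContentHandler

# Synthetic test document (an assumption; tune the size to your needs)
DOC = b"<root>" + b"<item a='1'>some text</item>" * 50_000 + b"</root>"

def via_sax():
    xml.sax.parseString(DOC, ContentHandler())

def via_expat():
    parser = xml.parsers.expat.ParserCreate()
    parser.ParseFile(io.BytesIO(DOC))

for func in (via_sax, via_expat):
    # Best of three single runs, to reduce noise from other processes
    seconds = min(timeit.repeat(func, number=1, repeat=3))
    print("%-10s %.3f s" % (func.__name__, seconds))
```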


See Also


Recipe 12.2 and Recipe 12.3, for other uses of
SAX; the PyXML package (http://pyxml.sourceforge.net/) includes the
pure-Python validating parser xmlproc, which
checks the conformance of XML documents to specific DTDs; the PyRXP
package from ReportLab is a wrapper around the fast validating parser
RXP (http://www.reportlab.com/xml/pyrxpl),
which is available under the GPL license.

