Python Cookbook, 2nd Edition, Jun 1002005 [Electronic resources] (text version)

David Ascher, Alex Martelli, Anna Ravenscroft








Recipe 12.2. Counting Tags in a Document


Credit: Paul Prescod


Problem


You want to get a sense of how often
particular elements occur in an XML document, and the relevant counts
must be extracted rapidly.


Solution


You can subclass SAX's
ContentHandler to make your own specialized
classes for any kind of task, including the collection of such
statistics:

from xml.sax.handler import ContentHandler
import xml.sax

class countHandler(ContentHandler):
    def __init__(self):
        ContentHandler.__init__(self)
        self.tags = {}    # maps tag name -> number of occurrences
    def startElement(self, name, attr):
        # the parser calls this once at the start of each element
        self.tags[name] = 1 + self.tags.get(name, 0)

parser = xml.sax.make_parser()
handler = countHandler()
parser.setContentHandler(handler)
parser.parse("test.xml")
# emit the counts in alphabetical order of tag name
for tag in sorted(handler.tags):
    print("%s %s" % (tag, handler.tags[tag]))


Discussion


When I start working with a new XML content set, I like to get a
sense of which elements are in it and how often they occur. For this
purpose, I use several small variants of this recipe. I could also
collect attributes just as easily, as you can see, since attributes
are also passed to the startElement method that
I'm overriding. If you add a stack, you can also
keep track of which elements occur within other elements (for this,
of course, you also have to override the
endElement method so you can pop the stack).
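The two variations just mentioned (counting attributes, and using a stack to see which elements occur inside which) might be sketched as follows. This is only an illustration, not part of the original recipe: the class name NestingCountHandler, its attr_counts, pair_counts, and stack attributes, and the SAMPLE document are all hypothetical names chosen for this example.

```python
import xml.sax
from xml.sax.handler import ContentHandler

class NestingCountHandler(ContentHandler):
    """Hypothetical variant: counts attributes and parent/child tag pairs."""
    def __init__(self):
        ContentHandler.__init__(self)
        self.attr_counts = {}   # (tag, attribute name) -> occurrences
        self.pair_counts = {}   # (parent tag, child tag) -> occurrences
        self.stack = []         # tags of the elements that are currently open

    def startElement(self, name, attrs):
        # attributes arrive together with the tag name, so count them here
        for attr_name in attrs.getNames():
            key = (name, attr_name)
            self.attr_counts[key] = 1 + self.attr_counts.get(key, 0)
        # the top of the stack is the enclosing element (None at the root)
        parent = self.stack[-1] if self.stack else None
        pair = (parent, name)
        self.pair_counts[pair] = 1 + self.pair_counts.get(pair, 0)
        self.stack.append(name)

    def endElement(self, name):
        # the element is closed, so it no longer encloses anything
        self.stack.pop()

# Made-up sample document, parsed from memory instead of from a file
SAMPLE = b'<doc><sec id="a"><p/><p class="x"/></sec><sec id="b"><p/></sec></doc>'
handler = NestingCountHandler()
xml.sax.parseString(SAMPLE, handler)
for pair in sorted(handler.pair_counts, key=str):
    print("%s %s" % (pair, handler.pair_counts[pair]))
```

Keeping only the tag names on the stack is enough for parent/child statistics; a variant that needs the full path to each element could use tuple(self.stack) as the dictionary key instead.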

This recipe also works well as a simple example of a SAX application,
usable as the basis for any SAX application. Alternatives to SAX
include pulldom and minidom.
For any simple processing (including this example), these
alternatives would be overkill, particularly if the document you are
processing is very large. DOM approaches are generally justified only
when you need to perform complicated editing and alteration on an XML
document, when the document itself is made complicated by references
that go back and forth inside it, or when you need to correlate
(i.e., compare) multiple documents.
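For comparison, here is a rough sketch of the same tag count written with pulldom from the standard library. It is offered only to show what the alternative looks like, not as a recommendation; the function name count_tags_pulldom and the sample string are assumptions made for this example.

```python
from xml.dom import pulldom

def count_tags_pulldom(xml_text):
    """Count element tags by iterating over pulldom's event stream."""
    counts = {}
    for event, node in pulldom.parseString(xml_text):
        # pulldom yields (event, node) pairs; only element starts matter here
        if event == pulldom.START_ELEMENT:
            counts[node.tagName] = 1 + counts.get(node.tagName, 0)
    return counts

# Made-up sample document
counts = count_tags_pulldom('<doc><p/><p/><note/></doc>')
for tag in sorted(counts):
    print("%s %s" % (tag, counts[tag]))
```

The loop creates a DOM node object for every event even though only the tag name is used, which is the kind of extra work that makes this approach heavier than the SAX handler for a task this simple.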

ContentHandler subclasses offer many other
options, and the online Python documentation does a pretty good job
of explaining them. This recipe's
countHandler class overrides
ContentHandler's
startElement method, which the parser calls at the
start of each element, passing as arguments the
element's tag name as a Unicode string and the
collection of attributes. Our override of this method counts the
number of times each tag name occurs. In the end, we extract the
dictionary used for counting and emit it (in alphabetical order,
which we easily obtain by sorting the keys).
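To try the counting and the sorted output without a test.xml file on disk, the handler can be fed an in-memory document through xml.sax.parseString. The sketch below restates the recipe's countHandler so that it runs on its own; the helper function count_tags and the SAMPLE document are hypothetical additions for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler

class countHandler(ContentHandler):
    """Same handler as in the recipe, repeated so this snippet stands alone."""
    def __init__(self):
        ContentHandler.__init__(self)
        self.tags = {}
    def startElement(self, name, attr):
        self.tags[name] = 1 + self.tags.get(name, 0)

def count_tags(xml_bytes):
    """Hypothetical helper: return (tag, count) pairs sorted by tag name."""
    handler = countHandler()
    xml.sax.parseString(xml_bytes, handler)
    return sorted(handler.tags.items())

# Made-up sample document
SAMPLE = b"<doc><title/><p/><p/><p/></doc>"
for tag, count in count_tags(SAMPLE):
    print("%s %s" % (tag, count))
# prints:
# doc 1
# p 3
# title 1
```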


See Also


Recipe 12.3 for other uses
of SAX.

