Python Cookbook 2nd Edition Jun 1002005 [Electronic resources], text version


Recipe 12.5. Converting an XML Document into a Tree of Python Objects

Credit: John Bair, Christoph Dietze

Problem

You want to load an XML document into memory, but you
don't like the complicated access procedures of DOM.
You'd prefer something more
Pythonic: specifically, you'd like to map the
document into a tree of Python objects.

Solution

To build our tree of objects, we can directly wrap the fast
expat parser:

from xml.parsers import expat

class Element(object):
    ''' A parsed XML element '''
    def __init__(self, name, attributes):
        # Record tagname and attributes dictionary
        self.name = name
        self.attributes = attributes
        # Initialize the element's cdata and children to empty
        self.cdata = ''
        self.children = []
    def addChild(self, element):
        self.children.append(element)
    def getAttribute(self, key):
        return self.attributes.get(key)
    def getData(self):
        return self.cdata
    def getElements(self, name=''):
        if name:
            return [c for c in self.children if c.name == name]
        else:
            return list(self.children)

class Xml2Obj(object):
    ''' XML to Object converter '''
    def __init__(self):
        self.root = None
        self.nodeStack = []
    def StartElement(self, name, attributes):
        'Expat start element event handler'
        # Instantiate an Element object (names arrive as text; no encoding step needed)
        element = Element(name, attributes)
        # Make it a child of the current parent, or the root if there is none
        if self.nodeStack:
            parent = self.nodeStack[-1]
            parent.addChild(element)
        else:
            self.root = element
        # Push it so that its own children attach to it
        self.nodeStack.append(element)
    def EndElement(self, name):
        'Expat end element event handler'
        # Pop the finished element off the stack; its parent is current again
        self.nodeStack.pop()
    def CharacterData(self, data):
        'Expat character data event handler'
        # Skip whitespace-only chunks, such as indentation between tags
        if data.strip():
            element = self.nodeStack[-1]
            element.cdata += data
    def Parse(self, filename):
        # Create an Expat parser
        parser = expat.ParserCreate()
        # Set the Expat event handlers to our methods
        parser.StartElementHandler = self.StartElement
        parser.EndElementHandler = self.EndElement
        parser.CharacterDataHandler = self.CharacterData
        # Parse the XML file; binary mode lets expat honor the declared encoding
        with open(filename, 'rb') as f:
            parser.Parse(f.read(), True)
        return self.root

parser = Xml2Obj()
root_element = parser.Parse('sample.xml')
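To see the event stream that Xml2Obj turns into a tree, here is a minimal sketch that installs the same three expat handlers but only records what they receive. The helper name trace_events is hypothetical and not part of the recipe. Nesting shows up purely as the order of start and end events, which is why a simple stack is enough to rebuild the hierarchy.

```python
from xml.parsers import expat

def trace_events(xml_text):
    """Return the list of expat callbacks fired while parsing xml_text (hypothetical helper)."""
    events = []
    parser = expat.ParserCreate()
    # The same three handlers that Xml2Obj installs, reduced to recording calls
    parser.StartElementHandler = lambda name, attrs: events.append(('start', name, attrs))
    parser.EndElementHandler = lambda name: events.append(('end', name))
    parser.CharacterDataHandler = lambda data: events.append(('data', data))
    parser.Parse(xml_text, True)  # True marks this as the final chunk of input
    return events

for event in trace_events('<a x="1"><b>hi</b></a>'):
    print(event)
```

For this small input the handlers fire in document order: start a (with its attributes dictionary), start b, the character data 'hi', end b, end a.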

Discussion

I saw Christoph Dietze's recipe (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/116539)
about turning the structure of an XML document into a simple
combination of dictionaries and lists and thought it was a really
good idea. This recipe is a variation on that idea, with several
differences.

For maximum speed, the recipe uses the low-level
expat parser directly. It would get no real added
value from the richer SAX interface, much less from the slow and
memory-hungry DOM approach. Building the parent-children connections
is not hard even with an event-driven interface, as this recipe shows
by using a simple stack for the purpose.
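The stack technique does not depend on how nodes are represented. As an illustration only (this is not Dietze's code, and xml_to_dicts is a hypothetical name), the sketch below builds a dictionary-and-list tree with closures instead of a class: each start event attaches a new node to whatever is on top of the stack and pushes it, and each end event pops it.

```python
from xml.parsers import expat

def xml_to_dicts(xml_text):
    """Build nested dicts and lists from XML using an explicit stack (hypothetical helper)."""
    root = None
    stack = []

    def start(name, attrs):
        nonlocal root
        node = {'name': name, 'attributes': attrs, 'cdata': '', 'children': []}
        if stack:
            stack[-1]['children'].append(node)  # attach to the current parent
        else:
            root = node  # the first element seen is the document root
        stack.append(node)

    def end(name):
        stack.pop()  # closing tag: the parent becomes current again

    def data(text):
        if text.strip():  # ignore whitespace-only chunks, as the recipe does
            stack[-1]['cdata'] += text

    parser = expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = data
    parser.Parse(xml_text, True)
    return root

tree = xml_to_dicts('<catalog><book id="1">Python</book><book id="2">XML</book></catalog>')
print([child['attributes']['id'] for child in tree['children']])  # ['1', '2']
```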

The main difference with respect to Dietze's
original idea is that this recipe loads the XML document into a tree
of Python objects (rather than a combination of dictionaries and
lists), one per node, with nicely named attributes allowing access to
each node's characteristics: tagname,
attributes (as a Python dictionary), character data (i.e.,
cdata in XML parlance), and child elements
(as a Python list).
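Because every node carries the same four named attributes, generic tree walks read naturally. So that it runs on its own, the sketch below uses a hypothetical Node dataclass with the same attribute names as Element (name, attributes, cdata, children); the outline function touches only those four attributes, so it should also work on a tree produced by Xml2Obj.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical stand-in with the same four attributes as the recipe's Element."""
    name: str
    attributes: dict = field(default_factory=dict)
    cdata: str = ''
    children: list = field(default_factory=list)

def outline(node, depth=0):
    """Yield one indented line per element: tagname, attributes, and repr of cdata."""
    yield '  ' * depth + f'{node.name} {node.attributes} {node.cdata!r}'
    for child in node.children:
        yield from outline(child, depth + 1)

doc = Node('catalog', children=[Node('book', {'id': '1'}, 'Python'),
                                Node('book', {'id': '2'}, 'XML')])
print('\n'.join(outline(doc)))
```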

The various accessor methods of class Element are,
of course, optional. You might prefer to access the instance attributes
(name, attributes, cdata, children) directly. I think the accessors add no complexity and look nicer, but,
obviously, your tastes may differ. This is, after all, just a recipe,
so feel free to alter the mix of seasonings at will!

You can find similar ideas (bypassing the DOM and building a more
Pythonic in-memory representation of an XML document) in many other
excellent and more complete projects, such as
PyRXP (http://www.reportlab.org/pyrxpl),
ElementTree (http://effbot.org/zone/element-index),
and XIST (http://www.livinglogic.de/Python/xist/).
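Of these projects, ElementTree has since become part of the standard library as xml.etree.ElementTree (from Python 2.5 on). A rough equivalent of this recipe's usage is sketched below. The correspondence is approximate: tag plays the role of name, attrib of attributes, text of cdata (although text holds only the characters before the first child element), and findall of getElements.

```python
import xml.etree.ElementTree as ET

xml_text = '<catalog><book id="1">Python</book><book id="2">XML</book></catalog>'
root = ET.fromstring(xml_text)

# tag ~ name, get()/attrib ~ getAttribute()/attributes, text ~ cdata
print(root.tag)
# findall('book') returns the direct children whose tag is 'book'
for book in root.findall('book'):
    print(book.get('id'), book.text)
```

This prints catalog, then one line per book with its id and text.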
