Python Cookbook 2nd Edition Jun 1002005 [Electronic resources] (text version)

David Ascher, Alex Martelli, Anna Ravenscroft

Recipe 12.9. Filtering Elements and Attributes Belonging to a Given Namespace


Credit: A.M. Kuchling


Problem


While parsing an XML document with
SAX, you need to filter out all of the elements and attributes that
belong to a particular namespace.


Solution


The SAX filter concept is just what we need here:

from xml import sax
from xml.sax import handler, saxutils, xmlreader

# the namespace we want to remove in our filter
RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'

class RDFFilter(saxutils.XMLFilterBase):
    def __init__(self, *args):
        saxutils.XMLFilterBase.__init__(self, *args)
        # initially, we're not in RDF, and just one stack level is needed
        self.in_rdf_stack = [False]

    def startElementNS(self, name, qname, attrs):
        # name is a (uri, localname) pair
        uri, localname = name
        if uri == RDF_NS or self.in_rdf_stack[-1]:
            # skip elements with namespace, if that namespace is RDF or
            # the element is nested in an RDF one -- and grow the stack
            self.in_rdf_stack.append(True)
            return
        # Make dicts of the attributes (and of their qualified names)
        # that DON'T belong to the RDF namespace; the loop uses its own
        # names so that the element's uri and localname are not clobbered
        keep_attrs = {}
        keep_qnames = {}
        for key, value in attrs.items():
            attr_uri, attr_localname = key
            if attr_uri != RDF_NS:
                keep_attrs[key] = value
                keep_qnames[key] = attrs.getQNameByName(key)
        # prepare the cleaned-up bunch of non-RDF-namespace attributes
        attrs = xmlreader.AttributesNSImpl(keep_attrs, keep_qnames)
        # grow the stack by replicating the latest entry
        self.in_rdf_stack.append(self.in_rdf_stack[-1])
        # finally delegate the rest of the operation to our base class
        saxutils.XMLFilterBase.startElementNS(self, name, qname, attrs)

    def characters(self, content):
        # skip characters that are inside an RDF-namespaced tag being skipped
        if self.in_rdf_stack[-1]:
            return
        # delegate the rest of the operation to our base class
        saxutils.XMLFilterBase.characters(self, content)

    def endElementNS(self, name, qname):
        # pop the stack -- nothing else to be done, if we were skipping
        if self.in_rdf_stack.pop():
            return
        # delegate the rest of the operation to our base class
        saxutils.XMLFilterBase.endElementNS(self, name, qname)

def filter_rdf(input, output):
    """ filter_rdf(input=some_input_stream, output=some_output_stream)
        Parses the XML input from the input stream, filtering out all
        elements and attributes that are in the RDF namespace.
    """
    output_gen = saxutils.XMLGenerator(output)
    parser = sax.make_parser()
    filter = RDFFilter(parser)
    filter.setFeature(handler.feature_namespaces, True)
    filter.setContentHandler(output_gen)
    filter.setErrorHandler(handler.ErrorHandler())
    filter.parse(input)

if __name__ == '__main__':
    import io, sys
    TEST_RDF = '''<?xml version="1.0"?>
<metadata xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title> This is non-RDF content </title>
  <rdf:RDF>
    <rdf:Description rdf:about="%s">
      <dc:Creator>%s</dc:Creator>
    </rdf:Description>
  </rdf:RDF>
  <element />
</metadata>
'''
    input = io.StringIO(TEST_RDF)
    filter_rdf(input, sys.stdout)

This module, when run as a main script, emits something like:

<?xml version="1.0" encoding="iso-8859-1"?>
<metadata xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<title> This is non-RDF content </title>
<element></element>
</metadata>


Discussion


My motivation for originally writing this recipe came from processing
files of metadata, containing RDF mixed with other elements. I wanted
to generate a version of the metadata with the RDF filtered out.

The filter_rdf function does the job, reading XML input from the input
stream and writing it to the output stream. The standard XMLGenerator
class in xml.sax.saxutils is used to produce the output. Function
filter_rdf internally uses a filtering class called RDFFilter, also
shown in this recipe's Solution, pushing that filter on top of the XML
parser to suppress elements and attributes belonging to the RDF_NS
namespace.
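
To make the idea of pushing a filter on top of the parser concrete, here is a minimal sketch, written for Python 3. The names UpperCaseText and run_through are hypothetical helpers invented for this illustration; they are not part of the recipe. A bare XMLFilterBase forwards every SAX event unchanged, so overriding a single method changes only that kind of event.

```python
import io
from xml import sax
from xml.sax import handler, saxutils


class UpperCaseText(saxutils.XMLFilterBase):
    """Hypothetical filter: upper-case character data, forward the rest."""

    def characters(self, content):
        # only this event is altered; all others pass through the base class
        super().characters(content.upper())


def run_through(filter_class, xml_text):
    """Hypothetical helper: parser -> filter -> XMLGenerator, as in filter_rdf."""
    out = io.StringIO()
    parser = sax.make_parser()
    flt = filter_class(parser)            # the filter wraps the parser
    flt.setFeature(handler.feature_namespaces, True)
    flt.setContentHandler(saxutils.XMLGenerator(out, encoding='utf-8'))
    flt.parse(io.StringIO(xml_text))      # the filter drives the parser
    return out.getvalue()


if __name__ == '__main__':
    doc = '<a><b>hi</b></a>'
    print(run_through(saxutils.XMLFilterBase, doc))   # identity copy
    print(run_through(UpperCaseText, doc))            # text upper-cased
```

The same wiring is what filter_rdf performs; only the filter class and the output stream differ.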

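Non-RDF elements contained within an RDF element are also removed. To
modify this behavior, change the guard at the top of the
startElementNS method to just if uri == RDF_NS, and append False
(rather than a copy of the latest entry) when growing the stack for an
element that is kept; otherwise the character data and end tag of a
kept element nested inside RDF would still be skipped.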
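
The variant just described can be sketched as follows, for Python 3. RDFOnlyFilter and filter_keep_nested are hypothetical names chosen for this illustration. The sketch assumes that each stack entry should record only whether the element itself was skipped, so a kept element always pushes False.

```python
import io
from xml import sax
from xml.sax import handler, saxutils, xmlreader

RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'


class RDFOnlyFilter(saxutils.XMLFilterBase):
    """Hypothetical variant: drop RDF elements, keep non-RDF descendants."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self.in_rdf_stack = [False]

    def startElementNS(self, name, qname, attrs):
        uri, localname = name
        if uri == RDF_NS:
            # only elements that are themselves RDF get skipped
            self.in_rdf_stack.append(True)
            return
        keep_attrs, keep_qnames = {}, {}
        for key, value in attrs.items():
            if key[0] != RDF_NS:
                keep_attrs[key] = value
                keep_qnames[key] = attrs.getQNameByName(key)
        # a kept element pushes False, so its text and end tag get through
        self.in_rdf_stack.append(False)
        super().startElementNS(
            name, qname, xmlreader.AttributesNSImpl(keep_attrs, keep_qnames))

    def characters(self, content):
        # text directly inside a skipped RDF element is dropped
        if not self.in_rdf_stack[-1]:
            super().characters(content)

    def endElementNS(self, name, qname):
        if not self.in_rdf_stack.pop():
            super().endElementNS(name, qname)


def filter_keep_nested(xml_text):
    """Hypothetical helper: run RDFOnlyFilter over a string, return a string."""
    out = io.StringIO()
    flt = RDFOnlyFilter(sax.make_parser())
    flt.setFeature(handler.feature_namespaces, True)
    flt.setContentHandler(saxutils.XMLGenerator(out, encoding='utf-8'))
    flt.parse(io.StringIO(xml_text))
    return out.getvalue()
```

With this variant, a dc:Creator element nested inside rdf:Description survives in the output, while the rdf:RDF and rdf:Description wrappers are dropped.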

This code doesn't delete the
xmlns declaration for the RDF namespace;
I'm willing to live with a little unnecessary but
harmless cruft in the output rather than go to huge trouble to remove
it.
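
For readers who do want the declaration gone, one possible approach is a second filter that swallows the prefix-mapping events for the RDF namespace. This is only a sketch for Python 3, not the recipe author's method. DropRDFDeclaration and strip_rdf_declaration are hypothetical names, and the sketch assumes that nothing still in the document uses the RDF namespace, since XMLGenerator cannot write a name whose namespace was never declared to it.

```python
import io
from xml import sax
from xml.sax import handler, saxutils

RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'


class DropRDFDeclaration(saxutils.XMLFilterBase):
    """Hypothetical filter: swallow prefix mappings bound to RDF_NS."""

    def __init__(self, parent=None):
        super().__init__(parent)
        # per-prefix stack of flags: True if that binding was swallowed
        self._dropped = {}

    def startPrefixMapping(self, prefix, uri):
        drop = (uri == RDF_NS)
        self._dropped.setdefault(prefix, []).append(drop)
        if not drop:
            super().startPrefixMapping(prefix, uri)

    def endPrefixMapping(self, prefix):
        # forward the end event only if the matching start was forwarded
        if not self._dropped[prefix].pop():
            super().endPrefixMapping(prefix)


def strip_rdf_declaration(xml_text):
    """Hypothetical helper: apply DropRDFDeclaration to a string."""
    out = io.StringIO()
    flt = DropRDFDeclaration(sax.make_parser())
    flt.setFeature(handler.feature_namespaces, True)
    flt.setContentHandler(saxutils.XMLGenerator(out, encoding='utf-8'))
    flt.parse(io.StringIO(xml_text))
    return out.getvalue()
```

Because an XMLFilterBase accepts any XMLReader as its parent, such a filter could in principle be stacked on top of RDFFilter instead of directly on the parser, so that the RDF content is removed first.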


See Also


Library Reference and Python in a Nutshell document the built-in XML
support in the Python Standard Library.

