
22.3. Parsing XML into SAX Events
22.3.1. Problem
You want to receive Simple API for XML (SAX) events from an XML parser
because event-based parsing is faster and uses less memory than
building a DOM tree.
22.3.2. Solution
Use the XML::SAX module from CPAN:

    use XML::SAX::ParserFactory;
    use MyHandler;

    my $handler = MyHandler->new();
    my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
    $parser->parse_uri($FILENAME);
    # or
    $parser->parse_string($XML);

Logic for handling events goes into the handler class (MyHandler in
this example), which you write:

    # in MyHandler.pm
    package MyHandler;
    use base qw(XML::SAX::Base);

    sub start_element {    # method names are specified by SAX
        my ($self, $data) = @_;
        # $data is a hash reference with keys like Name and Attributes
        # ...
    }

    # other possible methods include end_element() and characters()
    1;

22.3.3. Discussion
An XML processor that uses SAX has three
parts: the XML parser that generates SAX events, the handler that
reacts to them, and the stub that connects the two. The XML parser
can be XML::Parser, XML::LibXML, or the pure Perl XML::SAX::PurePerl
that comes with XML::SAX. The XML::SAX::ParserFactory module selects
a parser for you and connects it to your handler. Your handler takes
the form of a class that inherits from XML::SAX::Base. The stub is
the program shown in the Solution.

The XML::SAX::Base module provides stubs for the different methods that
the XML parser calls on your handler. Those methods are listed in
Table 22-2, and are the methods defined by the SAX1
and SAX2 standards at http://www.saxproject.org/. The Perl
implementation uses more Perl-ish data structures and is described in
the XML::SAX::Intro manpage.
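
The three parts described above can be put together in a single file. The following is a minimal sketch, not the only way to arrange things: the handler name ElementCounter, its counts accessor, and the sample XML string are made up for illustration. It overrides only start_element and leaves every other event to the stubs inherited from XML::SAX::Base.

```perl
use strict;
use warnings;

# The handler: reacts to SAX events. ElementCounter is a hypothetical name.
package ElementCounter;
use base qw(XML::SAX::Base);

# Called once per opening tag; $data->{Name} is the qualified element name.
sub start_element {
    my ($self, $data) = @_;
    $self->{counts}{ $data->{Name} }++;
}

# Hypothetical accessor returning a hash reference of name => count.
sub counts { $_[0]{counts} || {} }

package main;
use XML::SAX::ParserFactory;

# Sample input, made up for this sketch.
my $xml = '<books><book><title>A</title></book>'
        . '<book><title>B</title></book></books>';

# The stub: ask the factory for a parser and connect it to the handler.
my $handler = ElementCounter->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
$parser->parse_string($xml);

my $counts = $handler->counts;
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

Because the state lives in the handler object rather than in a file-level variable, the same class can be instantiated again for a second document without leftover counts.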
Table 22-2. XML::SAX::Base methods
start_document | end_document | characters |
start_element | end_element | processing_instruction |
ignorable_whitespace | set_document_locator | skipped_entity |
start_prefix_mapping | end_prefix_mapping | comment |
start_cdata | end_cdata | entity_reference |
notation_decl | unparsed_entity_decl | element_decl |
attlist_decl | doctype_decl | xml_decl |
entity_decl | attribute_decl | internal_entity_decl |
start_dtd | end_dtd | external_entity_decl |
resolve_entity | start_entity | end_entity |
warning | error | fatal_error |
The data structures you will work with most often are those that
represent elements and attributes. The $data parameter to
start_element and end_element
is a hash reference. The keys of the hash are given in Table 22-3.
Table 22-3. An XML::SAX element hash
Key | Meaning |
---|---|
Prefix | XML namespace prefix (e.g., email) |
LocalName | Element name without prefix (e.g., to) |
Name | Fully qualified element name (e.g., email:to) |
Attributes | Hash of attributes of the element |
NamespaceURI | URI of the XML namespace for this element |
The Attributes hash has one entry per attribute. Each key has the form
"{namespaceURI}attrname".
For example, if the current namespace URI is http://example.com/dtds/mailspec/ and the
attribute is msgid, the key in the attribute hash
is:

    {http://example.com/dtds/mailspec/}msgid

The attribute value is a hash; its keys are given in Table 22-4.
Table 22-4. An XML::SAX attribute hash
Key | Meaning |
---|---|
Prefix | XML namespace prefix (e.g., email) |
LocalName | Attribute name without prefix (e.g., msgid) |
Name | Fully qualified attribute name (e.g., email:msgid) |
Value | Value of the attribute |
NamespaceURI | URI of the XML namespace for this attribute |
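
To make the "{namespaceURI}attrname" key format concrete, here is a small sketch that looks up an attribute value in an element hash. The helper attr_value is hypothetical, and the $element hash is built by hand to resemble what a parser might pass to start_element; a real parser may include further entries, such as the namespace declaration itself.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch an attribute's Value from an element hash,
# given the attribute's namespace URI and local name.
sub attr_value {
    my ($element, $ns_uri, $local_name) = @_;
    my $key  = sprintf '{%s}%s', $ns_uri, $local_name;   # "{namespaceURI}attrname"
    my $attr = $element->{Attributes}{$key};
    return $attr ? $attr->{Value} : undef;
}

my $ns = 'http://example.com/dtds/mailspec/';

# Hand-built stand-in for the $data argument of start_element,
# describing <email:to email:msgid="42">. The values are illustrative.
my $element = {
    Name         => 'email:to',
    LocalName    => 'to',
    Prefix       => 'email',
    NamespaceURI => $ns,
    Attributes   => {
        "{$ns}msgid" => {
            Name         => 'email:msgid',
            LocalName    => 'msgid',
            Prefix       => 'email',
            NamespaceURI => $ns,
            Value        => '42',
        },
    },
};

my $msgid = attr_value($element, $ns, 'msgid');
print "msgid = $msgid\n";
```

Inside a real handler the same helper would be called with the $data argument of start_element in place of the hand-built hash.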
Example 22-4 prints the book titles using
SAX events. It's more complex than the DOM solution because with SAX
we must keep track of where we are in the XML document.
Example 22-4. sax-titledumper
    # TitleDumper.pm -- SAX handler to display titles in books file
    package TitleDumper;
    use strict;
    use warnings;
    use base qw(XML::SAX::Base);

    my $in_title = 0;    # nonzero while we are inside a title element

    # if we're entering a title, increase $in_title
    sub start_element {
        my ($self, $data) = @_;
        if ($data->{Name} eq 'title') {
            $in_title++;
        }
    }

    # if we're leaving a title, decrease $in_title and print a newline
    sub end_element {
        my ($self, $data) = @_;
        if ($data->{Name} eq 'title') {
            $in_title--;
            print "\n";
        }
    }

    # if we're in a title, print any text we get
    sub characters {
        my ($self, $data) = @_;
        if ($in_title) {
            print $data->{Data};
        }
    }

    1;

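
The handler above still needs a program that connects it to a parser. The sketch below is one self-contained way to try the same idea: it uses a hypothetical variant named TitleCollector that keeps its state in the handler object and collects titles into an array instead of printing them, and it parses a made-up XML string rather than a books file. A parser may deliver the text of one element in several characters events, so the text is appended piece by piece.

```perl
use strict;
use warnings;

# Hypothetical variant of TitleDumper that stores titles instead of printing.
package TitleCollector;
use base qw(XML::SAX::Base);

# Entering a title: note that we are inside one and start a new entry.
sub start_element {
    my ($self, $data) = @_;
    if ($data->{Name} eq 'title') {
        $self->{in_title}++;
        push @{ $self->{titles} }, '';
    }
}

# Leaving a title: we are no longer inside it.
sub end_element {
    my ($self, $data) = @_;
    $self->{in_title}-- if $data->{Name} eq 'title';
}

# Text may arrive in several chunks, so append to the current title.
sub characters {
    my ($self, $data) = @_;
    $self->{titles}[-1] .= $data->{Data} if $self->{in_title};
}

# Hypothetical accessor returning the collected titles as a list.
sub titles { @{ $_[0]{titles} || [] } }

package main;
use XML::SAX::ParserFactory;

# Sample input, made up for this sketch.
my $xml = <<'XML';
<books>
  <book><title>Example Title</title><author>Someone</author></book>
  <book><title>Perl &amp; XML</title></book>
</books>
XML

my $handler = TitleCollector->new;
XML::SAX::ParserFactory->parser(Handler => $handler)->parse_string($xml);

my @titles = $handler->titles;
print "$_\n" for @titles;
```

To drive the TitleDumper module itself, the same two parser lines would apply, with TitleDumper->new as the handler and parse_uri given the name of the books file.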
The XML::SAX::Intro manpage provides a gentle introduction to
XML::SAX parsing.
22.3.4. See Also
Chapter 5 of Perl & XML; the documentation
for the CPAN modules XML::SAX, XML::SAX::Base, and
XML::SAX::Intro

Copyright © 2003 O'Reilly & Associates. All rights reserved.