
22.4. Making Simple Changes to Elements or Text
22.4.1. Problem
You want to filter some XML. For example,
you want to make substitutions in the body of a document, or add a
price to every book described in an XML document, or you want to
change <book id="1"> to
<book><id>1</id>.
22.4.2. Solution
Use the XML::SAX::Machines module from CPAN:

#!/usr/bin/perl -w
use MySAXFilter1;
use MySAXFilter2;
use XML::SAX::ParserFactory;
use XML::SAX::Machines qw(Pipeline);
my $machine = Pipeline(MySAXFilter1 => MySAXFilter2); # or more
$machine->parse_uri($FILENAME);
Write a handler that inherits from XML::SAX::Base, as in Recipe 22.3. Whenever you need to emit a SAX event, call the
appropriate handler in your superclass. For example:

$self->SUPER::start_element($tag_struct);
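As a minimal sketch of the first task in the Problem (substitutions in the body of a document), the filter below overrides only characters and lets XML::SAX::Base pass every other event through. The package names SubstituteText and Collector, and the colour/color substitution, are made up for illustration. The filter is driven by hand here rather than by a parser. Note that a real parser may split one run of text across several characters events, so a pattern that spans such a split would not match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# SubstituteText: hypothetical filter that rewrites text in characters events
package SubstituteText;
use base qw(XML::SAX::Base);

sub characters {
    my ($self, $data) = @_;
    # work on a copy so the caller's event structure is left untouched
    my %copy = %$data;
    $copy{Data} =~ s/\bcolour\b/color/g;
    # hand the (possibly altered) event to the next handler in the chain
    $self->SUPER::characters(\%copy);
}

# Collector: hypothetical stand-in for whatever comes next in the chain
package Collector;
sub new        { return bless { text => '' }, shift }
sub characters { my ($self, $data) = @_; $self->{text} .= $data->{Data} }

package main;

my $collector = Collector->new;
my $filter    = SubstituteText->new(Handler => $collector);

# simulate a single SAX event instead of parsing a file
$filter->characters({ Data => 'The colour of the cover' });
print $collector->{text}, "\n";
```

Passing Handler to the constructor is what makes the object behave as a filter: the SUPER::characters call forwards the event to that handler.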
22.4.3. Discussion
A SAX filter accepts SAX events and triggers new ones. The
XML::SAX::Base module detects whether your handler object is called
as a filter. If so, the XML::SAX::Base methods pass the SAX events
on to the next filter in the chain. If your handler object is not
called as a filter, then the XML::SAX::Base methods consume events
but do not emit them. This makes it almost as simple to write events
as it is to consume them.

The XML::SAX::Machines module chains the filters for you. Import its
Pipeline function, then say:

my $machine = Pipeline(Filter1 => Filter2 => Filter3 => Filter4);
$machine->parse_uri($FILENAME);
SAX events triggered by parsing the XML file go to Filter1, which
sends possibly different events to Filter2, which in turn sends
events to Filter3, and so on to Filter4. The last filter should print
or otherwise do something with the incoming SAX events. If you pass a
reference to a typeglob, XML::SAX::Machines writes the XML to the
filehandle in that typeglob.

Example 22-5 shows a filter that turns the id attribute in book elements
from the XML document in Example 22-1 into a new id element. For example,
<book id="1"> becomes <book><id>1</id>.
Example 22-5. filters-rewriteids
package RewriteIDs;
# RewriteIDs.pm -- turns "id" attributes into elements
use base qw(XML::SAX::Base);

my $ID_ATTRIB = "{}id";   # the attribute hash entry we're interested in

sub start_element {
    my ($self, $data) = @_;

    if ($data->{Name} eq 'book') {
        my $id = $data->{Attributes}{$ID_ATTRIB}{Value};
        delete $data->{Attributes}{$ID_ATTRIB};
        $self->SUPER::start_element($data);

        # make new element parameter data structure for the <id> tag,
        # starting from a copy of the <book> element's data
        my $id_node = { };
        %$id_node = %$data;
        $id_node->{Name}       = 'id';    # more complex if namespaces involved
        $id_node->{LocalName}  = 'id';
        $id_node->{Attributes} = { };

        # build the <id>$id</id>
        $self->SUPER::start_element($id_node);
        $self->SUPER::characters({ Data => $id });
        $self->SUPER::end_element($id_node);
    } else {
        $self->SUPER::start_element($data);
    }
}

1;
Example 22-6 is the stub that uses XML::SAX::Machines
to create the pipeline for processing books.xml
and print the altered XML.
Example 22-6. filters-rewriteprog
#!/usr/bin/perl -w
# rewrite-ids -- call RewriteIDs SAX filter to turn id attrs into elements
use RewriteIDs;
use XML::SAX::Machines qw(:all);
my $machine = Pipeline(RewriteIDs => *STDOUT);
$machine->parse_uri("books.xml");
The output of Example 22-6 is as follows (truncated for brevity):

<book><id>1</id>
<title>Programming Perl</title>
...
<book><id>2</id>
<title>Perl &amp; LWP</title>
...
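The same pattern can cover another task from the Problem, adding a price to every book. The sketch below assumes the price belongs in a new price element placed just before each closing book tag. The AddPrice and Recorder packages and the fixed price value are hypothetical, and the events are fed in by hand rather than parsed from books.xml.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# AddPrice: hypothetical filter that emits <price>...</price> before </book>
package AddPrice;
use base qw(XML::SAX::Base);

my $PRICE = '39.95';    # placeholder value, not taken from any real data

sub end_element {
    my ($self, $data) = @_;
    if ($data->{Name} eq 'book') {
        # element structure for the new tag (no namespaces assumed)
        my $price = {
            Name         => 'price',
            LocalName    => 'price',
            Prefix       => '',
            NamespaceURI => '',
            Attributes   => { },
        };
        $self->SUPER::start_element($price);
        $self->SUPER::characters({ Data => $PRICE });
        $self->SUPER::end_element($price);
    }
    # always forward the original end tag
    $self->SUPER::end_element($data);
}

# Recorder: hypothetical downstream handler that logs what it receives
package Recorder;
sub new           { return bless { events => [ ] }, shift }
sub start_element { push @{ $_[0]{events} }, "<$_[1]{Name}>" }
sub characters    { push @{ $_[0]{events} }, $_[1]{Data} }
sub end_element   { push @{ $_[0]{events} }, "</$_[1]{Name}>" }

package main;

my $recorder = Recorder->new;
my $filter   = AddPrice->new(Handler => $recorder);

# simulate the events for an empty <book></book>
my $book = { Name => 'book', LocalName => 'book', Attributes => { } };
$filter->start_element($book);
$filter->end_element($book);

my $seen = join '', @{ $recorder->{events} };
print "$seen\n";    # prints <book><price>39.95</price></book>
```

Because only end_element is overridden, start_element and characters events reach the next handler unchanged through the inherited XML::SAX::Base methods.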
To save the XML to the file new-books.xml, use
the XML::SAX::Writer module:

#!/usr/bin/perl -w
use RewriteIDs;
use XML::SAX::Machines qw(:all);
use XML::SAX::Writer;
my $writer = XML::SAX::Writer->new(Output => "new-books.xml");
my $machine = Pipeline(RewriteIDs => $writer);
$machine->parse_uri("books.xml");
You can also pass a scalar reference as the Output
parameter to have the XML appended to the scalar; an array
reference to have the XML appended to the array, one array element
per SAX event; or a filehandle to have the XML printed to that
filehandle.
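As a rough sketch of the scalar-reference case, the script below collects the written XML in a string. To stay self-contained it skips the pipeline and filter and attaches the writer directly to a parser from XML::SAX::ParserFactory; the sample document and variable names are invented for the example. A filter such as RewriteIDs could sit between parser and writer by constructing it with Handler => $writer and handing the filter to the parser instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::SAX::ParserFactory;
use XML::SAX::Writer;

# small stand-in document; the recipe itself reads books.xml
my $xml_in = '<books><book><title>Programming Perl</title></book></books>';

# the writer adds its output to this scalar instead of a file
my $xml_out = '';
my $writer  = XML::SAX::Writer->new(Output => \$xml_out);

# the parser sends its SAX events straight to the writer
my $parser = XML::SAX::ParserFactory->parser(Handler => $writer);
$parser->parse_string($xml_in);

print $xml_out, "\n";
```

Holding the result in a scalar is handy when the XML needs further processing in the same program, such as a regular expression check or a second parse.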
22.4.4. See Also
The documentation for the modules XML::SAX::Machines and
XML::SAX::Writer

Copyright © 2003 O'Reilly & Associates. All rights reserved.