22.1. Parsing XML into Data Structures
22.1.1. Problem
You want a Perl data structure (a
combination of hashes and arrays) that corresponds to the structure
and content of an XML file. For example, you have XML representing a
configuration file, and you'd like to say
$xml->{config}{server}{hostname} to access the
contents of
<config><server><hostname>...</hostname>.
22.1.2. Solution
Use the XML::Simple module from CPAN. If your XML is in a file, pass
the filename to
XMLin:
use XML::Simple;
$ref = XMLin($FILENAME, ForceArray => 1);
If your XML is in a string, pass the string to
XMLin:
use XML::Simple;
$ref = XMLin($STRING, ForceArray => 1);
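Here is a minimal, self-contained sketch of the string form. The configuration document is hypothetical, modeled on the one in the Problem. Be aware that XMLin discards the root element unless you pass KeepRoot => 1, so <config> itself does not appear as a key. Also, with ForceArray => 1, even text-only elements come back inside one-element arrays.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical configuration document, held in a string.
my $STRING = <<'END_XML';
<config>
  <server>
    <hostname>www.example.com</hostname>
    <port>8080</port>
  </server>
</config>
END_XML

# XMLin treats its argument as XML text (not a filename) because it contains markup.
my $ref = XMLin($STRING, ForceArray => 1);

# The root element <config> is dropped, and ForceArray => 1 wraps every
# nested element in an array, so each level needs a [0] subscript.
my $hostname = $ref->{server}[0]{hostname}[0];
my $port     = $ref->{server}[0]{port}[0];

print "$hostname:$port\n";
```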
22.1.3. Discussion
Here's the data structure that XML::Simple produces from the XML in
Example 22-1 when called with ForceArray => 1, as in the Solution:
{
  'book' => {
    '1' => {
      'authors' => [
        {
          'author' => [
            {
              'firstname' => [ 'Larry' ],
              'lastname' => [ 'Wall' ]
            },
            {
              'firstname' => [ 'Tom' ],
              'lastname' => [ 'Christiansen' ]
            },
            {
              'firstname' => [ 'Jon' ],
              'lastname' => [ 'Orwant' ]
            }
          ]
        }
      ],
      'edition' => [ '3' ],
      'title' => [ 'Programming Perl' ],
      'isbn' => [ '0-596-00027-8' ]
    },
    '2' => {
      'authors' => [
        {
          'author' => [
            {
              'firstname' => [ 'Sean' ],
              'lastname' => [ 'Burke' ]
            }
          ]
        }
      ],
      'edition' => [ '1' ],
      'title' => [ 'Perl & LWP' ],
      'isbn' => [ '0-596-00178-9' ]
    },
    '3' => {
      'authors' => [ { } ],
      'edition' => [ '1' ],
      'title' => [ 'Anonymous Perl' ],
      'isbn' => [ '0-555-00178-0' ]
    },
  }
}
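To see how code walks a structure shaped like this, the following sketch parses a cut-down books document and prints one line per book. The input is hypothetical: it uses the element names and two of the entries above, not the full Example 22-1. The top-level book hash is keyed by id, and everything inside it is an array because of ForceArray, hence the [0] subscripts.

```perl
use strict;
use warnings;
use XML::Simple;

# Cut-down, illustrative books document with the same element names
# as the structure above.
my $xml = <<'END_XML';
<books>
  <book id="1">
    <title>Programming Perl</title>
    <edition>3</edition>
    <authors>
      <author><firstname>Larry</firstname><lastname>Wall</lastname></author>
      <author><firstname>Tom</firstname><lastname>Christiansen</lastname></author>
      <author><firstname>Jon</firstname><lastname>Orwant</lastname></author>
    </authors>
    <isbn>0-596-00027-8</isbn>
  </book>
  <book id="2">
    <title>Perl &amp; LWP</title>
    <edition>1</edition>
    <authors>
      <author><firstname>Sean</firstname><lastname>Burke</lastname></author>
    </authors>
    <isbn>0-596-00178-9</isbn>
  </book>
</books>
END_XML

my $ref = XMLin($xml, ForceArray => 1);

my %summary;    # id => "Title (ed. N): First Last, First Last"
for my $id (sort keys %{ $ref->{book} }) {
    my $book = $ref->{book}{$id};
    # <authors> is a one-element array; its <author> children form another array.
    my @names = map { join ' ', $_->{firstname}[0], $_->{lastname}[0] }
                @{ $book->{authors}[0]{author} };
    $summary{$id} = "$book->{title}[0] (ed. $book->{edition}[0]): "
                  . join(', ', @names);
    print "$id: $summary{$id}\n";
}
```

Running it prints one line per book, for example "2: Perl & LWP (ed. 1): Sean Burke".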
The basic function of XML::Simple is to turn an element that contains
other elements into a hash. If there are multiple identically named
elements inside a single containing element (e.g.,
book), they become an array of hashes unless
XML::Simple knows they are uniquely identified by an attribute, in
which case they become a hash keyed on that attribute's value (as
happens here with the id attribute, which supplies the keys '1', '2',
and '3').
By default, XML::Simple assumes that if an element has an attribute
called id, name, or
key, then that attribute is a unique identifier
for the element. This is controlled by the KeyAttr
option to the XMLin function. For example, set
KeyAttr to an empty list to disable this
conversion from arrays of elements to a hash by attribute:
$ref = XMLin($xml, ForceArray => 1, KeyAttr => [ ]);
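The following sketch contrasts the default folding with KeyAttr => [ ] on a small, hypothetical two-book document. With folding disabled, the book elements stay in an array in document order, and id is kept in each hash as an ordinary attribute value. Under the default, the key attribute is removed from each folded hash.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical two-book document.
my $xml = '<books><book id="1"><title>Programming Perl</title></book>'
        . '<book id="2"><title>Perl &amp; LWP</title></book></books>';

# Default KeyAttr: <book> elements are folded into a hash keyed on id.
my $folded = XMLin($xml, ForceArray => 1);

# Empty KeyAttr: <book> elements stay an array, in document order,
# and id remains in each hash as a plain attribute value.
my $listed = XMLin($xml, ForceArray => 1, KeyAttr => [ ]);

print join(',', sort keys %{ $folded->{book} }), "\n";
print join(',', map { $_->{id} } @{ $listed->{book} }), "\n";
```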
For more fine-grained control, specify a hash that maps the element
name to the attribute that holds a unique identifier. For example, to
create a hash on the id attribute of
book elements and no others, say:
$ref = XMLin($xml, ForceArray => 1, KeyAttr => { book => "id" });
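The hash form matters when some elements carry a name, key, or id attribute that is not really meant as an identifier. In this sketch, the tag element and its name attribute are hypothetical. The default settings fold both book (by id) and tag (by name), while KeyAttr => { book => "id" } folds only book and leaves tag as an array of hashes.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical document: <book> has an id, and <tag> has a name
# attribute that the default KeyAttr list would also treat as a key.
my $xml = '<books><book id="1"><title>Programming Perl</title>'
        . '<tag name="perl"/><tag name="reference"/></book></books>';

# Default KeyAttr (name, key, id): both <book> and <tag> are folded.
my $default = XMLin($xml, ForceArray => 1);

# Hash form: only <book> is folded, on its id attribute.
my $only_book = XMLin($xml, ForceArray => 1, KeyAttr => { book => "id" });

print join(',', sort keys %{ $default->{book}{1}{tag} }), "\n";
print join(',', map { $_->{name} } @{ $only_book->{book}{1}{tag} }), "\n";
```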
The
ForceArray option creates all of those one-element
arrays in the data structure. Without it, XML::Simple compacts
one-element arrays:
'3' => {
  'authors' => { },
  'edition' => '1',
  'title' => 'Anonymous Perl',
  'isbn' => '0-555-00178-0'
},
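This sketch, using two hypothetical author lists, shows why the compact form is harder to program for. Without ForceArray, a single author element comes back as a plain string, but repeated ones come back as an array reference. Every consumer therefore needs a check like author_list, a helper invented for this illustration. XML::Simple's ForceArray option also accepts a list of element names, which forces arrays only for the elements you name, such as the ones you know can repeat.

```perl
use strict;
use warnings;
use XML::Simple;

# Two hypothetical <authors> lists: one with a single <author>, one with two.
my $one = XMLin('<authors><author>Sean Burke</author></authors>');
my $two = XMLin('<authors><author>Larry Wall</author>'
              . '<author>Tom Christiansen</author></authors>');

# Hypothetical helper: without ForceArray, a lone <author> is a plain
# scalar but repeated ones form an array, so check which shape arrived.
sub author_list {
    my ($authors) = @_;
    my $value = $authors->{author};
    return ref $value eq 'ARRAY' ? @$value : ($value);
}

my @one_names = author_list($one);
my @two_names = author_list($two);

# Forcing arrays for <author> only keeps its shape consistent.
my $forced = XMLin('<authors><author>Sean Burke</author></authors>',
                   ForceArray => [ 'author' ]);

print join('|', @one_names), "\n";
print join('|', @two_names), "\n";
```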
Although this format is easier to read, it's also harder to program
for. If you know that no element repeats, you
can leave ForceArray off. But if some elements
repeat and some don't, you need ForceArray to
ensure a consistent data structure. Having the data sometimes
directly available, sometimes inside an array, complicates the code.
The XML::Simple module has options that control the data structure
built from the XML. Read the module's manpage for more details. Be
aware that XML::Simple is only really useful for highly structured
data, like the kind used in configuration files. It's awkward to use
with XML that represents documents rather than data structures, and it
doesn't let you work with XML features like processing instructions
or comments. For all but the simplest XML, we recommend DOM or SAX
parsing instead.
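As a small illustration of the point about comments, this sketch parses a hypothetical configuration string containing a comment. Only the element data survives in the returned structure, and the comment has no counterpart in it.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical configuration string with a comment inside the root element.
my $ref = XMLin('<config><!-- debugging on for now --><debug>1</debug></config>');

# Only element data is kept; the comment is silently discarded.
print join(',', sort keys %$ref), "\n";
```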
22.1.4. See Also
The documentation for the CPAN module XML::Simple; Recipe 22.10