
22.5. Validating XML
22.5.1. Problem
You
want to ensure that the XML you're processing conforms to a DTD or
XML Schema.
22.5.2. Solution
To validate against a DTD, use the XML::LibXML module:
use XML::LibXML;
my $parser = XML::LibXML->new;
$parser->validation(1);
$parser->parse_file($FILENAME);
To validate against a W3C Schema, use the XML::Xerces module:
use XML::Xerces;
my $parser = XML::Xerces::DOMParser->new;
$parser->setValidationScheme($XML::Xerces::DOMParser::Val_Always);
my $error_handler = XML::Xerces::PerlErrorHandler->new( );
$parser->setErrorHandler($error_handler);
$parser->parse($FILENAME);
22.5.3. Discussion
The
libxml2 library, upon which XML::LibXML is
based, can validate as it parses. The validation
method on the parser enables this option. At the time of this
writing, XML::LibXML could only validate with DOM
parsing; validation is not available with SAX-style parsing.
Example 22-7 is a DTD for the
books.xml file in Example 22-1.
Example 22-7. validating-booksdtd
<!ELEMENT books (book*)>
<!ELEMENT book (title,edition,authors,isbn)>
<!ELEMENT authors (author*)>
<!ELEMENT author (firstname,lastname)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT edition (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
<!ATTLIST book
id CDATA #REQUIRED
>
To make XML::LibXML parse the DTD, add this line to the
books.xml file:
<!DOCTYPE books SYSTEM "books.dtd">
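When editing books.xml to add a DOCTYPE is not practical, the DTD can instead be loaded separately and applied to an already parsed document. The following is a minimal sketch, assuming an XML::LibXML release that provides XML::LibXML::Dtd->parse_string and the document method is_valid. The helper name is_valid_books and the sample book records are invented for illustration and are not the contents of Example 22-1.

```perl
use strict;
use warnings;
use XML::LibXML;

# The DTD from Example 22-7, held in a string instead of books.dtd.
my $DTD_TEXT = <<'END_DTD';
<!ELEMENT books (book*)>
<!ELEMENT book (title,edition,authors,isbn)>
<!ELEMENT authors (author*)>
<!ELEMENT author (firstname,lastname)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT edition (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
<!ATTLIST book
  id CDATA #REQUIRED
>
END_DTD

# Hypothetical helper: returns 1 if $xml is valid against the DTD
# above, 0 otherwise. The document itself needs no DOCTYPE line.
sub is_valid_books {
    my ($xml) = @_;
    my $dtd = XML::LibXML::Dtd->parse_string($DTD_TEXT);
    my $doc = XML::LibXML->new->parse_string($xml);  # checks well-formedness only
    return $doc->is_valid($dtd) ? 1 : 0;
}

# Invented sample records, not the real books.xml.
my $good = '<books><book id="1"><title>T</title><edition>1</edition>'
         . '<authors/><isbn>0</isbn></book></books>';
my $bad  = '<books><book unique_id="1"><title>T</title><edition>1</edition>'
         . '<authors/><isbn>0</isbn></book></books>';

for my $xml ($good, $bad) {
    print is_valid_books($xml) ? "valid\n" : "invalid\n";
}
```

Unlike parsing with validation turned on, is_valid only reports success or failure. The related validate method takes the same DTD object and raises an exception carrying the error text.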
Example 22-8 is a simple driver used to parse and
validate.
Example 22-8. validating-bookchecker
#!/usr/bin/perl -w
# bookchecker - parse and validate the books.xml file
use strict;
use XML::LibXML;
my $parser = XML::LibXML->new;
$parser->validation(1);
$parser->parse_file("books.xml");
When the document validates, the program produces no
output: XML::LibXML successfully parses the document into a DOM
structure that is quietly destroyed when the program ends. Edit the
books.xml file, however, and you see the errors
that XML::LibXML emits when it discovers invalid XML.
For example, changing the id attribute to
unique_id causes this error message:
'books.xml:0: validity error: No declaration for attribute unique_id
of element book
<book unique_id="1">
^
books.xml:0: validity error: Element book does not carry attribute id
</book>
^
' at /usr/local/perl5-8/Library/Perl/5.8.0/darwin/XML/LibXML.pm line
405.
at checker-1 line 7
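The "at ... LibXML.pm line 405" trailer shows that XML::LibXML reports a validity error by raising an exception, so an uncaught failure ends the program. A minimal sketch of trapping it with eval follows. The helper name validity_error is hypothetical, and the in-memory documents with their shortened internal DTD subset are invented so that the example runs without books.xml or books.dtd.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: returns '' when $xml is valid, otherwise the
# parser's error text. The DTD comes from the document's own DOCTYPE.
sub validity_error {
    my ($xml) = @_;
    my $parser = XML::LibXML->new;
    $parser->validation(1);
    # eval traps the exception raised on a validity error
    my $ok = eval { $parser->parse_string($xml); 1 };
    return $ok ? '' : ("$@" || 'unknown error');
}

# A cut-down DTD (not Example 22-7) carried in the internal subset.
my $doctype = <<'END_DOCTYPE';
<!DOCTYPE books [
<!ELEMENT books (book*)>
<!ELEMENT book (title)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST book id CDATA #REQUIRED>
]>
END_DOCTYPE

# Invented sample documents: one valid, one with a renamed attribute.
my $good_doc = $doctype . '<books><book id="1"><title>T</title></book></books>';
my $bad_doc  = $doctype . '<books><book unique_id="1"><title>T</title></book></books>';

for my $xml ($good_doc, $bad_doc) {
    my $err = validity_error($xml);
    print $err eq '' ? "valid\n" : "invalid:\n$err\n";
}
```

For a file on disk, the same pattern applies with parse_file in place of parse_string inside the eval.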
XML::LibXML does a good job of reporting unknown attributes and tags.
However, it's not so good at reporting out-of-order elements. If you
return books.xml to its correct state, and then
swap the order of a title and an
edition element, you get this message:
'books.xml:0: validity error: Element book content does not follow the
DTD
</book>
^
' at /usr/local/perl5-8/Library/Perl/5.8.0/darwin/XML/LibXML.pm line
405.
at checker-1 line 7
In this case, XML::LibXML says that something in the
book element didn't follow the DTD, but it
can't tell us precisely what was violated in the DTD or how.
At the time of this writing, you must use XML::Xerces to validate
while using SAX, or to validate against W3C Schema. Both of these
features (and RelaxNG validation) are planned for XML::LibXML, but
weren't available at the time of printing.
Here's how you build a DOM tree
while validating against a DTD using XML::Xerces:
use XML::Xerces;
# create a new parser that always validates
my $p = XML::Xerces::DOMParser->new( );
$p->setValidationScheme($XML::Xerces::DOMParser::Val_Always);
# make it die when things fail to parse
my $error_handler = XML::Xerces::PerlErrorHandler->new( );
$p->setErrorHandler($error_handler);
$p->parse($FILENAME);
To validate against a schema, you must tell XML::Xerces where the
schema is and that it should be used:
$p->setFeature("http://xml.org/sax/features/validation", 1);
$p->setFeature("http://apache.org/xml/features/validation/dynamic", 0);
$p->setFeature("http://apache.org/xml/features/validation/schema", $SCHEMAFILE);
You can pass three possible values to
setValidationScheme:
$XML::Xerces::DOMParser::Val_Always
$XML::Xerces::DOMParser::Val_Never
$XML::Xerces::DOMParser::Val_Auto
The default is to never validate. Always validating raises an error
if the file does not have a DTD or Schema. Auto raises an error only
if the file has a DTD or Schema, but it fails to validate against
that DTD or Schema.
XML::Xerces requires the Apache Xerces C++ XML parsing library,
available from http://xml.apache.org/xerces-c. At the time
of writing, the XML::Xerces module required an archived, older
version of the Xerces library (1.7.0) and was appallingly lacking in
documentation; you can learn how it works only by reading the
documentation for the C++ library and consulting the examples in the
samples/ directory of the XML::Xerces
distribution.
22.5.4. See Also
The documentation for the CPAN module XML::LibXML; http://xml.apache.org/xerces-c; http://xml.apache.org/xerces-p/

Copyright © 2003 O'Reilly & Associates. All rights reserved.