Perl Cd Bookshelf [Electronic resources] (text version)


22.6. Finding Elements and Text Within an XML Document


22.6.1. Problem


You want to get to a
specific part of the XML; for example, the href
attribute of an a tag whose contents are an
img tag with alt text
containing the word "monkey".

22.6.2. Solution


Use XML::LibXML and construct an XPath expression to find nodes
you're interested in:

use XML::LibXML;
my $parser = XML::LibXML->new;
my $doc    = $parser->parse_file($FILENAME);
my @nodes  = $doc->findnodes($XPATH_EXPRESSION);
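As an illustration of the Problem statement, here is a minimal sketch of one XPath expression that could select those href attributes. The sample markup, its file names, and the variable names are invented for this example, and the expression assumes the document declares no default namespace (real XHTML usually does, in which case the prefix would have to be registered first).

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical well-formed fragment, invented for illustration only.
my $xml = <<'XML';
<div>
  <a href="monkey.html"><img src="m.png" alt="a monkey eating"/></a>
  <a href="fish.html"><img src="f.png" alt="a fish"/></a>
  <a href="text.html">monkey business</a>
</div>
XML

my $doc = XML::LibXML->new->parse_string($xml);

# Select the href attribute of every <a> that has an <img> child
# whose alt attribute contains the substring "monkey".
my @hrefs = map { $_->value }
    $doc->findnodes(q{//a[img[contains(@alt, 'monkey')]]/@href});

print "$_\n" for @hrefs;
```

The third link is not selected even though its text mentions "monkey", because the predicate tests only the alt attribute of an img child.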

22.6.3. Discussion


Example 22-9 shows how you would print all the titles
in the book XML from Example 22-1.

Example 22-9. xpath-1


#!/usr/bin/perl -w
use strict;
use XML::LibXML;
my $parser = XML::LibXML->new;
my $doc = $parser->parse_file("books.xml");
# find title elements
my @nodes = $doc->findnodes("/books/book/title");
# print the text in the title elements
foreach my $node (@nodes) {
    print $node->firstChild->data, "\n";
}


The
difference between DOM's getElementsByTagName and
findnodes is that the former identifies elements
only by their name. An XPath expression specifies a set of steps that
the XPath engine takes to find nodes you're interested in. In
Example 22-9 the XPath expression says "start at the top of
the document, go into the books element, go into
the book element, and then go into the
title element."

The difference is important. Consider this XML document:

<message>
<header><to>Tom</to><from>Nat</from></header>
<body>
<order><to>555 House St, Mundaneville</to>
<product>Fish sticks</product>
</order>
</body>
</message>

There are two to elements here: one in the header
and one in the body. If we said
$doc->getElementsByTagName("to"), we'd get both
to elements. The XPath expression
"/message/header/to" restricts output to the
to element in the header.
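The contrast can be checked with a short script. This is a sketch that parses the message document above from a string rather than a file; the variable names are chosen for this example.

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = <<'XML';
<message>
  <header><to>Tom</to><from>Nat</from></header>
  <body>
    <order><to>555 House St, Mundaneville</to>
      <product>Fish sticks</product>
    </order>
  </body>
</message>
XML

my $doc = XML::LibXML->new->parse_string($xml);

# Matches by name alone, wherever the element appears.
my @by_name = $doc->getElementsByTagName("to");

# Matches only the element reached by this exact path.
my @by_xpath = $doc->findnodes("/message/header/to");

print scalar(@by_name),  " element(s) found by tag name\n";
print scalar(@by_xpath), " element(s) found by XPath: ",
      $by_xpath[0]->textContent, "\n";
```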

XPath
expressions are like regular expressions that operate on XML
structure instead of text. As with regular expressions, there are a
lot of things you can specify in XPath expressions—far more
than the simple "find this child node and go into it" that we've been
doing.
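A few of those other things, sketched against a small stand-in for the books file: an attribute predicate, a positional predicate, a function call, and a comparison on a child element's content. The document below, including the book with id 1, is made up for this example and is not the actual content of Example 22-1.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical stand-in for books.xml, invented for illustration.
my $xml = <<'XML';
<books>
  <book id="1"><title>First Title</title><edition>1</edition></book>
  <book id="4"><title>Perl Cookbook</title><edition>2</edition></book>
</books>
XML

my $doc = XML::LibXML->new->parse_string($xml);

# Attribute predicate: the book whose id attribute is 4.
my $by_id = $doc->findvalue(q{/books/book[@id='4']/title});

# Positional predicate: the first book in document order.
my $first = $doc->findvalue(q{/books/book[1]/title});

# Function call: how many book elements there are.
my $count = $doc->findvalue(q{count(/books/book)});

# Comparison on a child element's content: second edition or later.
my @later = map { $_->textContent }
    $doc->findnodes(q{/books/book[edition >= 2]/title});

print "id 4: $by_id\n";
print "first: $first\n";
print "count: $count\n";
print "edition >= 2: @later\n";
```

findvalue returns the string value of whatever the expression selects, which is convenient when only one piece of text or a number is wanted.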

Let's return to the books file and add another entry:

<book id="4">
<!-- Perl Cookbook -->
<title>Perl Cookbook</title>
<edition>2</edition>
<authors>
<author>
<firstname>Nathan</firstname>
<lastname>Torkington</lastname>
</author>
<author>
<firstname>Tom</firstname>
<lastname>Christiansen</lastname>
</author>
</authors>
<isbn>123-345-678-90</isbn>
</book>

To print the titles of all books by Tom Christiansen, we need simply say:

my @nodes = $doc->findnodes("/books/book/authors/author/
                             firstname[text()='Tom']/../
                             lastname[text()='Christiansen']/
                             ../../../title/text()");
foreach my $node (@nodes) {
    print $node->data, "\n";
}

We find the author with
firstname equal to "Tom" and
lastname equal to
"Christiansen", then back out to the
"title" element and get its text child nodes.
Another way to write the backing out is "head out until you find the
book element again":

my @nodes = $doc->findnodes("/books/book/authors/author/
                             firstname[text()='Tom']/../
                             lastname[text()='Christiansen']/
                             ancestor::book/title/text()");
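A third formulation, offered here as an alternative sketch rather than as part of the original recipe, avoids backing out altogether by putting the author test in a predicate on the book element. The second book in the sample data is invented to show that both names must belong to the same author element.

```perl
use strict;
use warnings;
use XML::LibXML;

# Book 4 follows the entry above; book 5 is a hypothetical decoy.
my $xml = <<'XML';
<books>
  <book id="4">
    <title>Perl Cookbook</title>
    <authors>
      <author><firstname>Nathan</firstname><lastname>Torkington</lastname></author>
      <author><firstname>Tom</firstname><lastname>Christiansen</lastname></author>
    </authors>
  </book>
  <book id="5">
    <title>Decoy Book</title>
    <authors>
      <author><firstname>Tom</firstname><lastname>Smith</lastname></author>
      <author><firstname>Jane</firstname><lastname>Christiansen</lastname></author>
    </authors>
  </book>
</books>
XML

my $doc = XML::LibXML->new->parse_string($xml);

# Keep each book that has an author element with both names,
# then step into its title. No ".." or ancestor:: steps are needed.
my @titles = map { $_->textContent } $doc->findnodes(
    q{/books/book[authors/author[firstname='Tom' and lastname='Christiansen']]/title}
);

print "$_\n" for @titles;
```

Because the predicate only filters, the path never leaves the book element, so the expression reads from top to bottom.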

XPath is a very powerful system, and we've barely scratched the
surface of it. For details on XPath, see XPath and XPointer, by
John E. Simpson (O'Reilly), or the W3C specification at
http://www.w3.org/TR/xpath. Advanced users
should look at the XML::LibXML::XPathContext module (also available
from CPAN), which lets you write your own XPath functions in
Perl.
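As a rough sketch of what a custom XPath function might look like, the following registers a function with XML::LibXML::XPathContext that filters a node-set by a Perl regular expression. The function name grep_nodes, the pattern, and the sample data are choices made for this example; check the module's documentation for the exact calling conventions in your installed version.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;
use XML::LibXML::NodeList;

# Hypothetical sample data, invented for illustration.
my $xml = <<'XML';
<books>
  <book id="1"><title>Learning Something Else</title></book>
  <book id="4"><title>Perl Cookbook</title></book>
</books>
XML

my $doc = XML::LibXML->new->parse_string($xml);
my $xpc = XML::LibXML::XPathContext->new($doc);

# grep_nodes(node-set, pattern): return the nodes whose text
# content matches the given Perl regular expression.
$xpc->registerFunction('grep_nodes', sub {
    my ($nodelist, $pattern) = @_;
    my $result = XML::LibXML::NodeList->new;
    for my $node ($nodelist->get_nodelist) {
        $result->push($node) if $node->textContent =~ /$pattern/;
    }
    return $result;
});

# A non-empty node-set counts as true inside a predicate.
my @titles = map { $_->textContent }
    $xpc->findnodes(q{/books/book[grep_nodes(title, '^Perl\b')]/title});

print "$_\n" for @titles;
```

The callback returns a node list rather than 1 or 0 because a bare number inside an XPath predicate is interpreted as a position test, not as a boolean.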

22.6.4. See Also


The documentation for the modules XML::LibXML and
XML::LibXML::XPathContext; http://www.w3.org/TR/xpath;
XPath and XPointer, by John E. Simpson (O'Reilly)







Copyright © 2003 O'Reilly & Associates. All rights reserved.
