XML and PHP [Electronic resources] (text version)

Vikram Vaswani

Traversing the DOM with PHP's DOM Classes

Because PHP's DOM parser works by creating standard objects to represent XML structures, an understanding of these objects and their capabilities is essential to using this technique effectively. This section examines the classes that form the blueprint for these objects in greater detail.

DomDocument Class

A DomDocument object is typically the first object created by the DOM parser when it completes parsing an XML document. It may be created by a call to xmldoc():

$doc = xmldoc("<?xml version='1.0'?><element>potassium</element>");

Or, if your XML data is in a file (rather than a string), you can use the xmldocfile() function to create a DomDocument object:

$doc = xmldocfile("element.xml"); 

Treading the Right Path

If you're using Windows, you'll need to give xmldocfile() the full path to the XML file. Don't forget to include the drive letter!

When you examine the structure of the DomDocument object with print_r(), you can see that it contains basic information about the XML document, including the XML version, the encoding and character set, and the URL of the document:

DomDocument Object 
(
[name] => 
[url] => 
[version] => 1.0 
[standalone] => -1 
[type] => 9 
[compression] => -1 
[charset] => 1 
) 

Peekaboo!

You'll notice that many examples in this book (particularly in this chapter) use the print_r() function to display the structure of a particular PHP variable. In case you're not familiar with this function, you should know that it provides an easy way to investigate the innards of a particular variable, array, or object. Use it whenever you need to look inside an object to see what makes it tick; and, if you're feeling really adventurous, you might also want to take a look at the var_dump() and var_export() functions, which provide similar functionality.

Each of these properties provides information on some aspect of the XML document:

name: Name of the XML document
url: URL of the document
version: XML version used
standalone: Whether or not the document is a standalone document
type: Integer corresponding to one of the DOM node types (see Table 3.1)
compression: Whether or not the file was compressed
charset: Character set used by the document

The application can use this information to make decisions about how to process the XML data; for example, as Listing 3.3 demonstrates, it may reject documents based on the version of XML being used.

Listing 3.3 Using DomDocument Properties to Verify XML Version Information

<?php
// XML data
$xml_string = "<?xml version='1.0'?><element>potassium</element>";
// create a DOM object
if (!$doc = xmldoc($xml_string))
{
    die("Error in XML");
}
// version check
else if ($doc->version > 1.0)
{
    die("Unsupported XML version");
}
else
{
    // XML processing code here
}
?>

In addition to the properties described previously, the DomDocument object also comes with the following methods:

root(): Returns a DomElement object representing the document element
dtd(): Returns a DTD object containing information about the document's DTD
add_root(): Creates a new document element, and returns a DomElement object representing that element
dumpmem(): Dumps the XML structure into a string variable
xpath_new_context(): Creates an XPathContext object for XPath evaluation

While parsing XML data, you'll find that the root() method is the one you use most often, whereas the add_root() and dumpmem() methods come in handy when you're creating or modifying an XML document tree in memory (discussed in detail in the "Manipulating DOM Trees" section).

X Marks the Spot

In case you're wondering, XPath, or the XML Path Language, provides an easy way to address specific parts of an XML document. The language uses directional axes, coupled with conditional tests, to create node collections matching a specific criterion, and also provides standard constructs to manipulate these collections.

PHP's XPath implementation is discussed in detail in the upcoming section titled "Traversing the DOM with PHP's XPath Classes."

In Listing 3.4, the variable $fruit contains the root node (the element named fruit).

Listing 3.4 Accessing the Document Element via the DOM

<?php
// create a DomDocument object
$doc = xmldoc("<?xml version='1.0' encoding='UTF-8' standalone='yes'?><fruit>watermelon</fruit>");
// root node
$fruit = $doc->root();
?>

To DTD or Not to DTD

The dtd() method of the DomDocument object creates a DTD object, which contains basic information about the document's Document Type Definition. Here's what it looks like:

Dtd Object 
(
[systemId] => weather.dtd 
[name] => weather 
) 

This DTD object exposes two properties: the systemId property reveals the filename of the DTD document, whereas the name property contains the name of the document element.

DomElement Class

The PHP parser represents every element within the XML document as an instance of the DomElement class, which makes it one of the most important in this lineup. When you view the structure of a DomElement object, you see that it has two distinct properties that represent the element name and type, respectively. You'll remember from Listing 3.2 that these properties can be used to identify individual elements and extract their values. Here is an example:

DomElement Object 
(
[type] => 1 
[tagname] => vegetable 
) 

A special note should be made here of the type property, which indicates the type of node under discussion. This type property contains an integer value mapping to one of the parser's predefined node types. Table 3.1 lists the important types.

Table 3.1. DOM Node Types

Integer   Node type                 Description
1         XML_ELEMENT_NODE          Element
2         XML_ATTRIBUTE_NODE        Attribute
3         XML_TEXT_NODE             Text
4         XML_CDATA_SECTION_NODE    CDATA section
5         XML_ENTITY_REF_NODE       Entity reference
7         XML_PI_NODE               Processing instruction
8         XML_COMMENT_NODE          Comment
9         XML_DOCUMENT_NODE         XML document
12        XML_NOTATION_NODE         Notation

If you plan to use the type property within a script to identify node types (as I will be doing shortly in Listing 3.5), you should note that it is considered preferable to use the named constants rather than their corresponding integer values, both for readability and to ensure stability across API changes.
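As a small illustration of this advice, the sketch below maps the integer codes from Table 3.1 back to their constant names, which can be handy when print_r() output shows only a bare number. The function name node_type_name() and its lookup array are hypothetical helpers, not part of PHP's DOM extension; the integer values are copied directly from Table 3.1.

```php
<?php
// Hypothetical helper (not part of the DOM extension): translate the
// integer in a node's "type" property into the constant name from Table 3.1.
function node_type_name($type)
{
    // integer codes copied from Table 3.1
    $names = array(
        1  => "XML_ELEMENT_NODE",
        2  => "XML_ATTRIBUTE_NODE",
        3  => "XML_TEXT_NODE",
        4  => "XML_CDATA_SECTION_NODE",
        5  => "XML_ENTITY_REF_NODE",
        7  => "XML_PI_NODE",
        8  => "XML_COMMENT_NODE",
        9  => "XML_DOCUMENT_NODE",
        12 => "XML_NOTATION_NODE"
    );
    // fall back to a generic label for codes the table does not list
    return isset($names[$type]) ? $names[$type] : "unknown node type ($type)";
}

echo node_type_name(1);  // XML_ELEMENT_NODE
?>
```

A helper like this is meant only for debugging output; within comparisons, the named constants themselves remain the better choice.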

The DomElement object also exposes a number of useful object methods:

children(): Returns an array of DomElement objects representing the children of this node
parent(): Returns a DomElement object representing the parent of this node
attributes(): Returns an array of DomAttribute objects representing the attributes of this node
get_attribute(): Returns the value of an attribute of this node
new_child(): Creates a new DomElement object, and attaches it as a child of this node (note that this newly created node is placed at the end of the existing child list)
set_attribute(): Sets the value of an attribute of this node
set_content(): Sets the content of this node

Again, the two most commonly used ones are the children() and attributes() methods, which return arrays of DomElement and DomAttribute objects, respectively. The get_attribute() method can be used to return the value of a specific attribute of an element (Listing 3.8 shows it in use). The new_child(), set_attribute(), and set_content() methods modify the tree; see the "Manipulating DOM Trees" section.

Note that PHP's DOM implementation does not currently offer any way of removing an attribute previously set with the set_attribute() method.

Choices

Most of the object methods discussed in this chapter can also be invoked as functions by prefixing the method name with domxml_ and passing a reference to the object as the first function argument. The following snippets demonstrate this:

<?php 
// these two are equivalent 
$root1 = $doc->root(); 
$root2 = domxml_root($doc); 
// these two are equivalent 
$children1 = $root1->children(); 
$children2 = domxml_children($root2); 
?> 

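Listing 3.5 puts these methods to work, using a recursive function to print the elements of an XML document as a hierarchical list (compare Listing 2.5). At the end of the process, a count of the total number of elements encountered is displayed.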

Listing 3.5 Representing an XML Document as a Hierarchical List

<?php
// XML file
$xml_file = "letter.xml";
// parse it
if (!$doc = xmldocfile($xml_file))
{
    die("Error in XML document");
}
// get the root node
$root = $doc->root();
// get its children
$children = get_children($root);
// element counter
// start with 1 so as to include document element
$elementCount = 1;
// start printing
print_tree($children);
// this recursive function accepts an array of nodes as argument,
// iterates through it and prints a list for each element found
function print_tree($nodeCollection)
{
    global $elementCount;
    // iterate through array
    echo "<ul>";
    for ($x=0; $x<sizeof($nodeCollection); $x++)
    {
        // add to element count
        $elementCount++;
        // print element as list item
        echo "<li>" . $nodeCollection[$x]->tagname;
        // go to the next level of the tree
        $nextCollection = get_children($nodeCollection[$x]);
        // recurse!
        print_tree($nextCollection);
    }
    echo "</ul>";
}
// function to return an array of children, given a parent node
function get_children($node)
{
    $temp = $node->children();
    $collection = array();
    // iterate through children array
    for ($x=0; $x<sizeof($temp); $x++)
    {
        // filter out all nodes except elements
        // and create a new array
        if ($temp[$x]->type == XML_ELEMENT_NODE)
        {
            $collection[] = $temp[$x];
        }
    }
    // return array containing child nodes
    return $collection;
}
echo "Total number of elements in document: $elementCount";
?>

Listing 3.5 is fairly easy to understand. The first step is to obtain a reference to the root of the document tree via the root() method; this reference serves as the starting point for the recursive print_tree() function. This function obtains a reference to the children of the root node, processes them, and then calls itself again to process the next level of nodes in the tree. The process continues until all the nodes in the tree have been exhausted. An element counter is used to track the number of elements found, and to display a total count of all the elements in the document.

DomText Class

Character data within an XML document is represented by the DomText class. Here's what it looks like:

DomText Object 
(
[type] => 3 
[content] => cabbages 
) 

The type property represents the node type (XML_TEXT_NODE in this case, as can be seen from Table 3.1), whereas the content property holds the character data itself. In order to illustrate this, consider Listing 3.6, which takes an XML-encoded list of country names, parses it, and puts that list into a PHP array.

Listing 3.6 Using DomText Object Properties to Retrieve Character Data from an XML Document

<?php
// XML data
$xml_string = "<?xml version='1.0'?>
<earth>
<country>Albania</country>
<country>Argentina</country>
<!-- and so on -->
<country>Zimbabwe</country>
</earth>";
// create array to hold country names
$countries = array();
// create a DOM object from the XML data
if (!$doc = xmldoc($xml_string))
{
    die("Error parsing XML");
}
// start at the root
$root = $doc->root();
// move down one level to the root's children
$nodes = $root->children();
// iterate through the list of children
foreach ($nodes as $n)
{
    // skip whitespace text nodes and comments;
    // only element nodes are of interest here
    if ($n->type != XML_ELEMENT_NODE)
    {
        continue;
    }
    // for each <country> element
    // get the text node under it
    // and add it to the $countries[] array
    $text = $n->children();
    if ($text[0]->content != "")
    {
        $countries[] = $text[0]->content;
    }
}
// uncomment this line to see the contents of the array
// print_r($countries);
?>

Fairly simple: a loop is used to iterate through all the <country> elements, adding the character data found within each to the global $countries array.

Taking up Space

It's important to remember that XML, unlike HTML, does not ignore whitespace, but treats it as literal character data. Consequently, if your XML document includes whitespace or line breaks, PHP's DOM parser identifies them as text nodes, and creates DomText objects to represent them. This is a common cause of confusion for DOM newbies, who are often stumped by the "extra" nodes that appear in their DOM tree.
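One common way to cope with these extra nodes is to skip any text node whose content is nothing but whitespace. The sketch below shows the idea. The function name skip_whitespace_nodes() is hypothetical, and the stand-in objects built with (object) casts only mimic the type and content properties shown in the DomText dump above, so that the filter can be tried without a parser; with a real tree you would pass in the array returned by children().

```php
<?php
// Hypothetical helper: drop text nodes that contain only whitespace.
// Assumes each node exposes "type" and (for text nodes) "content",
// as in the DomElement and DomText dumps shown in this chapter.
function skip_whitespace_nodes($nodes)
{
    $kept = array();
    foreach ($nodes as $node) {
        // 3 is XML_TEXT_NODE in Table 3.1
        if ($node->type == 3 && trim($node->content) === "") {
            continue;
        }
        $kept[] = $node;
    }
    return $kept;
}

// stand-in objects for illustration only, not real parser output
$nodes = array(
    (object) array("type" => 3, "content" => "\n  "),
    (object) array("type" => 1, "tagname" => "country"),
    (object) array("type" => 3, "content" => "Albania")
);

$clean = skip_whitespace_nodes($nodes);
echo count($clean);  // 2
?>
```

Only the whitespace-only node is discarded; text nodes that carry real character data pass through untouched.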

DomAttribute Class

A call to the attributes() method of the DomElement object generates an array of DomAttribute objects, each of which looks like this:

DomAttribute Object 
(
[name] => color 
[value] => green 
) 

The attribute name can be accessed via the name property, and the corresponding attribute value can be accessed via the value property. Listing 3.7 demonstrates how this works by using the value of the color attribute to highlight each vegetable or fruit name in the corresponding color.

Listing 3.7 Accessing Attribute Values with the DomAttribute Object

<?php
// XML data
$xml_string = "<?xml version='1.0'?>
<sentence>
What a wonderful profusion of colors and smells in the market - <vegetable color='green'>cabbages</vegetable>,
<vegetable color='red'>tomatoes</vegetable>,
<fruit color='green'>apples</fruit>,
<vegetable color='purple'>aubergines</vegetable>,
<fruit color='yellow'>bananas</fruit>
</sentence>";
// parse it
if (!$doc = xmldoc($xml_string))
{
    die("Error in XML document");
}
// get the root node
$root = $doc->root();
// get its children
$children = $root->children();
// iterate through child list
for ($x=0; $x<sizeof($children); $x++)
{
    // if element node
    if ($children[$x]->type == XML_ELEMENT_NODE)
    {
        // get the text node under it
        $text = $children[$x]->children();
        $cdata = $text[0]->content;
        // check its attributes to see if "color" is present
        $attributes = $children[$x]->attributes();
        if (is_array($attributes) && ($index = is_color_attribute_present($attributes)))
        {
            // if it is, colorize the element content
            echo "<font color=" . $index . ">" . $cdata . "</font>";
        }
        else
        {
            // else print it as is
            echo $cdata;
        }
    }
    // if text node
    else if ($children[$x]->type == XML_TEXT_NODE)
    {
        // simply print the content
        echo $children[$x]->content;
    }
}
// function to iterate through attribute list
// and return the value of the "color" attribute if available
function is_color_attribute_present($attributeList)
{
    // default to false so the caller's test fails cleanly
    // when no "color" attribute exists
    $color = false;
    foreach ($attributeList as $attrib)
    {
        if ($attrib->name == "color")
        {
            $color = $attrib->value;
            break;
        }
    }
    return $color;
}
?>

There is, of course, a simpler way to do this: just use the DomElement object's get_attribute() method. Listing 3.8, which generates equivalent output to Listing 3.7, demonstrates this alternative (and much shorter) approach.

Listing 3.8 Accessing Attribute Values (a Simpler Approach)

<?php
// XML data
$xml_string = "<?xml version='1.0'?>
<sentence>
What a wonderful profusion of colors and smells in the market - <vegetable color='green'>cabbages</vegetable>,
<vegetable color='red'>tomatoes</vegetable>,
<fruit color='green'>apples</fruit>,
<vegetable color='purple'>aubergines</vegetable>,
<fruit color='yellow'>bananas</fruit>
</sentence>";
// parse it
if (!$doc = xmldoc($xml_string))
{
    die("Error in XML document");
}
// get the root node
$root = $doc->root();
// get its children
$children = $root->children();
// iterate through child list
for ($x=0; $x<sizeof($children); $x++)
{
    // if element node
    if ($children[$x]->type == XML_ELEMENT_NODE)
    {
        // get the text node under it
        $text = $children[$x]->children();
        $cdata = $text[0]->content;
        // check to see if element contains the "color" attribute
        if ($children[$x]->get_attribute("color"))
        {
            // "color" attribute is present, colorize text
            echo "<font color=" . $children[$x]->get_attribute("color") . ">" . $cdata . "</font>";
        }
        else
        {
            // otherwise just print the text as is
            echo $cdata;
        }
    }
    // if text node
    else if ($children[$x]->type == XML_TEXT_NODE)
    {
        // print content as is
        echo $children[$x]->content;
    }
}
?>

A Composite Example

Now that you know how it works, how about seeing how it plays out in real life? This example takes everything you learned thus far, and uses that knowledge to construct an HTML file from an XML document.

I'll be using a variant of the XML invoice (Listing 2.21) from Chapter 2, adapting the SAX-based approach demonstrated there to the new DOM paradigm. As you'll see, although the two techniques are fundamentally different, they can nonetheless achieve a similar effect. Listing 3.9 is the marked-up invoice.

Listing 3.9 An XML Invoice (invoice.xml)

<?xml version="1.0"?> 
<invoice> 
<customer> 
<name>Joe Wannabe</name> 
<address> 
<line>23, Great Bridge Road</line> 
<line>Bombay, MH</line> 
<line>India</line> 
</address> 
</customer> 
<date>2001-09-15</date> 
<reference>75-848478-98</reference> 
<items> 
<item cid="AS633225"> 
<desc>Oversize tennis racquet</desc> 
<price>235.00</price> 
<quantity>1</quantity> 
<subtotal>235.00</subtotal> 
</item> 
<item cid="GT645"> 
<desc>Championship tennis balls (can)</desc> 
<price>9.99</price> 
<quantity>4</quantity> 
<subtotal>39.96</subtotal> 
</item> 
<item cid="U73472"> 
<desc>Designer gym bag</desc> 
<price>139.99</price> 
<quantity>1</quantity> 
<subtotal>139.99</subtotal> 
</item> 
<item cid="AD848383"> 
<desc>Custom-fitted sneakers</desc> 
<price>349.99</price> 
<quantity>1</quantity> 
<subtotal>349.99</subtotal> 
</item> 
</items> 
<delivery>Next-day air</delivery> 
</invoice> 

Listing 3.10 parses the previous XML data to create an HTML page, suitable for printing or viewing in a browser.

Listing 3.10 Formatting an XML Document with the DOM

<html>
<head>
<basefont face="Arial">
</head>
<body bgcolor="white">
<font size="+3">Sammy's Sports Store</font>
<br>
<font size="-2">14, Ocean View, CA 12345, USA http://www.sammysportstore.com/</font>
<p>
<hr>
<center>INVOICE</center>
<hr>
<?php
// arrays to associate XML elements with HTML output
$startTagsArray = array(
    'CUSTOMER' => '<p> <b>Customer: </b>',
    'ADDRESS' => '<p> <b>Billing address: </b>',
    'DATE' => '<p> <b>Invoice date: </b>',
    'REFERENCE' => '<p> <b>Invoice number: </b>',
    'ITEMS' => '<p> <b>Details: </b> <table width="100%" border="1" cellspacing="0" cellpadding="3"><tr><td><b>Item description</b></td><td><b>Price</b></td><td><b>Quantity</b></td><td><b>Sub-total</b></td></tr>',
    'ITEM' => '<tr>',
    'DESC' => '<td>',
    'PRICE' => '<td>',
    'QUANTITY' => '<td>',
    'SUBTOTAL' => '<td>',
    'DELIVERY' => '<p> <b>Shipping option:</b> ',
    'TERMS' => '<p> <b>Terms and conditions: </b> <ul>',
    'TERM' => '<li>'
);
$endTagsArray = array(
    'LINE' => ',',
    'ITEMS' => '</table>',
    'ITEM' => '</tr>',
    'DESC' => '</td>',
    'PRICE' => '</td>',
    'QUANTITY' => '</td>',
    'SUBTOTAL' => '</td>',
    'TERMS' => '</ul>',
    'TERM' => '</li>'
);
// array to hold sub-totals
$subTotals = array();
// XML file
$xml_file = "/home/sammy/invoices/invoice.xml";
// parse document
$doc = xmldocfile($xml_file);
// get the root node
$root = $doc->root();
// get its children
$children = $root->children();
// start printing
print_tree($children);
// this recursive function accepts an array of nodes as argument,
// iterates through it and:
//      - marks up elements with HTML
//      - prints text as is
function print_tree($nodeCollection)
{
    global $startTagsArray, $endTagsArray, $subTotals;
    foreach ($nodeCollection as $node)
    {
        // how to handle elements
        if ($node->type == XML_ELEMENT_NODE)
        {
            $tag = strtoupper($node->tagname);
            // print HTML opening tags
            // (elements without an entry, such as <name>, print no markup)
            if (isset($startTagsArray[$tag]))
            {
                echo $startTagsArray[$tag];
            }
            // recurse
            $nextCollection = $node->children();
            print_tree($nextCollection);
            // once done, print closing tags
            if (isset($endTagsArray[$tag]))
            {
                echo $endTagsArray[$tag];
            }
        }
        // how to handle text nodes
        if ($node->type == XML_TEXT_NODE)
        {
            // print text as is
            echo($node->content);
        }
        // PI handling code would come here
        // this doesn't work too well in PHP 4.1.1
        // see the sidebar entitled "Process Failure"
        // for more information
    }
}
// this function gets the character data within an element
// it accepts an element node as argument
// and dives one level deeper into the DOM tree
// to retrieve the corresponding character data
function getNodeContent($node)
{
    $content = "";
    $children = $node->children();
    if ($children)
    {
        foreach ($children as $child)
        {
            $content .= $child->content;
        }
    }
    return $content;
}
?>

Figure 3.2 shows what the output looks like.

Figure 3.2. Sammy's Sports Store invoice.

As with the SAX example (refer to Listing 2.23), the first thing to do is define arrays to hold the HTML markup for specific tags; in Listing 3.10, this markup is stored in the $startTagsArray and $endTagsArray variables.

Next, the XML document is read by the parser, and an appropriate DOM tree is generated in memory. An array of objects representing the first level of the tree (the children of the root node) is obtained and the function print_tree() is called. This print_tree() function is a recursive function, and it forms the core of the script.

The print_tree() function accepts a node list as argument, and iterates through this list, examining each node and processing it appropriately. As you can see, the function is set up to perform specific tasks, depending on the type of node:

If the node is an element, the function looks up the $startTagsArray and $endTagsArray variables, and prints the corresponding HTML markup.

If the node is a text node, the function simply prints the contents of the text node as is.

Additionally, if the node is an element, the print_tree() function obtains a list of the element's children, if any exist, and proceeds to call itself with that node list as argument. And so the process repeats itself until the entire tree has been parsed.

As Listing 3.10 demonstrates, this technique provides a handy way to recursively scan through a DOM tree and perform different actions based on the type of node encountered. You can use this technique to count, classify, and process the different types of elements encountered (Listing 3.5 demonstrated a primitive element counter); or even construct a new tree from the existing one.
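To make the counting idea concrete, here is a minimal sketch of a recursive walk that tallies nodes by their type property. The names count_node_types(), StubNode, and stub_node() are hypothetical. StubNode merely imitates the type property and children() method described in this chapter so that the walk can be exercised without a parser; with a parsed document you would pass in the array returned by the root element's children() method instead.

```php
<?php
// Stand-in for a parsed node (illustration only, not a DOM class):
// it mimics the "type" property and children() method used in this chapter.
class StubNode
{
    var $type;
    var $kids = array();

    function children()
    {
        return $this->kids;
    }
}

// convenience builder for the stand-in nodes
function stub_node($type, $kids = array())
{
    $node = new StubNode;
    $node->type = $type;
    $node->kids = $kids;
    return $node;
}

// walk a node list recursively and count how often each node type occurs;
// returns an array keyed by the integer node type from Table 3.1
function count_node_types($nodes, $counts = array())
{
    foreach ($nodes as $node) {
        if (!isset($counts[$node->type])) {
            $counts[$node->type] = 0;
        }
        $counts[$node->type]++;
        // guard in case children() does not return an array for leaf nodes
        $children = $node->children();
        if (is_array($children)) {
            $counts = count_node_types($children, $counts);
        }
    }
    return $counts;
}

// roughly <item><desc>text</desc><price>text</price></item>
// 1 = element, 3 = text (see Table 3.1)
$tree = array(
    stub_node(1, array(
        stub_node(1, array(stub_node(3))),
        stub_node(1, array(stub_node(3)))
    ))
);

$counts = count_node_types($tree);
print_r($counts);  // three element nodes (type 1), two text nodes (type 3)
?>
```

Passing the running totals back as a return value, rather than through a global as in Listing 3.5, keeps the function self-contained; either style works for a script of this size.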

Process Failure

If you've been paying attention, you will have noticed that the XML invoice in Listing 3.9 is not identical to the one in Listing 2.21. Listing 2.21 included an additional processing instruction (PI), a call to the PHP function displayTotal(), which is missing in
