XML and PHP [Electronic resources]

Vikram Vaswani

Handling SAX Events

Let's move on to a more focused discussion of the various event handlers you can register with the parser.

PHP includes handlers for elements and attributes, character data, processing instructions, external entities, and notations. Each of these is discussed in detail in the following sections.

Handling Elements

The xml_set_element_handler() function is used to identify the functions that handle elements encountered by the XML parser as it progresses through a document. This function accepts three arguments: the handle for the XML parser, the name of the function to call when it finds an opening tag, and the name of the function to call when it finds a closing tag, respectively.

Here's an example:

xml_set_element_handler($xml_parser, 
"startElementHandler", "endElementHandler"); 

In this case, I've told the parser to call the function startElementHandler() when it finds an opening tag and the function endElementHandler() when it finds a closing tag.

These handler functions must be set up to accept certain basic information about the element generating the event.

When PHP calls the start tag handler, it passes it the following three arguments:

A handle representing the XML parser

The name of the element

A list of the element's attributes (as an associative array)

Because closing tags do not contain attributes, the end tag handler is only passed two arguments:

A handle representing the XML parser

The element name
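The two handler signatures described above can be sketched as follows. This is a minimal illustration rather than one of the book's listings: the $events array and the sample XML string are made up for the example, closures are used in place of named functions so the sketch stays self-contained, and case folding is switched off (an assumption made here so that names are reported as written; by default the parser upper-cases them).

```php
<?php
// Illustrative sketch: record each event so the handler arguments are visible.
$events = [];

// Start-tag handler: receives the parser, the element name, and the attributes.
$startHandler = function ($parser, $name, $attributes) use (&$events) {
    $events[] = "start:$name";
    foreach ($attributes as $attrName => $attrValue) {
        $events[] = "  attribute:$attrName=$attrValue";
    }
};

// End-tag handler: receives only the parser and the element name.
$endHandler = function ($parser, $name) use (&$events) {
    $events[] = "end:$name";
};

$xml_parser = xml_parser_create();
// Assumption: report names exactly as written instead of upper-casing them.
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($xml_parser, $startHandler, $endHandler);

// Hypothetical sample document
$xml = '<note priority="high"><to>Tom</to></note>';
if (!xml_parse($xml_parser, $xml, true)) {
    die("XML parser error: " .
        xml_error_string(xml_get_error_code($xml_parser)));
}

echo implode("\n", $events), "\n";
```

The attribute array is empty for elements without attributes, so the inner loop simply does nothing for the to element.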

To demonstrate this, consider Listing 2.4, a simple XML document.

Listing 2.4 Letter Marked Up with XML (letter.xml)

<?xml version="1.0"?> 
<letter> 
<date>10 January 2001</date> 
<salutation> 
<para> 
Dear Aunt Hilda, 
</para> 
</salutation> 
<body> 
<para> 
Just writing to thank you for the wonderful train set you sent me for
Christmas. I like it very much, and Sarah and I have both enjoyed playing
with it over the long holidays.
</para>
<para>
It has been a while since you visited us. How have you been? How are the
dogs, and has the cat stopped playing with your knitting yet? We were hoping
to come by for a short visit on New Year's Eve, but Sarah wasn't feeling
well. However, I hope to see you next month when I will be home from school
for the holidays.
</para> 
</body> 
<conclusion> 
<para>Hugs and kisses -- 
Your nephew, Tom</para> 
</conclusion> 
</letter> 

Listing 2.5 uses element handlers to create an indented list mirroring the hierarchical structure of the XML document in Listing 2.4.

Listing 2.5 Representing an XML Document as a Hierarchical List

<html> 
<head> 
<basefont face="Arial"> 
</head> 
<body> 
<?php 
// run when start tag is found 
function startElementHandler($parser, $name, $attributes) 
{
echo "<ul><li>$name</li>"; 
}   
function endElementHandler($parser, $name) 
{
echo "</ul>"; 
} 
// XML data file 
$xml_file = "letter.xml"; 
// initialize parser 
$xml_parser = xml_parser_create(); 
// set element handler 
xml_set_element_handler($xml_parser, 
"startElementHandler", "endElementHandler"); 
// read XML file 
if (!($fp = fopen($xml_file, "r"))) 
{
die("File I/O error: $xml_file"); 
} 
// parse XML 
while ($data = fread($fp, 4096)) 
{
// error handler 
if (!xml_parse($xml_parser, $data, feof($fp))) 
{
die("XML parser error: " . 
xml_error_string(xml_get_error_code($xml_parser))); 
} 
} 
// all done, clean up! 
xml_parser_free($xml_parser); 
?> 
</body> 
</html> 

Each time the parser finds an opening tag, it creates an unordered list and adds the tag name as the first item in that list; each time it finds an ending tag, it closes the list. The result is a hierarchical representation of the XML document's structure.
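The same open-and-close bookkeeping can also be done with a depth counter instead of nested lists. The sketch below is a hypothetical variant, not the book's listing: the outlineXml() function name and the shortened letter document are assumptions, and case folding is turned off so that element names appear as written.

```php
<?php
// Sketch: build a plain-text outline of an XML document by tracking depth.
function outlineXml($xml)
{
    $depth = 0;   // how many elements are currently open
    $lines = [];  // one outline line per element

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        // opening tag: print the name at the current depth, then go one level down
        function ($parser, $name, $attributes) use (&$depth, &$lines) {
            $lines[] = str_repeat("  ", $depth) . $name;
            $depth++;
        },
        // closing tag: come back up one level
        function ($parser, $name) use (&$depth) {
            $depth--;
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException("XML parser error: " .
            xml_error_string(xml_get_error_code($parser)));
    }
    return implode("\n", $lines);
}

// Hypothetical, shortened version of the letter in Listing 2.4
$letter = '<letter><date>10 January 2001</date>' .
    '<body><para>Hi</para></body></letter>';
echo outlineXml($letter), "\n";
```

Because the end handler only decrements the counter, the outline needs no closing markup at all; the indentation alone carries the hierarchy.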

Handling Character Data

The xml_set_character_data_handler() function registers an event handler for character data. It accepts two arguments: the handle for the XML parser and the name of the function to call when the parser finds character data.

For example:

xml_set_character_data_handler($xml_parser, "characterDataHandler");

This tells the SAX parser to use the function named characterDataHandler() to process character data.

When PHP calls this function, it automatically passes it the following two arguments:

A handle representing the XML parser

The character data found

Listing 2.6 demonstrates how this could be used.

Listing 2.6 Stripping Out Tags from an XML Document

<html> 
<head> 
<basefont face="Arial"> 
</head> 
<body> 
<?php 
// cdata handler 
function characterDataHandler($parser, $data) 
{
echo $data; 
} 
// XML data
$xml_data = <<<EOF
<?xml version="1.0"?>
<grammar>
<noun type="proper">Mary</noun> <verb tense="past">had</verb> a
<adjective>little</adjective> <noun type="common">lamb.</noun>
</grammar>
EOF;
// initialize parser
$xml_parser = xml_parser_create();
// set cdata handler
xml_set_character_data_handler($xml_parser, "characterDataHandler");
if (!xml_parse($xml_parser, $xml_data)) 
{
die("XML parser error: " . 
xml_error_string(xml_get_error_code($xml_parser))); 
}   
// all done, clean up! 
xml_parser_free($xml_parser); 
?> 
</body> 
</html> 

In this case, the characterDataHandler() function works in much the same manner as PHP's built-in strip_tags() function: it scans through the XML and prints only the character data encountered. Because I haven't registered any element handlers, any tags found during this process are ignored.

You'll notice also that this example differs from the ones you've seen thus far, in that the XML data doesn't come from an external file, but has been defined via a variable in the script itself using "here document" syntax.
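The tag-stripping idea in Listing 2.6 can be wrapped in a small reusable function. This is a sketch under stated assumptions: the xmlToText() name is hypothetical, and the handler is a closure that appends to a string instead of echoing, so the caller decides what to do with the text.

```php
<?php
// Sketch: return only the character data of an XML string.
function xmlToText($xml)
{
    $text = "";
    $parser = xml_parser_create();
    // No element handlers are registered, so tags are simply skipped.
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$text) {
            $text .= $data;
        }
    );
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException("XML parser error: " .
            xml_error_string(xml_get_error_code($parser)));
    }
    return $text;
}

$xml = '<grammar><noun type="proper">Mary</noun> ' .
    '<verb tense="past">had</verb> a <adjective>little</adjective> ' .
    '<noun type="common">lamb.</noun></grammar>';

echo xmlToText($xml), "\n";
echo strip_tags($xml), "\n"; // for comparison
```

Unlike strip_tags(), this version fails loudly on input that is not well-formed XML, which may or may not be what you want.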

Here, Boy!

"Here-document" syntax provides a convenient way to create PHP strings that span multiple lines, or strings that retain their internal formatting (including tabs and line breaks).

Consider the following example:

<?php 
$str = <<<MARKER 
This is 
a multi
line 
string 
MARKER; 
?> 

The <<< symbol indicates to PHP that what comes next is a multiline block, and should be stored "as is," right up to the specified marker. This marker must begin with an alphabetic or underscore character, can contain only alphanumeric and underscore characters, and, when indicating the end of the block, must be flush with the left-hand margin of your code (PHP 7.3 and later relax this last rule and allow the closing marker to be indented).
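As a small illustration of the sidebar above, the sketch below shows that a here-document keeps its line breaks and indentation and also interpolates variables. The variable names are made up for the example. The second block uses the quoted-marker form (a "nowdoc," available from PHP 5.3), which is not covered in the text and is shown only for contrast: it stores its contents with no interpolation at all.

```php
<?php
$name = "Aunt Hilda";

// Here-document: formatting is retained and $name is replaced by its value.
$greeting = <<<MARKER
Dear $name,
  thank you for the train set.
MARKER;

// Nowdoc (quoted marker): stored literally, $name is left untouched.
$raw = <<<'MARKER'
Dear $name,
MARKER;

echo $greeting, "\n";
echo $raw, "\n";
```

The nowdoc form can be handy when the embedded text contains dollar signs that should not be treated as PHP variables.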

It should be noted that the character data handler is also invoked on CDATA blocks; Listing 2.7 is a variant of Listing 2.6 that demonstrates this.

Listing 2.7 Parsing CDATA Blocks

<html> 
<head> 
<basefont face="Arial"> 
</head> 
<body> 
<?php 
// cdata handler 
function characterDataHandler($parser, $data) 
{
echo $data; 
} 
// XML data
$xml_string = <<<EOF
<?xml version="1.0"?>
<message>
<from>Agent 5292</from>
<to>Covert-Ops HQ</to>
<encoded_message>
<![CDATA[
563247 !#9292 73%639 1^2736
@@6473 634292 930049 292$88 *7623&& 62367&
]]>
</encoded_message>
</message>
EOF;
// initialize parser
$xml_parser = xml_parser_create();
// set cdata handler
xml_set_character_data_handler($xml_parser, "characterDataHandler");
if (!xml_parse($xml_parser, $xml_string)) 
{
die("XML parser error: " . 
xml_error_string(xml_get_error_code($xml_parser))); 
} 
// all done, clean up! 
xml_parser_free($xml_parser); 
?> 
</body> 
</html> 
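To make the point of Listing 2.7 easier to verify, here is a shorter sketch with a hypothetical document in which the CDATA block contains text that looks like markup. The character data handler receives that text untouched; the closure and the $text variable are assumptions made for the example.

```php
<?php
// Sketch: a CDATA block reaches the character data handler as plain text.
$text = "";

$xml_parser = xml_parser_create();
xml_set_character_data_handler(
    $xml_parser,
    function ($parser, $data) use (&$text) {
        // append rather than overwrite: the handler may be called more than once
        $text .= $data;
    }
);

// Hypothetical document: angle brackets and ampersand inside CDATA are not parsed
$xml = '<message><![CDATA[<b>not a tag</b> & not an entity]]></message>';

if (!xml_parse($xml_parser, $xml, true)) {
    die("XML parser error: " .
        xml_error_string(xml_get_error_code($xml_parser)));
}

echo $text, "\n";
```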

When Less Work Is More

There's an important caveat you should note when dealing with character data via PHP's SAX parser. If a character data section contains entity references, then PHP will not replace the entity reference with its actual value first and then call the handler. Rather, it will split the character data into segments around the reference and operate on each segment separately.

What does this mean? Well, here's the sequence of events:

PHP first calls the handler for the CDATA segment before the entity reference.

It then replaces the reference with its value, and calls the handler again.

Finally, it calls the handler a third time for the segment following the entity reference.

Table 2.1 might help to make this clearer. The first column uses a basic XML document without entities; the second column uses a document containing an entity reference within the data block. Both examples use the same character data handler; however, as the output shows, the first example calls the handler once, whereas the second calls it three times.

Table 2.1. A Comparison of Parser Behavior in CDATA Sections Containing Entity References

XML document without entity references:

<?xml version="1.0"?>
<message>Welcome to GenericCorp. We're just like everyone else.</message>

XML document with entity references:

<?xml version="1.0"?>
<!DOCTYPE message
[
<!ENTITY company "GenericCorp">
]>
<message>Welcome to &company;. We're just like everyone else.</message>

The handler (identical for both documents):

<?php
// cdata handler
function characterDataHandler($parser, $data)
{
echo "| handler in | " . $data . " | handler out | ";
}
?>

The output without entity references:

| handler in | Welcome to GenericCorp. We're just like everyone else. | handler out |

The output with entity references:

| handler in | Welcome to | handler out | | handler in | GenericCorp | handler out | | handler in | . We're just like everyone else. | handler out |
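One common way to cope with this segmentation is to buffer the pieces and act on the complete text only when the closing tag arrives. The sketch below is an assumption-laden illustration, not the book's code: it uses the predefined &amp;amp; entity as a stand-in for &amp;company;, the $buffer, $calls, and $messages names are made up, and the buffer is reset at every opening tag, so it is only suitable for elements that contain text and no child elements.

```php
<?php
// Sketch: reassemble character data that arrives in several handler calls.
$buffer = "";    // text collected for the element currently open
$calls = 0;      // how often the character data handler ran
$messages = [];  // complete text per element name

$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler(
    $xml_parser,
    // opening tag: start with an empty buffer
    function ($parser, $name, $attributes) use (&$buffer) {
        $buffer = "";
    },
    // closing tag: the buffer now holds the whole text
    function ($parser, $name) use (&$buffer, &$messages) {
        $messages[$name] = $buffer;
    }
);
xml_set_character_data_handler(
    $xml_parser,
    function ($parser, $data) use (&$buffer, &$calls) {
        $buffer .= $data;
        $calls++;
    }
);

$xml = '<message>Tom &amp; Sarah say hello.</message>';
if (!xml_parse($xml_parser, $xml, true)) {
    die("XML parser error: " .
        xml_error_string(xml_get_error_code($xml_parser)));
}

echo $messages["message"], " (handler calls: $calls)\n";
```

The number of handler calls depends on the parser, which is exactly why the text is assembled in the end-tag handler rather than processed piece by piece.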

Handling Processing Instructions

You can set up a handler for PIs with xml_set_processing_instruction_handler(), which operates just like the character data handler above.

This snippet designates the function PIHandler() as the handler for all PIs found in the document:

xml_set_processing_instruction_handler($xml_parser, "PIHandler"); 

The designated handler must accept three arguments:

A handle representing the XML parser (you can see that this is standard for all event handlers)

The PI target (an identifier for the application that is to process the instruction)

The instruction itself

Listing 2.8 demonstrates how it works in practice. When the parser encounters the PHP code within the document, it calls the PI handler, which executes the code as a PHP statement and displays the result.

Listing 2.8 Executing PIs within an XML Document

<html> 
<head> 
<basefont face="Arial"> 
</head> 
<body> 
<?php 
// cdata handler 
function characterDataHandler($parser, $data) 
{
echo $data . "<p>"; 
} 
// PI handler 
function PIHandler($parser, $target, $data) 
{
// if php code, execute it 
if (strtolower($target) == "php") 
{
eval($data); 
} 
// otherwise just print it 
else 
{
echo "PI found: [$target] $data"; 
} 
} 
// XML data
// (the PI uses date("Y") on its own: calling mktime() without
// arguments is an error in PHP 8)
$xml_data = <<<EOF
<?xml version="1.0"?>
<article>
<header>insert slug here</header>
<body>insert body here</body>
<footer><?php print "Copyright UNoHoo Inc," . date("Y"); ?></footer>
</article>
EOF;
// initialize parser
$xml_parser = xml_parser_create();
// set cdata handler
xml_set_character_data_handler($xml_parser, "characterDataHandler");
// set PI handler
xml_set_processing_instruction_handler($xml_parser, "PIHandler");
if (!xml_parse($xml_parser, $xml_data)) 
{
die("XML parser error: " . 
xml_error_string(xml_get_error_code($xml_parser))); 
} 
// all done, clean up! 
xml_parser_free($xml_parser); 
?> 
</body> 
</html> 

Listing 2.8 designates the function PIHandler() as the handler to be called for all PIs encountered within the document. As explained previously, this function is passed the PI target and instruction as function arguments.

When a PI is located within the document, PIHandler() first checks the PI target ($target) to see if it is a PHP instruction. If it is, eval() is called to evaluate and execute the PHP code ($data) within the PI. If the target is any other application, PHP obviously cannot execute the instructions, and therefore resorts to merely displaying the PI to the user.
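If executing arbitrary code from a document is more than you need, the same target check can dispatch to a fixed set of callbacks instead of eval(). This is a sketch of that alternative, not the approach used in Listing 2.8: the "upper" and "render" targets, the $piCallbacks table, and the $results array are all hypothetical.

```php
<?php
// Sketch: dispatch PIs to known callbacks instead of evaluating them.
$results = [];

// Hypothetical table of supported PI targets
$piCallbacks = [
    "upper" => function ($data) {
        return strtoupper(trim($data));
    },
];

$xml_parser = xml_parser_create();
xml_set_processing_instruction_handler(
    $xml_parser,
    function ($parser, $target, $data) use (&$results, $piCallbacks) {
        $key = strtolower($target);
        if (isset($piCallbacks[$key])) {
            // known target: run the matching callback
            $results[] = $piCallbacks[$key]($data);
        } else {
            // unknown target: just report it
            $results[] = "PI found: [$target] " . trim($data);
        }
    }
);

$xml = '<footer><?upper unohoo inc ?><?render mode="compact"?></footer>';
if (!xml_parse($xml_parser, $xml, true)) {
    die("XML parser error: " .
        xml_error_string(xml_get_error_code($xml_parser)));
}

echo implode("\n", $results), "\n";
```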

Careful eval()-uation

You may not know this (I didn't), but PHP, which is usually pretty rigid about ending every statement with a semicolon, allows you to omit the semicolon from the statement immediately preceding a closing PHP tag. For example, this is perfectly valid PHP code:

<?php print "Copyright UNoHoo Inc," . date("Y") ?>

However, if you were to place this code in a PI and pass it to eval(), as in Listing 2.8, eval() would generate an error. This is because eval() requires every PHP statement passed to it to end with a semicolon.
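One way around the missing-semicolon problem is to normalize the PI body before it reaches eval(). The helper below is a hypothetical sketch (terminateStatement() is not a PHP built-in), and it addresses syntax only: eval() should still be reserved for documents you fully trust.

```php
<?php
// Hypothetical helper: make sure a code fragment ends with a semicolon.
function terminateStatement($code)
{
    $code = trim($code);
    if ($code !== "" && substr($code, -1) !== ";") {
        $code .= ";";
    }
    return $code;
}

// A fragment as it might arrive from a PI, without the final semicolon.
// "return" is used so that eval() hands the value back to the caller.
$fromPI = 'return "Copyright UNoHoo Inc," . date("Y")';

$result = eval(terminateStatement($fromPI));
echo $result, "\n";
```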

Handling External Entities

You already know that an entity provides a simple way to reuse frequently repeated text segments within an XML document. Most often, entities are defined and referenced within the same document. However, sometimes a need arises to separate entities that are common across multiple documents into a single external file. These entities, which are defined in one file and referenced in others, are known as external entities.

If a document contains references to external entities, PHP offers xml_set_external_entity_ref_handler(), which specifies how these entities are to be handled.

This snippet designates the function externalEntityHandler() as the handler for all external entities found in the document:

xml_set_external_entity_ref_handler($xml_parser, "externalEntityHandler");

The handler designated by xml_set_external_entity_ref_handler() must be set up to accept the following five arguments:

A handle representing the XML parser

The entity name

The base URI for the SYSTEM identifier (PHP currently sets this to an empty string)

The SYSTEM identifier itself (if available)

The PUBLIC identifier (if available)

To illustrate this, consider the XML document in Listing 2.9, which references the external entity shown in Listing 2.10.

Listing 2.9 XML Document Referencing an External Entity (mission.xml)

<?xml version="1.0"?> 
<!DOCTYPE mission 
[
<!ENTITY warning SYSTEM "warning.txt"> 
]> 
<mission> 
<objective>Find the nearest Starbucks</objective> 
<goal>Bring back two lattes, one espresso and one black coffee</goal>
<priority>Critical</priority> 
<w>&warning;</w> 
</mission> 

True to You

The handler for external entities must explicitly return true if its actions are successful. If the handler returns false (or returns nothing at all, which works out to the same thing), the parser exits with error code 21 (see the "Handling Errors" section for more information on error codes).

Listing 2.10 Referenced External Entity (warning.txt)

This document will self-destruct in thirty seconds. 

Listing 2.11 is a sample script that demonstrates how the entity resolver works.

Listing 2.11 Resolving External Entities

<html> 
<head> 
<basefont face="Arial"> 
</head> 
<body> 
<?php 
// external entity handler
function externalEntityHandler($parser, $name, $base, $systemId, $publicId)
{
// read referenced file; readfile() returns false on failure
// (an empty file legitimately yields 0, so compare strictly)
if (readfile($systemId) === false)
{
die("File I/O error: $systemId");
}
else
{
return true;
}
}
// cdata handler 
function characterDataHandler($parser, $data) 
{
echo $data . "<p>"; 
} 
// XML data file 
$xml_file = "mission.xml"; 
// initialize parser 
$xml_parser = xml_parser_create(); 
// set cdata handler
xml_set_character_data_handler($xml_parser, "characterDataHandler");
// set external entity handler
xml_set_external_entity_ref_handler($xml_parser, "externalEntityHandler");
// read XML file 
if (!($fp = fopen($xml_file, "r"))) 
{
die("File I/O error: $xml_file"); 
} 
// parse XML 
while ($data = fread($fp, 4096)) 
{
// error handler 
if (!xml_parse($xml_parser, $data, feof($fp))) 
{
die("XML parser error: " . 
xml_error_string(xml_get_error_code($xml_parser))); 
} 
} 
// all done, clean up! 
xml_parser_free($xml_parser); 
?> 
</body> 
</html> 

When this script runs, the external entity handler finds and resolves the entity reference, and includes it in the main document. In this case, the external entity is merely included, not parsed or processed in any way; however, if you want to see an example in which the external entity is itself an XML document that needs to be parsed further, take a look at Listing 2.23 in the "A Composite Example" section.
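Because the handler in Listing 2.11 reads whatever file the SYSTEM identifier names, it can be worth restricting it to a known directory when the XML comes from an untrusted source. The sketch below is one possible approach under stated assumptions: resolveEntityPath() is a hypothetical helper, the handler treats the script's own directory as the only allowed location, and the demonstration uses a throwaway temporary directory.

```php
<?php
// Hypothetical helper: map a SYSTEM identifier onto a file inside $baseDir.
// Returns the resolved path, or null if the file is missing or outside $baseDir.
function resolveEntityPath($baseDir, $systemId)
{
    $base = realpath($baseDir);
    $path = realpath($baseDir . DIRECTORY_SEPARATOR . $systemId);
    if ($base === false || $path === false) {
        return null;
    }
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    return strncmp($path, $prefix, strlen($prefix)) === 0 ? $path : null;
}

// External entity handler built on the helper; it returns true only on success,
// so a rejected or unreadable file makes the parser stop with an error.
function externalEntityHandler($parser, $name, $base, $systemId, $publicId)
{
    $path = resolveEntityPath(__DIR__, (string) $systemId);
    if ($path === null) {
        return false;
    }
    return readfile($path) !== false;
}

// Demonstration of the helper with a temporary directory
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . "entities_" . uniqid();
mkdir($dir);
$file = $dir . DIRECTORY_SEPARATOR . "warning.txt";
file_put_contents($file, "This document will self-destruct in thirty seconds.");

$inside = resolveEntityPath($dir, "warning.txt");       // accepted
$outside = resolveEntityPath($dir, "../../etc/passwd"); // rejected

var_dump($inside !== null);
var_dump($outside);

// clean up the temporary files
unlink($file);
rmdir($dir);
```

The handler would be registered exactly as in Listing 2.11, with xml_set_external_entity_ref_handler($xml_parser, "externalEntityHandler").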

Handling Notations and Unparsed Entities

You already know that notations and unparsed entities go together, and PHP allows you to handle them, too, via its xml_set_notation_decl_handler() and xml_set_unparsed_entity_decl_handler() functions. (If you don't know what notations and unparsed entities are, drop by Chapter 1, "XML and PHP Basics," and find out what you missed.) Like all the other handlers discussed thus far, both these functions designate handlers to be called when the parser encounters either a notation declaration or an unparsed entity declaration.

The following snippet designates the functions unparsedEntityHandler() and notationHandler() as the handlers for unparsed entities and notations found in the document:

xml_set_unparsed_entity_decl_handler($xml_parser, "unparsedEntityHandler");
xml_set_notation_decl_handler($xml_parser, "notationHandler");

The handler designated by xml_set_notation_decl_handler() must be capable of accepting the following five arguments:

A handle representing the XML parser

The notation name

A base URI for the SYSTEM identifier

The SYSTEM identifier itself (if available)

The PUBLIC identifier (if available)

Similarly, the handler designated by xml_set_unparsed_entity_decl_handler() must be capable of accepting the following six arguments:

A handle representing the XML parser

The name of the unparsed entity

A base for the SYSTEM identifier

The SYSTEM identifier itself (if available)

The PUBLIC identifier (if available)

The notation name

In order to understand how these handlers work in practice, consider Listing 2.12, which sets up two unparsed entities representing directories on the system and a notation that tells the system what to do with them (run a script that calculates the disk space they're using, and mail the results to the administrator).

Listing 2.12 XML Document Containing Unparsed Entities and Notations (list.xml)

<?xml version="1.0"?> 
<!DOCTYPE list 
[
<!ELEMENT list (#PCDATA | dir)*> 
<!ELEMENT dir EMPTY> 
<!ATTLIST dir name ENTITY #REQUIRED> 
<!NOTATION directory SYSTEM "/usr/local/bin/usage.pl"> 
<!ENTITY config SYSTEM "/etc" NDATA directory> 
<!ENTITY temp SYSTEM "/tmp" NDATA directory> 
]> 
<list> 
<dir name="config" /> 
<dir name="temp" /> 
</list> 

Listing 2.13 is the PHP script that parses the XML document.

Listing 2.13 Handling Unparsed Entities

<html> 
<head> 
<basefont face="Arial"> 
</head> 
<body> 
<?php 
// cdata handler 
function characterDataHandler($parser, $data) 
{
echo $data . "<p>"; 
} 
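// unparsed entity handler
function unparsedEntityHandler($parser, $entity, $base, $systemId, $publicId, $notation)
{
global $notationsArray;
// run the notation's program only if that notation was declared
if ($systemId && isset($notationsArray[$notation]))
{
// escape both parts so that the shell cannot misinterpret them
exec(escapeshellcmd($notationsArray[$notation]) . " " . escapeshellarg($systemId));
}
}
// notation handler
function notationHandler($parser, $notation, $base, $systemId, $publicId)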
{
global $notationsArray; 
if ($systemId) 
{
$notationsArray[$notation] = $systemId; 
} 
} 
// XML data file 
$xml_file = "list.xml"; 
// initialize array to hold notation declarations 
$notationsArray = array(); 
// initialize parser 
$xml_parser = xml_parser_create(); 
// set cdata handler
xml_set_character_data_handler($xml_parser, "characterDataHandler");
// set entity and notation handlers
xml_set_unparsed_entity_decl_handler($xml_parser, "unparsedEntityHandler");
xml_set_notation_decl_handler($xml_parser, "notationHandler");
// read XML file 
if (!($fp = fopen($xml_file, "r"))) 
{
die("File I/O error: $xml_file"); 
}   
// parse XML 
while ($data = fread($fp, 4096)) 
{
// error handler 
if (!xml_parse($xml_parser, $data, feof($fp))) 
{
die("XML parser error: " . 
xml_error_string(xml_get_error_code($xml_parser))); 
} 
} 
// all done, clean up! 
xml_parser_free($xml_parser); 
?> 
</body> 
</html> 

This is a little different from the scripts you've seen so far, so an explanation is in order.

The notationHandler() function, called whenever the parser encounters a notation declaration, simply adds the notation and its associated system identifier to a global associative array, $notationsArray. Now, whenever an unparsed entity is encountered, the unparsedEntityHandler() function matches the notation name within the entity declaration to the keys of the associative array, and launches the appropriate script with the entity as parameter.
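The bookkeeping described above can be tried out without launching anything. The sketch below is a hypothetical variant of Listing 2.13 that only records what the two handlers receive; the $notations and $entities arrays and the trimmed-down document are assumptions made for the example.

```php
<?php
// Sketch: collect notation and unparsed entity declarations without running anything.
$notations = [];  // notation name => SYSTEM identifier
$entities = [];   // entity name => path and notation name

$xml_parser = xml_parser_create();
xml_set_notation_decl_handler(
    $xml_parser,
    function ($parser, $notation, $base, $systemId, $publicId) use (&$notations) {
        $notations[$notation] = $systemId;
    }
);
xml_set_unparsed_entity_decl_handler(
    $xml_parser,
    function ($parser, $entity, $base, $systemId, $publicId, $notation) use (&$entities) {
        $entities[$entity] = ["path" => $systemId, "notation" => $notation];
    }
);

// Shortened version of list.xml
$xml = <<<'XML'
<?xml version="1.0"?>
<!DOCTYPE list [
<!ELEMENT list (dir)*>
<!ELEMENT dir EMPTY>
<!ATTLIST dir name ENTITY #REQUIRED>
<!NOTATION directory SYSTEM "/usr/local/bin/usage.pl">
<!ENTITY config SYSTEM "/etc" NDATA directory>
]>
<list><dir name="config" /></list>
XML;

if (!xml_parse($xml_parser, $xml, true)) {
    die("XML parser error: " .
        xml_error_string(xml_get_error_code($xml_parser)));
}

// Show which program each entity would be handed to
foreach ($entities as $name => $info) {
    echo "$name: " . $notations[$info["notation"]] . " " . $info["path"] . "\n";
}
```

Collecting the declarations first and acting on them afterwards also makes it easy to review what a document asks for before any external program is involved.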

Obviously, how you use these two handlers depends a great deal on how your notation declarations and unparsed entities are set up. In this case, I use the notation to specify the location of the application and the entity handler to launch the application whenever required. You can also use these handlers to display binary data within the page itself (assuming that your target environment is a browser), to process it further, or to ignore it altogether.

Rapid "exec()-ution"

The PHP exec() function provides a handy way to execute any command on the system. That's why it's so perfect for a situation like the one shown in Listing 2.13. With the usage.pl script and directory name both available to the parser, it's a simple matter to put them together and then have exec() automatically run the disk usage checker every time a directory name is encountered within the XML document.

The convenience of exec() comes at a price, however. Using exec() can pose significant security risks, and can even cause your system to slow down or crash if the program you are "exec()-uting" fails to exit properly. The PHP manual documents this in greater detail.

If you prefer to have the output from the command displayed (or processed further), you should consider the passthru() function, designed for just that purpose.
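As a small illustration of the security point in this sidebar, the sketch below escapes an argument before handing it to exec() and then inspects the captured output and exit status. It assumes a Unix-like shell in which echo is available; the $directory value is deliberately awkward input made up for the example.

```php
<?php
// Sketch: run a command with an escaped argument and inspect the result.
// Assumes a Unix-like shell where "echo" is available.

// Input that would start a second command if it were passed along unescaped
$directory = "/tmp; echo injected";

// escapeshellarg() quotes the value so the shell sees a single argument
$command = "echo " . escapeshellarg($directory);

$outputLines = [];
$exitCode = null;
exec($command, $outputLines, $exitCode);

echo "exit code: $exitCode\n";
echo "output: " . implode("\n", $outputLines) . "\n";
```

The second and third arguments of exec() receive the command's output lines and its exit status, which gives the script a chance to notice a failed program instead of carrying on blindly.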

Handling Everything Else

Finally, PHP also offers the xml_set_default_handler() function for all those situations not covered by the preceding handlers. In the event that no other handlers are defined for the document, all events generated will be trapped and resolved by this handler.

This snippet designates the function defaultHandler() as the default handler for the document:

xml_set_default_handler($xml_parser, "defaultHandler"); 

The function designated by xml_set_default_handler() must be set up to accept the following two arguments:

A handle representing the XML parser

The data encountered

In Listing 2.14, every event generated by the parser is passed to the default handler (because no other handlers are defined), which simply prints the data received. The final output? An exact mirror of the input!

Listing 2.14 Demonstrating the Default Handler

<html> 
<head> 
<basefont face="Arial"> 
</head> 
<body> 
<?php 
// default handler 
function defaultHandler($parser, $data) 
{
echo "<pre>" . htmlspecialchars($data) . "</pre>"; 
} 
// XML data
$xml_data = <<<EOF
<?xml version="1.0"?>
<element>carbon <!-- did you know that diamond is a form of carbon? -Ed -->
</element>
EOF;
// initialize parser 
$xml_parser = xml_parser_create(); 
// set default handler 
xml_set_default_handler($xml_parser, "defaultHandler"); 
if (!xml_parse($xml_parser, $xml_data))   
{
die("XML parser error: " . 
xml_error_string(xml_get_error_code($xml_parser))); 
} 
// all done, clean up! 
xml_parser_free($xml_parser); 
?> 
</body> 
</html> 
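The default handler is perhaps most useful next to other handlers, where it picks up only what they leave behind. The sketch below is a hypothetical illustration of that division of labor: character data goes to its own handler, while the comment ends up in the default handler. The variable names and the sample document are assumptions, and exactly which other pieces of the document reach the default handler may vary with the parser library PHP was built against.

```php
<?php
// Sketch: the default handler receives what no other handler claims.
$text = "";       // collected by the character data handler
$unhandled = "";  // collected by the default handler

$xml_parser = xml_parser_create();
xml_set_character_data_handler(
    $xml_parser,
    function ($parser, $data) use (&$text) {
        $text .= $data;
    }
);
xml_set_default_handler(
    $xml_parser,
    function ($parser, $data) use (&$unhandled) {
        $unhandled .= $data;
    }
);

$xml = '<element>carbon<!-- diamond is a form of carbon --></element>';
if (!xml_parse($xml_parser, $xml, true)) {
    die("XML parser error: " .
        xml_error_string(xml_get_error_code($xml_parser)));
}

echo "character data: $text\n";
echo "default handler: $unhandled\n";
```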
