Controlling Parser Behavior
Currently, PHP's XML parser allows you to control the following:
Case folding
Target encoding
Whitespace processing
All these attributes can be controlled via the xml_parser_set_option() function, which accepts three parameters:
A handle for the parser to be modified
The attribute name
The attribute value (either string or Boolean)
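As a minimal sketch of how the three parameters fit together, the following call sets one attribute on a freshly created parser. The variable names are illustrative only, and the return-value check is an addition for clarity.

```php
<?php
// 1. A handle for the parser to be modified
$parser = xml_parser_create();

// 2. The attribute name, 3. the attribute value
$ok = xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

// xml_parser_set_option() reports whether the option was accepted
var_dump($ok);
```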
The sections that follow describe each of these attributes in greater detail, with examples.
Within the context of an XML document, case folding simply involves replacing lowercase characters in element names with their uppercase equivalents. XML element names are case-sensitive; typically, you use case folding to impose consistency on mixed-case element names so that they can be handled in a predictable manner.
This option is controlled via the XML_OPTION_CASE_FOLDING attribute and is set to true by default.
To see how this works, take a look at Listing 2.15, which modifies Listing 2.3 to turn off case folding (element names are no longer converted to uppercase).
Listing 2.15 Demonstration of Case Folding
// initialize parser
$xml_parser = xml_parser_create();
// turn off case folding
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, FALSE);
// set callback functions
xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");
Here's the output:
Found opening tag of element: sentence
Found CDATA: The
Found opening tag of element: animal
Found attribute: color = blue
Found CDATA: fox
Found closing tag of element: animal
Found CDATA: leaped over the
Found opening tag of element: vegetable
Found attribute: color = green
Found CDATA: cabbage
Found closing tag of element: vegetable
Found CDATA: patch and vanished into the darkness.
Found closing tag of element: sentence
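For comparison, here is a small self-contained sketch that parses the same kind of fragment twice, once with case folding on and once with it off, and collects the element names reported to the start handler. The helper collectElementNames() and the shortened sample string are hypothetical, written only for this illustration; they are not part of the chapter's listings.

```php
<?php
// Hypothetical helper: return the element names seen by the start handler
// for a given case-folding setting.
function collectElementNames(string $xml, bool $caseFolding): array
{
    $names = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);
    xml_set_element_handler(
        $parser,
        // start handler: record the name exactly as the parser passes it
        function ($parser, $name, $attributes) use (&$names) {
            $names[] = $name;
        },
        // end handler: nothing to do in this sketch
        function ($parser, $name) {
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

$xml = '<sentence>The <animal color="blue">fox</animal> leaped</sentence>';

print_r(collectElementNames($xml, true));   // folding on: names arrive in uppercase
print_r(collectElementNames($xml, false));  // folding off: names arrive as written
```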
You already know that it's possible to specify a character set for document encoding when an XML parser is created with the xml_parser_create() function. (Refer to the "Speaking Different Tongues" sidebar at the beginning of this chapter.) In geek lingo, this is referred to as source encoding.
PHP also allows you to specify a target encoding, which is the encoding used when the parser passes data to a handler function.
By default, this encoding is the same as the source encoding; however, you can alter it via the XML_OPTION_TARGET_ENCODING attribute, which supports any one of the following encodings: ISO-8859-1, US-ASCII, and UTF-8.
The following example sets the target encoding for the parser to UTF-8:
xml_parser_set_option($xml_parser, XML_OPTION_TARGET_ENCODING, "UTF-8");
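To make the effect visible, the sketch below feeds the parser a tiny ISO-8859-1 document and prints the bytes the character data handler receives when the target encoding is UTF-8. The sample document and variable names are assumptions made for this illustration, and the sketch relies on the encoding named in the XML declaration rather than on an argument to xml_parser_create().

```php
<?php
// A Latin-1 document: the single byte \xE9 is "é" in ISO-8859-1.
$xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><word>caf\xE9</word>";

$text = '';
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, "UTF-8");
xml_set_character_data_handler(
    $parser,
    // character data may arrive in several pieces, so append each one
    function ($parser, $data) use (&$text) {
        $text .= $data;
    }
);
xml_parse($parser, $xml, true);

// Show the raw bytes handed to the handler
echo bin2hex($text), "\n";
```

If the conversion works as described, the "é" should reach the handler as the two-byte UTF-8 sequence c3 a9 rather than the single byte e9.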
You can tell the parser to skip the whitespace it encounters by setting the XML_OPTION_SKIP_WHITE attribute to true. This attribute can come in handy if your XML document contains tabs or spaces that could interfere with your program logic.
The following example tells the parser to skip whitespace:
xml_parser_set_option($xml_parser, XML_OPTION_SKIP_WHITE, 1);
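One way to observe this attribute is sketched below. It uses xml_parse_into_struct(), which is not covered in the text above and is chosen here only because it makes whitespace-only values easy to count. The helper countCdataNodes() and the sample document are hypothetical.

```php
<?php
// Hypothetical helper: count the standalone character-data entries that
// xml_parse_into_struct() produces for a given XML_OPTION_SKIP_WHITE setting.
function countCdataNodes(string $xml, bool $skipWhite): int
{
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, $skipWhite);

    $values = [];
    xml_parse_into_struct($parser, $xml, $values);

    $count = 0;
    foreach ($values as $node) {
        if ($node['type'] === 'cdata') {
            $count++;
        }
    }
    return $count;
}

// The only text outside the <item> elements is indentation and newlines.
$xml = "<list>\n  <item>a</item>\n  <item>b</item>\n</list>";

echo countCdataNodes($xml, false), "\n"; // whitespace-only values are kept
echo countCdataNodes($xml, true), "\n";  // whitespace-only values are skipped
```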
You can obtain the current value of any of the parser's attributes with the xml_parser_get_option() function, which returns the value of the specified attribute. For example:
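A minimal sketch of reading an attribute back, before and after changing it, might look like the following. The variable names are illustrative, and the values are cast to Boolean because the exact return type can differ between PHP versions.

```php
<?php
$parser = xml_parser_create();

// Case folding is on by default
$before = xml_parser_get_option($parser, XML_OPTION_CASE_FOLDING);

// Turn it off, then read the attribute again
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
$after = xml_parser_get_option($parser, XML_OPTION_CASE_FOLDING);

var_dump((bool) $before, (bool) $after);
```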