Perl Cd Bookshelf [Electronic resources] (text version)

Accessing XML with SAX

Simple API for XML (SAX) is a standard interface for event-based XML parsing. As the parser reads an XML document, it reports the events and data it encounters to the application program through callback functions, and the application must then deal with them. For example, the application can build its own data structures inside the callback event handlers. The information passed back by these callbacks covers such things as the start and end of elements and details of an element’s content, such as CDATA, processing instructions, and subelements.
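The callback model described above can be tried without the Oracle parser. The following is a minimal sketch that assumes the SAX parser bundled with the JDK (obtained through javax.xml.parsers.SAXParserFactory); the class name EventTrace and its trace method are hypothetical names chosen for illustration. It records each event in the order the parser reports it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records SAX events in the order they are reported.
class EventTrace {
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                events.add("start:" + qName);
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
            @Override
            public void processingInstruction(String target, String data) {
                events.add("pi:" + target);
            }
        };
        try {
            // Assumption: the JDK's default (non-namespace-aware) SAX parser.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        // prints [start:a, pi:note, start:b, end:b, end:a]
        System.out.println(trace("<a><?note hi?><b/></a>"));
    }
}
```

The handler never sees the document as a whole; it only sees one event at a time, which is why any state it needs must be kept in its own fields or captured variables.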

SAX, initially developed by David Megginson, has become a de facto standard for XML parsing, although it is not a W3C Recommendation. One advantage of SAX parsing over the DOM is that no in-memory representation of the parse structure has to be built, which saves memory and gives better performance for certain types of operations, such as searching. On the other hand, modifying, updating, and other structural operations may be more efficient with a DOM parser.
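Searching is a good illustration of the memory argument: a handler can stop the parse as soon as it has its answer, so the rest of the document is never processed. The sketch below assumes the JDK's built-in SAX parser; FirstMatch, containsElement, and the private Found exception are hypothetical names. Throwing a SAXException from a callback is the standard way to abandon a SAX parse.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether an element with the given name occurs.
class FirstMatch {
    /** Marker exception used only to abandon the parse once a match is seen. */
    private static final class Found extends SAXException {
        private static final long serialVersionUID = 1L;
        Found() { super("found"); }
    }

    static boolean containsElement(String xml, String name) {
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts)
                    throws SAXException {
                if (qName.equals(name)) {
                    throw new Found();   // stop parsing immediately
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return false;                // reached the end without a match
        } catch (Found f) {
            return true;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<booklist><book><title>T</title></book></booklist>";
        System.out.println(containsElement(xml, "title"));
        System.out.println(containsElement(xml, "price"));
    }
}
```

No tree is built at any point, so memory use stays flat regardless of document size.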

SAX Level 1 and Level 2

The SAX API consists of a set of interfaces and classes. Some of these interfaces are implemented by a SAX parser (such as the Oracle XML Parser for Java). Others need to be implemented or extended by your application. In addition, SAX Level 2 adds namespace support to the interfaces and methods, along with other functionality such as filters. Because of the namespace support, some of the SAX Level 1 interfaces were deprecated and replaced with new ones.

SAX interfaces and classes are classified into five groups:

Interfaces implemented by the parser

Interfaces implemented by the application

Standard SAX classes

Optional Java-specific helper classes in the org.xml.sax.helpers package

Java demonstration classes in the nul package

However, as an application writer, you only need to focus on at most two of the interfaces, as described in Table 2-4.

Table 2-4: Interfaces Implemented by Applications
SAX 1.0 Interface	SAX 2.0 Interface	Description
DocumentHandler	ContentHandler	Receives notifications from parser
ErrorHandler	ErrorHandler	Optional interface for special error handling
DTDHandler	DTDHandler	Optional interface needed to work with notations and unparsed (binary) entities
EntityResolver	EntityResolver	Optional interface needed to do redirection of URIs in documents
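To make the two optional interfaces that are used most often more concrete, here is a small sketch that assumes the JDK's built-in validating SAX parser rather than the Oracle parser. OptionalHandlers, Report, and the example.invalid URL are hypothetical. The EntityResolver callback redirects the external DTD to a local string, and the ErrorHandler callback collects validation errors, which DefaultHandler would otherwise ignore silently.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo of the EntityResolver and ErrorHandler callbacks.
class OptionalHandlers {
    /** What the two optional handlers observed during one parse. */
    static final class Report {
        final List<String> resolved = new ArrayList<>();
        final List<String> errors = new ArrayList<>();
    }

    static Report parse(String xml, String localDtd) {
        Report report = new Report();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Redirect: serve the DTD from memory instead of fetching the URI.
                report.resolved.add(systemId);
                return new InputSource(new StringReader(localDtd));
            }
            @Override
            public void error(SAXParseException e) {
                // Recoverable errors, including validity errors, arrive here.
                report.errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);   // DTD validation
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return report;
    }

    public static void main(String[] args) {
        String dtd = "<!ELEMENT a EMPTY>";
        String xml = "<!DOCTYPE a SYSTEM \"http://example.invalid/a.dtd\"><a>text</a>";
        Report r = parse(xml, dtd);
        System.out.println("resolved: " + r.resolved);
        System.out.println("errors:   " + r.errors);
    }
}
```

The exact wording of the error messages depends on the parser, so the sketch only stores them and makes no assumption about their text.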

In addition to the application interfaces, most SAX parsers, including the Oracle XML Parser for Java, implement helper classes that provide static methods that are useful in integrating SAX parsers. These helper classes are described in Table 2-5.

Table 2-5: Oracle SAX Helper Classes
SAX 1.0 Class	SAX 2.0 Class	Description
ParserFactory	XMLReaderFactory	Class to support loading SAX parsers dynamically
AttributeListImpl	AttributesImpl	Convenience class to make a persistent copy of an AttributeList (SAX 1.0) or Attributes (SAX 2.0) object
LocatorImpl	LocatorImpl	Convenience class to make a persistent snapshot of a Locator's values at a specific point in the parse
N/A	NamespaceSupport	Convenience class to add namespace support
N/A	XMLFilterImpl	Base class to be subclassed when applications need to modify the event stream
HandlerBase	DefaultHandler	Base class with default implementations of all four SAX2 handler interfaces
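As an illustration of the filter helper in Table 2-5, the following sketch subclasses org.xml.sax.helpers.XMLFilterImpl to modify the event stream before it reaches the application's handler. It assumes the JDK's built-in parser as the parent XMLReader; the class name UppercaseFilter and the choice of upper-casing element names are purely illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: upper-cases element names on their way to the handler.
class UppercaseFilter extends XMLFilterImpl {
    UppercaseFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) throws SAXException {
        // Forward the event downstream with a modified qualified name.
        super.startElement(uri, localName, qName.toUpperCase(Locale.ROOT), atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        super.endElement(uri, localName, qName.toUpperCase(Locale.ROOT));
    }

    /** Parses the text through the filter and returns the names the handler saw. */
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLReader reader =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            UppercaseFilter filter = new UppercaseFilter(reader);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    names.add(qName);
                }
            });
            // Parsing through the filter drives the parent reader.
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [BOOKLIST, BOOK]
        System.out.println(elementNames("<booklist><book/></booklist>"));
    }
}
```

Because a filter is itself an XMLReader, filters can be chained, each one seeing the output of the one before it.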

The following code sample demonstrates a simple use of the parser and the SAX API. The application parses the XML file it is given and prints some information about the contents of that file. Sample implementations of several useful interfaces are also provided.

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import java.io.*;
import java.net.*;
import oracle.xml.parser.v2.SAXParser;

public class SAXSample extends DefaultHandler {
    // Store the locator
    Locator locator;

    public static void main(String[] argv) {
        try {
            if (argv.length != 1) {
                // Must pass in the name of the XML file.
                System.err.println("Usage: SAXSample filename");
                System.exit(1);
            }
            // Create a new handler for the parser
            SAXSample sample = new SAXSample();
            // Get an instance of the parser
            SAXParser parser = new SAXParser();

            // Set handlers in the parser. DefaultHandler implements the
            // SAX 2.0 ContentHandler, so it is registered with
            // setContentHandler rather than the SAX 1.0 setDocumentHandler.
            parser.setContentHandler(sample);
            parser.setEntityResolver(sample);
            parser.setDTDHandler(sample);
            parser.setErrorHandler(sample);
            // Convert file to URL and parse
            parser.parse(new File(argv[0]).toURI().toURL().toString());
        } catch (Exception e) {
            System.err.println(e);
        }
    }
    ///////////////////////////////////////////////////////
    // Sample implementation of ContentHandler interface.
    //////////////////////////////////////////////////////
    public void setDocumentLocator(Locator locator) {
        System.out.println("SetDocumentLocator:");
        this.locator = locator;
    }

    public void startDocument() {
        System.out.println("StartDocument");
    }

    public void endDocument() throws SAXException {
        System.out.println("EndDocument");
    }

    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts)
            throws SAXException {
        System.out.println("StartElement:" + qName);
        for (int i = 0; i < atts.getLength(); i++) {
            String aname = atts.getQName(i);
            String type = atts.getType(i);
            String value = atts.getValue(i);
            System.out.println(" " + aname + "(" + type + ")" + "=" + value);
        }
    }

    public void endElement(String namespaceURI, String localName, String qName)
            throws SAXException {
        System.out.println("EndElement:" + qName);
    }

    public void characters(char[] cbuf, int start, int len) {
        System.out.print("Characters:");
        System.out.println(new String(cbuf, start, len));
    }

    public void ignorableWhitespace(char[] cbuf, int start, int len) {
        System.out.println("IgnorableWhiteSpace");
    }

    public void processingInstruction(String target, String data)
            throws SAXException {
        System.out.println("ProcessingInstruction:" + target + " " + data);
    }

    ////////////////////////////////////////////////////////
    // Sample implementation of the EntityResolver interface.
    ///////////////////////////////////////////////////////
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException {
        System.out.println("ResolveEntity:" + publicId + " " + systemId);
        System.out.println("Locator:" + locator.getPublicId() + " "
                + locator.getSystemId() + " "
                + locator.getLineNumber() + " "
                + locator.getColumnNumber());
        // Returning null asks the parser to resolve the entity itself.
        return null;
    }

    ///////////////////////////////////////////////////////
    // Sample implementation of the DTDHandler interface.
    //////////////////////////////////////////////////////
    public void notationDecl(String name, String publicId,
                             String systemId) {
        System.out.println("NotationDecl:" + name + " " + publicId + " " + systemId);
    }

    public void unparsedEntityDecl(String name, String publicId,
                                   String systemId, String notationName) {
        System.out.println("UnparsedEntityDecl:" + name + " " + publicId + " "
                + systemId + " " + notationName);
    }
…

Using SAX APIs

Quite often, applications that require only SAX (Level 1 and Level 2) support do not want to be burdened with a parser that always builds a full-blown DOM tree in memory. The Oracle XML SAX parser’s high-performance, event-based, run-time engine addresses this requirement. Using the SAX parser, applications can leverage the full power of the SAX model to parse extremely large documents without incurring prohibitive memory costs.

The following code demonstrates how the SAX APIs can be used to extract useful information from an XML document:

// This example demonstrates a simple use of the SAXParser.
// An XML file is parsed and some information is printed out.
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import java.io.*;
import oracle.xml.parser.v2.SAXParser;

public class SAXHandler extends DefaultHandler {
    public static void main(String[] argv) {
        try {
            // Get an instance of the parser
            SAXParser parser = new SAXParser();
            // Create a SAX event handler and register it with the parser
            SAXHandler handler = new SAXHandler();
            parser.setContentHandler(handler);
            // Convert file to InputSource and parse
            InputSource xmldoc = new InputSource(new FileInputStream(argv[0]));
            parser.parse(xmldoc);
        } catch (Exception e) {
            System.out.println(e.toString());
        }
    }

    // Sample implementation of the ContentHandler interface.
    // The four-argument signature is required to override the SAX 2.0 callback.
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts)
            throws SAXException {
        System.out.println("StartElement:" + qName);
        for (int i = 0; i < atts.getLength(); i++) {
            String aname = atts.getQName(i);
            String type = atts.getType(i);
            String value = atts.getValue(i);
            System.out.println(" " + aname + "(" + type + ")" + "=" + value);
        }
    }

    public void characters(char[] cbuf, int start, int len) {
        System.out.print("Characters:");
        System.out.println(new String(cbuf, start, len));
    }
}

To use the Oracle XML parser’s SAX support, you need to use the SAXParser class to parse your XML document. The first thing to do, therefore, is to get an instance of this class:

SAXParser parser = new SAXParser();

You then need to register your SAX event handler with the parser, so that it knows what methods to invoke when a particular event occurs. Because not all events may be of interest to you, make sure the handler you register extends the org.xml.sax.helpers.DefaultHandler class. This class provides default behavior for handling events (typically the default methods do nothing). You can then override the methods for the events that interest you. In the preceding example, the assumption is that the only events of interest are a subset of those specified by the org.xml.sax.ContentHandler interface, namely, startElement and characters. Arguably, these are the most important SAX events generated because XML documents typically consist of markup and text. This handler can be registered with SAXParser with a simple API call:

parser.setContentHandler(handler);

The startElement event is triggered every time a new element is encountered within the XML document by SAXParser. When this event occurs, you can print the element name and its attributes:

public void startElement(String namespaceURI, String localName,
                         String qName, Attributes atts)
        throws SAXException {
    ...
}

The characters event is triggered every time unmarked-up text is encountered by SAXParser. This text is often the “value” of an element and can be retrieved by listening for this event:

public void characters(char[] cbuf, int start, int len) {
...
}
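One detail worth showing in code: SAX allows the parser to deliver the text of a single element in more than one characters call, for example around an entity reference. A handler that wants an element's "value" therefore usually buffers the chunks and reads the buffer in endElement. The sketch below assumes the JDK's built-in parser; TitleCollector is a hypothetical class that collects the title values from a book list like the one used in this section.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <title> element.
class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private StringBuilder text;   // non-null only while inside a <title>

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        if ("title".equals(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] cbuf, int start, int len) {
        if (text != null) {
            text.append(cbuf, start, len);   // may be called several times
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString());
            text = null;
        }
    }

    static List<String> titles(String xml) {
        TitleCollector collector = new TitleCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), collector);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return collector.titles;
    }

    public static void main(String[] args) {
        String xml = "<booklist>"
                + "<book><title>Oracle9i XML Handbook</title></book>"
                + "<book><title>Tom &amp; Jerry</title></book>"
                + "</booklist>";
        // prints [Oracle9i XML Handbook, Tom & Jerry]
        System.out.println(titles(xml));
    }
}
```

Printing directly from characters, as the sample handler does, is fine for tracing but can split one value across several output lines.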

Once the handler has been registered, all that remains is to parse an XML document using SAXParser:

parser.parse(xmldoc);

The input XML document could contain a list of book data, such as the following:

<booklist>
<book isbn="0-07-213495-X">
<title>Oracle9i XML Handbook</title>
<author>Chang, Scardina and Kiritzov</author>
<publisher>Osborne</publisher>
<price>49.99</price>
</book>
<book isbn="1230-23498-2349879">
<title>Emperor's New Mind</title>
<author>Roger Penrose</author>
<publisher>Oxford Publishing Company</publisher>
<price>15.99</price>
</book>
</booklist>

The following output would be generated:

StartElement:booklist
StartElement:book
isbn(CDATA)=0-07-213495-X
StartElement:title
Characters:Oracle9i XML Handbook
StartElement:author
Characters:Chang, Scardina and Kiritzov
StartElement:publisher
Characters:Osborne
StartElement:price
Characters:49.99
StartElement:book
isbn(CDATA)=1230-23498-2349879
StartElement:title
Characters:Emperor's New Mind
StartElement:author
Characters:Roger Penrose
StartElement:publisher
Characters:Oxford Publishing Company
StartElement:price
Characters:15.99

Implementation of SAX Level 2 comes mainly in the form of support for XML namespaces and for querying or setting features and properties in the parser. With namespace support, element and attribute names may now be reported as an optional namespace URI followed by a local name. For example, in <foo:bar xmlns:foo="http://www.oracle.com/"/>, http://www.oracle.com/ is the namespace URI and bar is the local name. In addition, the qualified name (or qName), foo:bar, may also be reported. Without namespace support, element and attribute names are reported as a single name only. The SAX Level 2 interfaces affected by namespace support are XMLReader, Attributes, and ContentHandler. An example of SAX 2 namespace support, followed by code for the startElement and endElement callback methods in the ContentHandler interface, might look like this:

// This example demonstrates how to use SAX Level 2 Namespace
// support, followed by how to use the callback
// methods startElement and endElement.
import java.io.*;
import java.net.URL;
import java.net.MalformedURLException;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import oracle.xml.parser.v2.SAXParser;

public class SAX2Namespace {
    public static void main(String[] args) {
        String fileName;
        // Get the file name
        fileName = args[0];
        try {
            // Create handlers for the parser.
            // For all the other interfaces use the defaults provided by
            // DefaultHandler.
            DefaultHandler defHandler = new XMLDefaultHandler();
            SAXParser parser = new SAXParser();
            parser.setContentHandler(defHandler);
            parser.setErrorHandler(defHandler);
            parser.setEntityResolver(defHandler);
            parser.setDTDHandler(defHandler);
            parser.parse(createURL(fileName));
        } catch (Exception e) {
            System.out.println(e.toString());
        }
    }

    // Treat the argument as a URL first; fall back to a local file name.
    static URL createURL(String fileName) {
        URL url = null;
        try {
            url = new URL(fileName);
        } catch (MalformedURLException ex) {
            try {
                File f = new File(fileName);
                url = f.toURI().toURL();
            } catch (MalformedURLException e) {
                System.out.println("Cannot create url for: " + fileName);
                System.exit(1);
            }
        }
        return url;
    }
}
class XMLDefaultHandler extends DefaultHandler {
    public XMLDefaultHandler() {
    }

    public void startElement(String uri, String localName,
                             String qName, Attributes atts)
            throws SAXException {
        System.out.println("ELEMENT Qualified Name:" + qName);
        System.out.println("ELEMENT Local Name :" + localName);
        System.out.println("ELEMENT Namespace :" + uri);
        for (int i = 0; i < atts.getLength(); i++) {
            qName = atts.getQName(i);
            localName = atts.getLocalName(i);
            uri = atts.getURI(i);
            System.out.println(" ATTRIBUTE Qualified Name :" + qName);
            System.out.println(" ATTRIBUTE Local Name :" + localName);
            System.out.println(" ATTRIBUTE Namespace :" + uri);
            // You can get the type and value of the attributes either
            // by index or by the Qualified Name.
            String type = atts.getType(qName);
            String value = atts.getValue(qName);
            System.out.println(" ATTRIBUTE Type :" + type);
            System.out.println(" ATTRIBUTE Value :" + value);
            System.out.println();
        }
    }

    public void endElement(String uri, String localName,
                           String qName) throws SAXException {
        System.out.println("ELEMENT Qualified Name:" + qName);
        System.out.println("ELEMENT Local Name :" + localName);
        System.out.println("ELEMENT Namespace :" + uri);
    }
}

For SAX Level 2, the additional parameters passed in are the namespace URI, the local name, and the qName. Other SAX Level 2 enhancements include the querying and setting of features and properties in the parser through the getter/setter methods getFeature, setFeature, getProperty, and setProperty. The following listing uses setFeature to control validation and namespace support:

void process(String filename) throws SAXException, IOException {
    URL url = createURL(filename);
    // Validating, Namespace = true, NamespacePrefixes = true
    parser.setFeature("http://xml.org/sax/features/validation", true);
    parser.setFeature("http://xml.org/sax/features/namespaces", true);
    parser.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
    try {
        parser.parse(url.toString());
    } catch (XMLParseException e) {
        System.out.println();
        System.out.println(e);
    }
    // Non-validating, NamespacePrefixes = false
    parser.setFeature("http://xml.org/sax/features/validation", false);
    parser.setFeature("http://xml.org/sax/features/namespace-prefixes", false);
    try {
        parser.parse(url.toString());
    } catch (XMLParseException e) {
        System.out.println();
        System.out.println(e);
    }
}

For this code example, note that you can control namespace support in SAX Level 2 processing. In default processing, namespace-prefixes is false, meaning that qNames are optionally reported and namespace declarations (xmlns attributes) are not reported. In our example, however, the code stub first sets validation, namespaces, and namespace-prefixes to true. Given the element

<foo:bar xmlns:foo="http://www.oracle.com/"
 foo1="bar1" foo:stock="wayout.com"/>

the parser then reports an element with the namespace URI http://www.oracle.com/, a local name of bar, and a qName of foo:bar; one attribute with no namespace URI, no local name, and a qName of xmlns:foo; another attribute with no namespace URI, a local name of foo1, and a qName of foo1; and a last attribute with the namespace URI http://www.oracle.com/, a local name of stock, and a qName of foo:stock.
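The effect of the namespace-prefixes feature can be checked with a short experiment. The sketch below assumes the JDK's built-in parser instead of the Oracle one; NamespaceReport and its describe method are hypothetical names. It parses the element discussed above twice, once with the feature off and once with it on, and lists what startElement receives.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: shows which names a namespace-aware parse reports.
class NamespaceReport {
    /** Returns "qName {uri}localName" for the element, then for each attribute. */
    static List<String> describe(String xml, boolean namespacePrefixes) {
        List<String> out = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);   // turns the namespaces feature on
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setFeature("http://xml.org/sax/features/namespace-prefixes",
                              namespacePrefixes);
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    out.add(qName + " {" + uri + "}" + localName);
                    for (int i = 0; i < atts.getLength(); i++) {
                        out.add(atts.getQName(i) + " {" + atts.getURI(i) + "}"
                                + atts.getLocalName(i));
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<foo:bar xmlns:foo=\"http://www.oracle.com/\""
                + " foo1=\"bar1\" foo:stock=\"wayout.com\"/>";
        System.out.println("namespace-prefixes=false: " + describe(xml, false));
        System.out.println("namespace-prefixes=true:  " + describe(xml, true));
    }
}
```

With the feature off, the xmlns:foo declaration does not appear among the attributes; with it on, it is reported as an ordinary attribute alongside foo1 and foo:stock.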

Oracle SAX APIs in C

To use the Oracle SAX APIs, a set of callback functions is passed to xmlinit(). The parser then invokes these functions as the matching parts of a document are encountered. Compare this to the DOM, in which the document is parsed and a node tree is constructed in memory, which can then be queried and modified through the DOM API. SAX functions are invoked as the document is parsed. Each SAX function returns a (sword) error code. If the code is nonzero, an error is indicated and parsing stops immediately.

The SAX callback structure (xmlsaxcb) is defined as follows:

struct xmlsaxcb {
    sword (*startDocument)(void *ctx);
    sword (*endDocument)(void *ctx);
    sword (*startElement)(void *ctx, const oratext *name,
                          const struct xmlnodes *attrs);
    sword (*endElement)(void *ctx, const oratext *name);
    sword (*characters)(void *ctx, const oratext *ch, size_t len);
    sword (*ignorableWhitespace)(void *ctx, const oratext *ch,
                                 size_t len);
    sword (*processingInstruction)(void *ctx, const oratext *target,
                                   const oratext *data);
    sword (*notationDecl)(void *ctx, const oratext *name,
                          const oratext *publicId,
                          const oratext *systemId);
    sword (*unparsedEntityDecl)(void *ctx, const oratext *name,
                                const oratext *publicId,
                                const oratext *systemId,
                                const oratext *notationName);
    sword (*nsStartElement)(void *ctx, const oratext *qname,
                            const oratext *local,
                            const oratext *nsp,
                            const struct xmlnodes *attrs);
};

Any or all callback functions may be specified; none are required. An optional context pointer may be provided, and it will be passed to each callback function. Its use is entirely up to the user. The callback functions are described in detail in Table 2-6.

Table 2-6: SAX Callback Functions
Callback Function	Description
startDocument	Invoked immediately before the parse begins.
endDocument	Invoked immediately after a successful parse ends.
startElement	Invoked when an element start-tag is found. If the namespace version of this callback is also supplied, it is called instead.
endElement	Invoked when an element end-tag is found.
characters	Invoked for each CDATA or #PCDATA.
ignorableWhitespace	Invoked for each run of ignorable white space, unless all white space is being retained (in which case characters is invoked).
processingInstruction	Invoked for each processing instruction.
notationDecl	Invoked for each NOTATION declaration in the DTD.
unparsedEntityDecl	Invoked for each unparsed entity (those with NDATA defined).
nsStartElement	Invoked when a namespace qualified start-tag is found returning the namespace, local part, etc.

The following program fragments show how to declare, register, and use the SAX callbacks:

/* declare SAX callback functions */
sword startdocument(void *ctx);
sword enddocument(void *ctx);
sword startelement(void *ctx, const oratext *name,
                   const xmlnodes *attrs);
sword endelement(void *ctx, const oratext *name);
sword characters(void *ctx, const oratext *ch, size_t len);
sword whitespace(void *ctx, const oratext *ch, size_t len);
sword pi(void *ctx, const oratext *target,
         const oratext *data);
sword notation(void *ctx, const oratext *name,
               const oratext *publicId,
               const oratext *systemId);
sword entity(void *ctx, const oratext *name,
             const oratext *publicId,
             const oratext *systemId,
             const oratext *notationName);

/* declare SAX callback context */
typedef struct saxcontext {
    uword depth; /* nested element level, for indenting */
} sax_context;

/* declare SAX callback structure */
xmlsaxcb sax_callback = {
    startdocument, enddocument, startelement, endelement,
    characters, whitespace, pi, notation, entity
};

/* declare SAX context and initialize */
sax_context saxctx = { 0 }; /* depth = 0 */

/* initialize parser specifying SAX callbacks */
xmlinit(&ecode, NULL, NULL, NULL, NULL,
        &sax_callback, (void *) &saxctx, NULL, NULL);
/* ----- SAX CALLBACKS ----- */
sword startdocument(void *context) {
    puts("StartDocument");
    return 0; /* success */
}

sword enddocument(void *context) {
    puts("EndDocument");
    return 0; /* success */
}

sword startelement(void *context, const oratext *name,
                   const xmlnodes *attrs) {
    sax_context *saxctx = (sax_context *) context;
    size_t i;       /* attribute index */
    xmlnode *attr;  /* current attribute; type assumed from the Oracle XML C API */

    indent(saxctx->depth);
    printf("<%s", name);
    if (attrs) {
        for (i = 0; i < numAttributes(attrs); i++) {
            attr = getAttributeIndex(attrs, i);
            printf(" %s=\"%s\"", getAttrName(attr), getAttrValue(attr));
        }
    }
    puts(">");
    saxctx->depth++;
    return 0; /* success */
}

sword endelement(void *context, const oratext *name) {
    sax_context *saxctx = (sax_context *) context;
    indent(--saxctx->depth);
    printf("</%s>\n", name);
    return 0; /* success */
}

sword characters(void *context, const oratext *ch, size_t len) {
    sax_context *saxctx = (sax_context *) context;
    indent(saxctx->depth);
    putchar('"');
    print_string((oratext *) ch, (sword) len);
    puts("\"");
    return 0; /* success */
}

sword whitespace(void *context, const oratext *ch, size_t len) {
    sax_context *saxctx = (sax_context *) context;
    indent(saxctx->depth);
    putchar('\'');
    print_string((oratext *) ch, (sword) len);
    puts("'");
    return 0; /* success */
}

sword pi(void *context, const oratext *target,
         const oratext *data) {
    sax_context *saxctx = (sax_context *) context;
    indent(saxctx->depth);
    fputs("PI", stdout);
    if (target)
        printf(" target=\"%s\"", target);
    if (data)
        printf(" data=\"%s\"", data);
    putchar('\n');
    return 0; /* success */
}

sword notation(void *context, const oratext *name,
               const oratext *publicId,
               const oratext *systemId) {
    sax_context *saxctx = (sax_context *) context;
    indent(saxctx->depth);
    printf("NOTATION '%s'", name);
    if (publicId)
        printf(" PUB:%s", publicId);
    if (systemId)
        printf(" SYS:%s", systemId);
    putchar('\n');
    return 0; /* success */
}

sword entity(void *context, const oratext *name,
             const oratext *publicId,
             const oratext *systemId,
             const oratext *notationName) {
    sax_context *saxctx = (sax_context *) context;
    indent(saxctx->depth);
    printf("ENTITY '%s'", name);
    if (publicId)
        printf(" PUB:%s", publicId);
    if (systemId)
        printf(" SYS:%s", systemId);
    if (notationName)
        printf(" NAME:%s", notationName);
    putchar('\n');
    return 0; /* success */
}

The following is a sample XML document that includes an inline DTD:

<?xml version="1.0"?>
<!DOCTYPE PLAY [
<!ELEMENT top (second*)>
<!ELEMENT second (third*)>
<!ELEMENT third (#PCDATA)*>
<!NOTATION note1 SYSTEM "foo.exe">
<!NOTATION note2 PUBLIC "bar" "bar.ent">
<!ENTITY ent SYSTEM "http://www.w3.org/" NDATA n>
]>
<?dummy this is a sample processing instruction?>
<top>
<second>
<third>third level</third>
</second>
</top>

This is the resulting output from the preceding sample program:

StartDocument
NOTATION 'note1' SYS:foo.exe
NOTATION 'note2' PUB:bar SYS:bar.ent
ENTITY 'ent' SYS:http://www.w3.org/ NAME:n
PI target="dummy" data="this is a sample processing instruction"
<top>
'\n '
<second>
'\n '
<third>
"third level"
</third>
'\n '
</second>
'\n'
</top>
EndDocument
