19.1 Parsing with JAXP and SAX
The first thing you want to do with an XML document is parse it. There
are two commonly used approaches to XML parsing, known by the
acronyms SAX and DOM. We'll begin with SAX parsing;
DOM parsing is covered later in the chapter.

SAX is the
Simple API for XML. SAX is not a parser, but rather a Java API that
describes how a parser operates. When parsing an XML document using
the SAX API, you define a class that implements various
"event" handling methods. As the
parser encounters the various element types of the XML document, it
invokes the corresponding event-handler methods
you've defined. Your methods take whatever actions
are required to accomplish the desired task. In the SAX model, the
parser converts an XML document into a sequence of Java method calls.
The parser doesn't build a parse tree of any kind
(although your methods can do this, if you want). SAX parsing is
typically quite efficient and is therefore often the best choice for
simple XML processing tasks. SAX-style XML parsing is known as
"push parsing" because the parser
"pushes" events to your event-handler
methods. This is in contrast to more traditional
"pull parsing" in which your code
"pulls" tokens from a parser.

The SAX API was created by David
Megginson (http://www.megginson.com/) and is
now maintained at http://www.saxproject.org. The
Java binding of the SAX API consists of the package
org.xml.sax and its subpackages. SAX is a de facto
standard but has not been standardized by any official body. There
are two versions of the SAX API. Version 2 is substantially different
from the original Version 1 and is today the more common of the two. We cover
Version 2 in this chapter.

SAX is an API, not an implementation.
Various XML parsers implement the SAX API, and in order to use SAX
you need an underlying parser implementation. This is where JAXP
comes in. JAXP is the Java API for XML Processing, and was added to J2SE
in Java 1.4.[1] JAXP consists of the
javax.xml.parsers package, and also
javax.xml.transform, which we'll
consider later in this chapter. JAXP provides a thin layer on top of
SAX (and on top of DOM, which we'll also see later)
and standardizes an API for obtaining and using SAX (and DOM) parser
objects. The JAXP package includes default parser implementations but
allows other parsers to be easily plugged in and configured using
system properties.
[1] Prior to Java 1.4, it was available as a
standard extension.
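The push model and the JAXP factory described above can be seen together in a minimal sketch like the following. It is not one of the book's examples: the class name SaxEventTrace is hypothetical, the document is parsed from an in-memory string rather than a file, and the code assumes whatever default parser ships with your Java installation. The handler only records each callback, so the returned list shows the sequence of method calls that the parser makes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of the je3 examples.
public class SaxEventTrace {
    /** Parse the XML text and return the handler callbacks in the order they occur. */
    public static List<String> trace(String xml) {
        final List<String> events = new ArrayList<String>();
        final StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                text.setLength(0);               // discard text seen before this tag
                events.add("start " + qName);
            }

            @Override
            public void characters(char[] buffer, int start, int length) {
                text.append(buffer, start, length);  // may arrive in several chunks
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String t = text.toString().trim();
                if (t.length() > 0) events.add("text " + t);
                text.setLength(0);
                events.add("end " + qName);
            }
        };

        try {
            // JAXP: ask the factory for a parser, then hand it the handler.
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped only to keep this sketch easy to call.
            throw new IllegalStateException("parse failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        String xml = "<servlet><servlet-name>Hello</servlet-name></servlet>";
        for (String event : trace(xml)) System.out.println(event);
    }
}
```

Note that the code never names a parser class. The factory chooses the implementation, which is what allows a different parser to be plugged in without changing the program.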
Example 19-1 is a listing
of ListServlets.java, a program that uses JAXP
and SAX to parse a web application deployment descriptor and list the
names of the servlets configured by that file. We'll
see servlets and their deployment descriptors in Chapter 20, but until then you just need to know that
servlet-based web applications are configured using an XML file named
web.xml. This file contains
<servlet> tags that define mappings between
servlet names and the Java classes that implement them. It also
contains <servlet-mapping> tags that map
from servlet name to a URL or URL pattern by which the servlet is
invoked. The ListServlets program parses a
web.xml file and stores the name-to-class and
name-to-URL mappings, printing out a summary when it reaches the end
of the file. To help you understand what the example does, here
is an excerpt from the web.xml file developed in
Chapter 20:
<servlet>
  <servlet-name>Hello</servlet-name>
  <servlet-class>je3.servlet.HelloNet</servlet-class>
</servlet>
<!-- The Counter servlet uses initialization parameters -->
<servlet>
  <servlet-name>Counter</servlet-name>
  <servlet-class>je3.servlet.Counter</servlet-class>
  <init-param>
    <param-name>countfile</param-name>         <!-- where to save state -->
    <param-value>/tmp/counts.ser</param-value> <!-- adjust for your system -->
  </init-param>
  <init-param>
    <param-name>saveInterval</param-name>      <!-- how often to save -->
    <param-value>30000</param-value>           <!-- every 30 seconds -->
  </init-param>
</servlet>
<servlet-mapping>
  <servlet-name>Hello</servlet-name>
  <url-pattern>/Hello</url-pattern>
</servlet-mapping>
<servlet-mapping>
  <servlet-name>Counter</servlet-name>
  <url-pattern>/Counter</url-pattern>
</servlet-mapping>
<!-- Note the wildcard below:
     any URL ending in .count invokes Counter -->
<servlet-mapping>
  <servlet-name>Counter</servlet-name>
  <url-pattern>*.count</url-pattern>
</servlet-mapping>
ListServlets.java
includes a main( ) method that uses the JAXP API
to obtain a SAX parser instance. It then passes the
File to parse, along with an instance of the
ListServlets class, to the parser. The parser
starts running and invokes the ListServlets
instance methods as it encounters XML elements in the file.

ListServlets extends the SAX
org.xml.sax.helpers.DefaultHandler class. This
superclass provides dummy implementations of all the SAX
event-handler methods. The example simply overrides the handlers of
interest. The parser calls the startElement( )
method when it reads an opening XML tag; it calls endElement( )
when it finds a closing tag. characters( )
is invoked when the parser reads a string of plain text
with no markup. Finally, the parser calls warning( ),
error( ), or fatalError( )
when something goes wrong in the parsing process. The
implementations of these methods are written specifically to extract
the desired information from a web.xml file and
are based on a knowledge of the structure of this type of file.

Note that web.xml
files are somewhat unusual in that they don't rely
on attributes for any of the XML tags. That is, servlet names are
defined by a <servlet-name> tag nested
within a <servlet> tag, instead of simply
using a name attribute of the
<servlet> tag itself. This fact makes the
example program more complex than it would otherwise be. The
web.xml file does allow id
attributes for all its tags. Although servlet engines are not
expected to use these attributes, they may be useful to a
configuration tool that parses and automatically generates
web.xml files. In order to demonstrate how to
work with attributes in SAX, the startElement( )
method in Example 19-1 looks for an
id attribute of the
<servlet> tag. The value of that attribute,
if it exists, is reported in the program's output.

To run this program, specify the path to a
web.xml file on the command line. You can use
the one included with the servlets examples, which is at
je3/servlet/WEB-INF/web.xml.
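Before reading the full listing, it may help to see the attribute lookup described earlier in isolation. The following is a minimal sketch, separate from Example 19-1, with a hypothetical class name (ServletIdLookup) and a made-up two-element document. It relies on the fact that Attributes.getValue( ) returns null when the named attribute is absent, which is why code that reports the id must check for null.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of the je3 examples.
public class ServletIdLookup {
    /**
     * Return the id attribute of each <servlet> element, in document order.
     * An element without an id attribute contributes a null entry.
     */
    public static List<String> servletIds(String xml) {
        final List<String> ids = new ArrayList<String>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // Attributes are only available here, in startElement().
                if (qName.equals("servlet")) {
                    ids.add(attributes.getValue("id"));  // null if no id attribute
                }
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped only to keep this sketch easy to call.
            throw new IllegalStateException("parse failed", e);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Made-up input: one servlet with an id attribute, one without.
        String xml = "<web-app><servlet id=\"s1\"/><servlet/></web-app>";
        for (String id : servletIds(xml)) {
            System.out.println(id == null ? "(no id)" : "id=" + id);
        }
    }
}
```

Because the Attributes object is passed only to startElement( ), a handler that needs an attribute value later must save it in a field, as ListServlets does with servletID.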
Example 19-1. ListServlets.java
package je3.xml;
import javax.xml.parsers.*;      // JAXP classes for obtaining a SAX Parser
import org.xml.sax.*;            // The main SAX package
import org.xml.sax.helpers.*;    // SAX helper classes
import java.io.*;                // For reading the input file
import java.util.*;              // HashMap, lists, and so on

/**
 * Parse a web.xml file using the SAX2 API.
 * This class extends DefaultHandler so that instances can serve as SAX2
 * event handlers, and can be notified by the parser of parsing events.
 * We simply override the methods that receive events we're interested in
 **/
public class ListServlets extends org.xml.sax.helpers.DefaultHandler {
    /** The main method sets things up for parsing */
    public static void main(String[ ] args)
        throws IOException, SAXException, ParserConfigurationException
    {
        // We use a SAXParserFactory to obtain a SAXParser, which
        // encapsulates an XMLReader.
        SAXParserFactory factory = SAXParserFactory.newInstance( );
        factory.setValidating(false);     // We don't want validation
        factory.setNamespaceAware(false); // No namespaces please
        // Create a SAXParser object from the factory
        SAXParser parser = factory.newSAXParser( );
        // Now parse the file specified on the command line using
        // an instance of this class to handle the parser callbacks
        parser.parse(new File(args[0]), new ListServlets( ));
    }

    HashMap nameToClass;      // Map from servlet name to servlet class name
    HashMap nameToID;         // Map from servlet name to id attribute
    HashMap nameToPatterns;   // Map from servlet name to url patterns
    StringBuffer accumulator; // Accumulate text
    String servletName, servletClass, servletPattern; // Remember text
    String servletID;         // Value of id attribute of <servlet> tag

    // Called at the beginning of parsing. We use it as an init( ) method
    public void startDocument( ) {
        accumulator = new StringBuffer( );
        nameToClass = new HashMap( );
        nameToID = new HashMap( );
        nameToPatterns = new HashMap( );
    }

    // When the parser encounters plain text (not XML elements), it calls
    // this method, which accumulates them in a string buffer.
    // Note that this method may be called multiple times, even with no
    // intervening elements.
    public void characters(char[ ] buffer, int start, int length) {
        accumulator.append(buffer, start, length);
    }

    // At the beginning of each new element, erase any accumulated text.
    public void startElement(String namespaceURL, String localName,
                             String qname, Attributes attributes) {
        accumulator.setLength(0);
        // If it's a servlet tag, look for id attribute
        if (qname.equals("servlet")) servletID = attributes.getValue("id");
    }

    // Take special action when we reach the end of selected elements.
    // Although we don't use a validating parser, this method does assume
    // that the web.xml file we're parsing is valid.
    public void endElement(String namespaceURL,
                           String localName, String qname)
    {
        // Since we've indicated that we don't want namespace-aware
        // parsing, the element name is in qname. If we were doing
        // namespaces, then qname would include the prefix, colon, and name,
        // and localName would be the name without the prefix or colon.
        if (qname.equals("servlet-name")) {        // Store servlet name
            servletName = accumulator.toString( ).trim( );
        }
        else if (qname.equals("servlet-class")) {  // Store servlet class
            servletClass = accumulator.toString( ).trim( );
        }
        else if (qname.equals("url-pattern")) {    // Store servlet pattern
            servletPattern = accumulator.toString( ).trim( );
        }
        else if (qname.equals("servlet")) {        // Map name to class
            nameToClass.put(servletName, servletClass);
            nameToID.put(servletName, servletID);
        }
        else if (qname.equals("servlet-mapping")) { // Map name to pattern
            List patterns = (List)nameToPatterns.get(servletName);
            if (patterns == null) {
                patterns = new ArrayList( );
                nameToPatterns.put(servletName, patterns);
            }
            patterns.add(servletPattern);
        }
    }

    // Called at the end of parsing. Used here to print our results.
    public void endDocument( ) {
        // Note the powerful uses of the Collections framework. In two lines
        // we get the key objects of a Map as a Set, convert them to a List,
        // and sort that List alphabetically.
        List servletNames = new ArrayList(nameToClass.keySet( ));
        Collections.sort(servletNames);
        // Loop through servlet names
        for(Iterator iterator = servletNames.iterator( ); iterator.hasNext( );)
        {
            String name = (String)iterator.next( );
            // For each name get class and URL patterns and print them.
            String classname = (String)nameToClass.get(name);
            String id = (String)nameToID.get(name);
            List patterns = (List)nameToPatterns.get(name);
            System.out.println("Servlet: " + name);
            System.out.println("Class: " + classname);
            if (id != null) System.out.println("ID: " + id);
            if (patterns != null) {
                System.out.println("Patterns:");
                for(Iterator i = patterns.iterator( ); i.hasNext( ); ) {
                    System.out.println("\t" + i.next( ));
                }
            }
            System.out.println( );
        }
    }

    // Issue a warning
    public void warning(SAXParseException exception) {
        System.err.println("WARNING: line " +
                           exception.getLineNumber( ) + ": " +
                           exception.getMessage( ));
    }

    // Report a parsing error
    public void error(SAXParseException exception) {
        System.err.println("ERROR: line " +
                           exception.getLineNumber( ) + ": " +
                           exception.getMessage( ));
    }

    // Report a non-recoverable error and exit
    public void fatalError(SAXParseException exception)
        throws SAXException {
        System.err.println("FATAL: line " +
                           exception.getLineNumber( ) + ": " +
                           exception.getMessage( ));
        throw exception;
    }
}
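The comment in endElement( ) notes that the element name arrives in qname only because namespace-aware parsing is turned off. The following sketch, which is not part of the book's examples, shows how the arguments differ between the two modes. The class name NamespaceNames, the prefix w, and the namespace URI http://example.com/ns are all made up for illustration. With namespace awareness on, the parser reports the namespace URI and the unprefixed local name. With it off, SAX specifies that the URI and local name are empty strings, so only the raw qualified name is useful.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of the je3 examples.
public class NamespaceNames {
    /**
     * Parse the document and return the three name arguments that
     * startElement() receives for the root element, as {uri, localName, qName}.
     */
    public static String[] rootNames(String xml, boolean namespaceAware) {
        final String[] names = new String[3];
        final boolean[] seenRoot = { false };

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if (seenRoot[0]) return;   // record only the first element
                seenRoot[0] = true;
                names[0] = uri;
                names[1] = localName;
                names[2] = qName;
            }
        };

        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped only to keep this sketch easy to call.
            throw new IllegalStateException("parse failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up document: the prefix and URI are assumptions for this sketch.
        String xml = "<w:web-app xmlns:w=\"http://example.com/ns\"/>";
        boolean[] modes = { false, true };
        for (boolean aware : modes) {
            String[] n = rootNames(xml, aware);
            System.out.println("namespaceAware=" + aware + ": uri='" + n[0]
                + "' localName='" + n[1] + "' qName='" + n[2] + "'");
        }
    }
}
```

A handler written for namespace-aware parsing would therefore compare localName (and usually the URI) rather than qname, so that it works regardless of which prefix a document happens to use.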