Apache Jakarta and Beyond: A Java Programmer's Introduction

12.2. Jakarta Digester

It is appropriate to handle some types of configuration on the command line. However, more complex configuration, or configuration that is likely to remain static over multiple invocations of a program, needs a more permanent mechanism. A natural way to store such configuration is in files, and a powerful format for storing configurations is XML. XML allows configuration well beyond the simple name/value or name/sets of values pairs of a command line and can express special cases such as ordered arguments, nested arguments, and so on.

This is one of the motivations behind Jakarta Digester, although Digester is much more than just a configuration tool. Digester provides the ability to map XML expressions to Java objects in much the same way that OJB provides the ability to map from a database to Java objects. This can be used in any situation where it helps to have a representation of an object tree that can be edited, transmitted using standard protocols like HTTP or SMTP, and so on. Configuration is certainly one such situation.

Digester is built around three concepts: an object stack, element matching patterns, and processing rules.

The object stack is what it sounds like: a stack of objects that have been built or are in the process of being built. There are API calls available to examine and manipulate this stack. In simple usage these APIs are not necessary, and the stack can be treated as an internal data structure.

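Element matching patterns select nodes by their names and their position in the document hierarchy (see also Chapter 9). An expression like "a/b" will match nodes called "b" whose parent is the top-level node called "a." The pattern "*/c" will match nodes called "c" regardless of their parent or how deep in the hierarchy they are.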
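The matching behavior just described can be imitated in a few lines. The following is only a sketch and not Digester's own implementation; PatternSketch and matches() are hypothetical names, and the path of the current element is assumed to be available as a slash-separated string such as "a/b/c".

```java
class PatternSketch {
    // Returns true if the pattern selects the element at the given path.
    // A pattern either spells out the full path ("a/b") or begins with
    // a wildcard segment and then matches any path ending in the rest.
    static boolean matches(String pattern, String path) {
        if (pattern.startsWith("*/")) {
            String tail = pattern.substring(2);
            // The tail must line up with whole element names, so
            // "a/bc" is not accepted by a pattern that ends in "c".
            return path.equals(tail) || path.endsWith("/" + tail);
        }
        return pattern.equals(path);
    }

    public static void main(String[] args) {
        System.out.println(matches("a/b", "a/b"));    // true
        System.out.println(matches("*/c", "a/b/c"));  // true
        System.out.println(matches("a/b", "x/a/b"));  // false
    }
}
```

The last call shows why a pattern without a wildcard is sensitive to the full path of the node: it must match the path from the root exactly.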

Processing rules specify what to do when a node matching a pattern is encountered. There is a general mechanism for this that will be explored later, but a number of built-in rules are provided. Here are some:

ObjectCreateRule creates an object of the class registered with the rule when a matching node begins, and places it on the stack.
SetPropertiesRule matches attribute names with properties in the object on top of the stack and calls corresponding set methods.
SetNextRule passes the object on top of the stack to a specified method in the next-to-the-top object on the stack.
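To make the interplay of these three rules concrete, their combined effect can be imitated by hand with the SAX parser that ships with the JDK and an explicit stack. This is a sketch under stated assumptions, not how Digester is implemented: StackSketch and Node are hypothetical names, every element is turned into the same generic Node type, and only a name attribute is copied.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class StackSketch {
    // Hypothetical stand-in for a bean: an element name, a name
    // attribute, and the children that were added to it.
    static class Node {
        final String element;
        String name;
        final List<Node> children = new ArrayList<>();
        Node(String element) { this.element = element; }
    }

    static Node parse(String xml) {
        final Deque<Node> stack = new ArrayDeque<>();
        final Node[] root = new Node[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                Node node = new Node(qName);        // like ObjectCreateRule
                node.name = atts.getValue("name");  // like SetPropertiesRule
                stack.push(node);
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                Node top = stack.pop();
                if (stack.isEmpty()) {
                    root[0] = top;                  // finished the root object
                } else {
                    stack.peek().children.add(top); // like SetNextRule
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return root[0];
    }
}
```

What Digester adds on top of this idea is the ability to choose a different class, a different set of properties, and a different parent method for each pattern, without writing a handler by hand.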

As a simple example, consider again a CD application, this time limited to artists and albums. An XML representation of a portion of a collection might look like this:


<?xml version='1.0'?>
<!-- Internal definition of DTD -->
<!DOCTYPE collection [
<!ELEMENT collection (artist*)>
<!ELEMENT artist (album*)>
<!ATTLIST artist name CDATA #REQUIRED>
<!ELEMENT album EMPTY>
<!ATTLIST album name CDATA #REQUIRED year CDATA #REQUIRED>
]>
<!-- Content -->
<collection>
<artist name="Claire Voyant">
<album name="Time and the Maiden" year="2001"/>
<album name="Love Is Blind" year="2002"/>
</artist>
<artist name="Siddal">
<album name="The Crossing" year="1996"/>
<album name="Mystery and the Sea" year="1997"/>
</artist>
</collection>

On the Java side, simple beans could be used to hold this data, and they are shown in Listings 12.2, 12.3, and 12.4.

Listing 12.2. The collection bean


package com.awl.toolbook.chapter12;
import java.util.ArrayList;
public class CDCollection {
private ArrayList artists = new ArrayList();
public void addArtist(Artist a) {
artists.add(a);
}
}

Listing 12.3. The artist bean


package com.awl.toolbook.chapter12;
import java.util.ArrayList;
public class Artist {
private String name;
public String getName() {return name;}
public void setName(String name) {this.name = name;}
private ArrayList albums = new ArrayList();
public void addAlbum(Album a) {
albums.add(a);
}
}

Listing 12.4. The album bean


package com.awl.toolbook.chapter12;
public class Album {
private String name;
public String getName() {return name;}
public void setName(String name) {this.name = name;}
private int yearReleased;
public int getYearReleased() {return yearReleased;}
public void setYearReleased(int yearReleased) {
this.yearReleased = yearReleased;
}
}

The digester code that will map the XML to the beans is somewhat verbose, although underneath the verbosity it is quite simple. The first step is to obtain a Digester and set any desired features.


Digester digester = new Digester();
digester.setValidating(true);

This tells the Digester to validate the XML against the DTD. This is not strictly necessary, but it is strongly encouraged because skipping validation can lead to hard-to-find errors.^[2] The reason will become clear shortly.

^[2] The implication of this is that a DTD must be provided. In the example it is included within the document itself, which is convenient but means that the DTD must be repeated in each collection. An alternative would be to place the DTD in a separate file and reference it from the XML with <!DOCTYPE collection SYSTEM "Cd.dtd">. In that case, some care must be taken to ensure that Digester can locate the DTD.
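What validation buys can be seen with the JDK's SAX parser alone. The sketch below makes several assumptions: ValidationSketch and isValid() are hypothetical names, and the DTD is a cut-down version of the one in the example. With a bare SAX parser, validity errors are only reported to an error handler, so the handler must record them itself.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ValidationSketch {
    // Internal DTD subset, reduced to the parts needed here.
    static final String DTD =
        "<!DOCTYPE collection [\n"
      + "<!ELEMENT collection (artist*)>\n"
      + "<!ELEMENT artist (album*)>\n"
      + "<!ATTLIST artist name CDATA #REQUIRED>\n"
      + "<!ELEMENT album EMPTY>\n"
      + "<!ATTLIST album name CDATA #REQUIRED>\n"
      + "]>\n";

    // Returns true only if the document is well formed and valid.
    static boolean isValid(String xml) {
        final boolean[] valid = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void error(SAXParseException e) {
                        valid[0] = false; // validity errors arrive here
                    }
                });
        } catch (Exception e) {
            return false; // not well formed, or the parser could not be set up
        }
        return valid[0];
    }
}
```

A document that places an album directly inside the collection is well formed, so a non-validating parse would accept it, but isValid() rejects it because the DTD allows only artist elements at that level.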

Next it is necessary to identify the XML patterns that will trigger actions. Since the XML is simple, so are the patterns. The collection node should clearly cause the creation of a CDCollection object. The code to do this is


digester.addObjectCreate(
"collection",
"com.awl.toolbook.chapter12.CDCollection");

Errors to Watch For

Digester objects maintain a lot of internal state, and consequently it is recommended that each instance be used to parse only one XML document and never reused. Instances of rules may be reused safely.

Likewise an artist node should trigger the creation of an Artist bean:


digester.addObjectCreate(
"collection/artist",
"com.awl.toolbook.chapter12.Artist");

In this case it is not enough to create the object; the attributes must also be set. This is specified with


digester.addSetProperties("collection/artist");

Note that this means the digester will have two rules for the "collection/artist" pattern. This illustrates an important point: an arbitrary number of rules can be associated with any pattern, and they will be evaluated in the order in which they were added. In this case it means that the object will be created and then the properties will be set, which makes sense. This handling of multiple rules holds whenever multiple patterns would match the same node, even if the patterns are not identical. That is, the call to addObjectCreate() could use the "collection/artist" pattern, and addSetProperties() could use "*/artist," and everything would still work as expected.

After the object has been created and populated, it must be associated with the parent CDCollection node.


digester.addSetNext(
"collection/artist",
"addArtist");

This means that the addArtist() method of the object just below the top of the stack will be invoked, with the newly built Artist passed as its argument. This assumes that the object below is an instance of CDCollection, but according to the rules that have been established and the structure of the XML, this must be the case. This is why validation against the DTD is so important; without it a well-formed but invalid document could result in some other object being on the stack below the Artist, and attempting to call addArtist() would result in an introspection error.
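The kind of failure described here is easy to reproduce with plain reflection. The sketch below is an assumption about the general technique, not Digester's code: SetNextSketch and callOnParent() are hypothetical names, and the two nested classes are empty stand-ins for the beans in Listings 12.2 and 12.3.

```java
import java.lang.reflect.Method;

class SetNextSketch {
    // Minimal stand-ins for the real beans.
    static class Artist { }
    static class CDCollection {
        int count = 0;
        public void addArtist(Artist a) { count++; }
    }

    // Calls parent.methodName(child), looking the method up by name
    // at run time the way a set-next style rule has to.
    static void callOnParent(Object parent, String methodName, Object child) {
        for (Method m : parent.getClass().getMethods()) {
            Class<?>[] params = m.getParameterTypes();
            if (m.getName().equals(methodName)
                    && params.length == 1
                    && params[0].isInstance(child)) {
                try {
                    m.invoke(parent, child);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
                return;
            }
        }
        // No suitable method: the wrong kind of object is the parent.
        throw new IllegalStateException(
            "No method " + methodName + " on " + parent.getClass().getName());
    }
}
```

Calling callOnParent() with a CDCollection as the parent succeeds, while passing an Artist as the parent fails only at run time, which is exactly the situation that validating the document up front is meant to rule out.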

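The rules for handling album nodes are very similar to those for artist nodes. Because album elements are nested inside artist elements, the patterns include the artist level, and because the XML attribute is called year while the bean property is called yearReleased, that one attribute is mapped explicitly:


digester.addObjectCreate(
"collection/artist/album",
"com.awl.toolbook.chapter12.Album");
// Map the year attribute to the yearReleased property;
// name still maps to the property of the same name.
digester.addSetProperties(
"collection/artist/album",
"year", "yearReleased");
digester.addSetNext(
"collection/artist/album",
"addAlbum");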

OJB (Chapter 10) maps between beans and database tables. In conjunction with Digester this provides the ability to move XML into database tables. All that is needed is a way to turn beans into XML, and it would become possible to move almost effortlessly among beans, XML, and databases. While it is possible to write a general bean-to-XML translator, it is also easy enough to handle this in the beans themselves. For example, the CDCollection class could add the following method:


public String toXML() {
StringBuffer buffy = new StringBuffer();
buffy.append("<collection>");
for(int i=0;i<artists.size();i++) {
buffy.append(((Artist) artists.get(i)).toXML());
}
buffy.append("</collection>");
return buffy.toString();
}
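The method above relies on each Artist, and in turn each Album, providing its own toXML(). One detail worth handling at the leaves is escaping, since an album name may contain characters such as & or a double quote. The following sketch shows one way this might look; XmlSketch, escape(), and albumToXml() are hypothetical names, and the values are passed as parameters rather than read from a bean so that the example stands alone.

```java
class XmlSketch {
    // Escapes the characters that are unsafe inside a
    // double-quoted XML attribute value.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;");  break;
                case '<': out.append("&lt;");   break;
                case '>': out.append("&gt;");   break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // What an Album.toXML() method might return for the given values.
    static String albumToXml(String name, int year) {
        return "<album name=\"" + escape(name)
             + "\" year=\"" + year + "\"/>";
    }
}
```

The output uses the same attribute names as the sample collection, so a document produced this way can be read back by the rules defined earlier.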

Next, consider a more general version of the Cat program. A useful extension would be the ability to read from a variety of sources including files, programs, and URLs. The output filter introduced in the previous section could also be enhanced so that arguments to programs could be provided along with the program names. It would also be useful to send the final output to a number of destinations instead of just the standard output or a single file. A sample configuration file for this super-enhanced cat is shown in Listing 12.5.

Listing 12.5. A sample cat configuration


<cat-config>
<inputs>
<file name="file1.txt"/>
<program name="pagegen">
<arg value="-pagetype"/>
<arg value="2"/>
</program>
<file name="file2.txt"/>
</inputs>
<filters>
<class
className="com.awl.toolbook.chapter12.LineNumberFilter">
<arg name="format"   value="000: "/>
</class>
<program name="grep">
<arg value="-v"/>
<arg value="monkey"/>
</program>
</filters>
<outputs>
<console/>
<file name="output.txt"/>
</outputs>
</cat-config>

Some of the advantages to using XML as a configuration language are immediately obvious from this example. The hierarchical nature of XML makes it clear when an expression is meant as a value for a program or class rather than as an argument to Cat itself. XML also makes it very clear what each element is: a class, a program, an argument to a program, and so on. Storing this configuration in a file also has immediate benefits such as the ability to store a set of such files, each of which performs a specialized task, such as mailing a selection of Web pages to a user.

Another advantage of configuration through XML is that it allows for better object-oriented design in the program. In the input section a file node will result in an object that can read a file, and a program node will result in an object that can capture the output of a program. These objects can both implement an interface that will be called Readable. The master program will then have a list of Readables and will not need to worry about where or how each object is obtaining its data. Contrast this with command line-based configuration, where the master program must decide which code to invoke based on string values.

Listing 12.6 shows the Readable interface.

Listing 12.6. The Readable interface


package com.awl.toolbook.chapter12;
public interface Readable {
public String[] read();
}

This is simple enough; as might be expected, it provides a method that returns an array of strings. There will also be corresponding interfaces called Filter and Writeable that support methods with signatures String[] filter(String[]) and void write(String[]), respectively.
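Written out, the two interfaces follow directly from the signatures just given. The filter class below is a guess at what the LineNumberFilter named in Listing 12.5 might look like; the original implementation is not shown, so the use of DecimalFormat to interpret a format such as "000: " is an assumption made for this sketch.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

interface Filter {
    String[] filter(String[] lines);
}

interface Writeable {
    void write(String[] lines);
}

// Hypothetical filter that prefixes each line with its number.
class LineNumberFilter implements Filter {
    // Bean property that a configuration argument named
    // "format" could set; interpreted as a DecimalFormat pattern.
    private String format = "000: ";
    public String getFormat() { return format; }
    public void setFormat(String format) { this.format = format; }

    public String[] filter(String[] lines) {
        // Fixed symbols so the digits do not depend on the locale.
        DecimalFormat numbers = new DecimalFormat(
            format, DecimalFormatSymbols.getInstance(Locale.ROOT));
        String[] out = new String[lines.length];
        for (int i = 0; i < lines.length; i++) {
            out[i] = numbers.format(i + 1) + lines[i];
        }
        return out;
    }
}
```

Because the filter exposes format as an ordinary bean property, it can be configured from XML without the filter knowing anything about Digester.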

Because the code for reading and writing will be the same regardless of the source or destination, it makes sense to put all this code in a common base class. This code is shown in Listing 12.7.

Listing 12.7. IO Utilities


package com.awl.toolbook.chapter12;
import java.io.*;
import java.util.ArrayList;
public class IOUtils {
public String[] doRead(InputStream in) {
try {
return doRead(
new BufferedReader(
new InputStreamReader(in)));
}   catch (Exception e) {}
return new String[0];
}
public String[] doRead(Reader in)   {
try {
return doRead(new BufferedReader(in));
}   catch (Exception e) {}
return new String[0];
}
public String[] doRead(BufferedReader in)
throws IOException
{
// Collect every line, then copy the list into an array
ArrayList ret = new ArrayList();
String line;
while((line = in.readLine()) != null)  {
ret.add(line);
}
String ret1[] = new String[ret.size()];
for(int i=0;i<ret.size();i++)  {
ret1[i] = (String) ret.get(i);
}
return ret1;
}
public void doWrite(OutputStream out, String lines[])
throws IOException
{
doWrite(new PrintWriter(
new OutputStreamWriter(out)),
lines);
}
public void doWrite(Writer out, String lines[])
throws IOException
{
doWrite(new PrintWriter(out),lines);
}
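public void doWrite(PrintWriter out, String lines[])
throws IOException
{
for(int i=0;i<lines.length;i++) {
out.println(lines[i]);
}
// Flush so that buffered output is not lost when the
// caller never closes this wrapper (e.g., the console)
out.flush();
}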
}

There is nothing very special here either, just a variety of methods that obtain a PrintWriter or BufferedReader from other IO types and use them accordingly.

From here writing the particular classes is easy, and the class for handling files is shown in Listing 12.8.

Listing 12.8. The file handler


package com.awl.toolbook.chapter12;
import java.io.FileReader;
import java.io.FileWriter;
public class FileHandler
extends IOUtils
implements Readable, Writeable
{
public FileHandler() {}
private String name;
public String getName() {return name;}
public void setName(String name) {
this.name = name;
}
public String[] read() {
String ret[] = new String[0];
try {
FileReader in = new FileReader(name);
ret = doRead(in);
in.close();
}   catch (Exception e) {
System.err.println("Error reading:" + name);
e.printStackTrace(System.err);
}
return ret;
}
public void write(String lines[]) {
try {
FileWriter out = new FileWriter(name);
doWrite(out,lines);
out.close();
}catch (Exception e) {
System.err.println("Error writing:" + name);
e.printStackTrace(System.err);
}
}
}

The class for handling programs, ProgramHandler, closely follows Listing 11.5 and so will not be repeated here. The only difference of note is that ProgramHandler has a list of Argument objects and an addArgument() method that adds an instance to the list. The Argument class is shown in Listing 12.9.

Listing 12.9. The argument class


package com.awl.toolbook.chapter12;
public class Argument {
private String name;
public String getName() {return name;}
public void setName(String name) {this.name = name;}
private String value;
public String getValue() {return value;}
public void setValue(String value) {
this.value = value;
}
}

Argument has both a name and value because it will be needed for another purpose. However, ProgramHandler only uses the value. In this context Argument is something of a waste, as it is nothing more than a wrapper around a string. However, there is no better solution immediately available. It is tempting to make an addObjectCreate() rule that creates a String when it sees "*/program/arg," but the problem is that String has no set methods, so there is no way to assign the value through a setProperties rule. This will be addressed shortly by a custom rule.

There are three ways to handle the class node in filters. The first and easiest is to use a variation of the ObjectCreateRule, which gets the name of the class from an attribute instead of using the class name provided when the rule is added. This would be done with


digester.addObjectCreate("cat-config/filters/class",
"com.awl.toolbook.chapter12.StubFilter",
"className");

where the third argument names the XML attribute that holds the class name, and StubFilter is a filter that does nothing and is provided only because the API requires the name of a valid class as the second argument. Digester uses the class named in the second argument when the attribute is not present in the XML.
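The choice between the default class and the one named in the attribute comes down to a small piece of logic, sketched here with plain reflection. CreateSketch and create() are hypothetical names, and standard collection classes stand in for filters so that the example runs without any of the chapter's classes.

```java
class CreateSketch {
    // Instantiates overrideClass if one was given in the XML,
    // otherwise falls back to defaultClass.
    static Object create(String defaultClass, String overrideClass) {
        String chosen = (overrideClass != null) ? overrideClass : defaultClass;
        try {
            // Requires a public no-argument constructor,
            // as is usual for beans.
            return Class.forName(chosen)
                        .getDeclaredConstructor()
                        .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create " + chosen, e);
        }
    }

    public static void main(String[] args) {
        // No attribute present: the default is used.
        System.out.println(
            create("java.util.ArrayList", null).getClass().getName());
        // Attribute present: it takes precedence.
        System.out.println(
            create("java.util.ArrayList", "java.util.LinkedList")
                .getClass().getName());
    }
}
```

This also shows why the default must name a real class: it is what gets instantiated whenever the attribute is missing.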

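The second way is to use a proxy class that implements Filter, instantiates the real filter from the class name, and sets its properties through BeanUtils (see Chapter 8). The code for the proxy class is shown in Listing 12.10.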

Listing 12.10. A generalized filter


package com.awl.toolbook.chapter12;
import org.apache.commons.beanutils.PropertyUtils;
public class ClassFilter implements Filter {
Filter theFilter = null;
private String className;
public String getClassName() {return className;}
public void setClassName(String className) {
// Remember the name so that getClassName() reports it
this.className = className;
try {
Class c = Class.forName(className);
theFilter = (Filter) c.newInstance();
} catch (Exception e) {
System.err.println("Unable to instantiate "   + className);
e.printStackTrace(System.err);
}
}
public void addArgument(Argument arg) {
try {
PropertyUtils.setSimpleProperty(
theFilter,
arg.getName(),
arg.getValue());
}   catch (Exception e) {}
}
public String[] filter(String lines[]) {
return theFilter.filter(lines);
}
}

setClassName() will be called by Digester as the result of a setProperties rule. addArgument() will be called with Argument objects as the result of a setNext rule.

Now that all the component classes have been written, the latest version of Cat needs to set up Digester and then perform the requested actions. The result is shown in Listing 12.11.

Listing 12.11. Digester-based cat


package com.awl.toolbook.chapter12;
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import org.xml.sax.SAXException;
import org.apache.commons.digester.*;
public class Cat3 {
/* Input patterns */
private final static String inputFile =
"cat-config/inputs/file";
private final static String inputProgram =
"cat-config/inputs/program";
private final static String inputURL =
"cat-config/inputs/url";
/* Filter patterns */
private final static String filterProgram =
"cat-config/filters/program";
private final static String filterClass =
"cat-config/filters/class";
private final static String filterClassArg =
"*/filters/class/arg";
/* Output patterns */
private final static String outputFile =
"cat-config/outputs/file";
private final static String outputConsole =
"cat-config/outputs/console";
/* Special pattern for all programs */
private final static String programArg =
"*/program/arg";
/*** Names of classes ***/
private final static String fileHandler =
"com.awl.toolbook.chapter12.FileHandler";
private final static String programHandler =
"com.awl.toolbook.chapter12.ProgramHandler";
private final static String classHandler =
"com.awl.toolbook.chapter12.ClassFilter";
private final static String URLHandler =
"com.awl.toolbook.chapter12.URLHandler";
private final static String argObject =
"com.awl.toolbook.chapter12.Argument";
/*** Names of methods ***/
private final static String addInput    = "addInput";
private final static String addFilter   = "addFilter";
private final static String addOutput   = "addOutput";
private final static String addArgument =
"addArgument";
public Cat3() {}
public static Cat3 config(String fileName)
throws IOException, SAXException
{
Digester digester = new Digester();
digester.addObjectCreate("cat-config", Cat3.class);
// File input
digester.addObjectCreate(inputFile, fileHandler);
digester.addSetProperties(inputFile);
digester.addSetNext(inputFile,addInput);
// Program input
digester.addObjectCreate(inputProgram,
programHandler);
digester.addSetProperties(inputProgram);
digester.addSetNext(inputProgram,addInput);
// URL input
digester.addObjectCreate(inputURL,URLHandler);
digester.addSetProperties(inputURL);
digester.addSetNext(inputURL,addInput);
// Program filter
digester.addObjectCreate(filterProgram,
programHandler);
digester.addSetProperties(filterProgram);
digester.addSetNext(filterProgram,addFilter);
// Class filter
digester.addObjectCreate(filterClass,classHandler);
digester.addSetProperties(filterClass);
digester.addSetNext(filterClass,addFilter);
// Class filter arguments
digester.addObjectCreate(filterClassArg,argObject);
digester.addSetProperties(filterClassArg);
digester.addSetNext(filterClassArg,addArgument);
// File output
digester.addObjectCreate(outputFile,fileHandler);
digester.addSetProperties(outputFile);
digester.addSetNext(outputFile,addOutput);
// Program arguments
digester.addObjectCreate(programArg,argObject);
digester.addSetProperties(programArg);
digester.addSetNext(programArg,addArgument);
File input = new File(fileName);
Cat3 cat = (Cat3) digester.parse(input);
return cat;
}
private ArrayList inputs = new ArrayList();
public void addInput(Readable r) {
inputs.add(r);
}
private ArrayList filters = new ArrayList();
public void addFilter(Filter f) {
filters.add(f);
}
private ArrayList outputs = new ArrayList();
public void addOutput(Writeable w) {
outputs.add(w);
}
public void process() {
String lines[] = new String[0];
for(int i=0;i<inputs.size();i++) {
Readable r = (Readable) inputs.get(i);
lines      = mergeArray(lines,r.read());
}
for(int i=0;i<filters.size();i++) {
Filter f = (Filter) filters.get(i);
lines    = f.filter(lines);
}
for(int i=0;i<outputs.size();i++) {
Writeable w = (Writeable) outputs.get(i);
w.write(lines);
}
}
private String[] mergeArray(String arr1[],
String arr2[])
{
if(arr1 == null) {
return arr2;
}
String ret[] = new String[arr1.length +
arr2.length];
System.arraycopy(arr1,0,ret,0,arr1.length);
System.arraycopy(arr2,0,ret,arr1.length,
arr2.length);
return ret;
}
public static void main(String argv[]) throws Exception {
Cat3 cat = Cat3.config(argv[0]);
cat.process();
}
}

There are lots of rules to set up here, so the initialization code is fairly verbose, although each piece is simple enough. Most pieces consist of three rules: one to create the object, one to set its properties, and one to add it to its parent. Because programs can be used as inputs, filters, and outputs, the rule to handle program arguments is set globally, using "*/program/arg".

Note how simple the process() method is. Because each class is well encapsulated, and because Digester takes care of all the hard work of constructing objects and building the tree, the job of using the objects is reduced to a few simple loops. This is characteristic of Digester-based applications.

12.2.1. Custom Handlers

Using some clever tricks, like the wrapper in Listing 12.9 and the proxy in Listing 12.10, it is possible to do almost everything one might need with the provided Digester rules. However, there are instances where it may be more convenient or general to use custom rules rather than special classes. There may also be unusual circumstances that demand special rules. Naturally Digester makes it possible to implement such custom rules and use them as easily as the built-in rules.

Creating a custom rule is as simple as extending the Rule class and overriding one or more lifecycle methods. These methods are as follows:

begin(), which is called when the opening tag of a matching node is encountered. All the attributes from the matched node are passed as a parameter.
body(), which is called with the text nested directly inside the matching node.
end(), which is called when the closing tag is encountered. Any nested XML nodes will have already been processed.
finish(), which is called once parsing has been completed. This is provided as a way for rules to clean up any data or resources they may have allocated.
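The order in which these methods fire for nested nodes can be traced with a rough imitation built on the JDK's SAX parser. This is a sketch, not Digester itself: LifecycleSketch and trace() are hypothetical names, every element is treated as if a rule matched it, and body text is trimmed before it is logged, which is an assumption of this example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class LifecycleSketch {
    // Returns the sequence of lifecycle events for the document.
    static List<String> trace(String xml) {
        final List<String> log = new ArrayList<>();
        // One text buffer per element that is currently open.
        final Deque<StringBuilder> text = new ArrayDeque<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                log.add("begin " + qName);
                text.push(new StringBuilder());
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                text.peek().append(ch, start, length);
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                String body = text.pop().toString().trim();
                log.add("body " + qName + " [" + body + "]");
                log.add("end " + qName);
            }
            @Override
            public void endDocument() {
                log.add("finish");
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return log;
    }
}
```

For the input <a>hi<b/></a> the trace is begin a, begin b, body b [], end b, body a [hi], end a, finish. The inner node is completely processed before the body and end events of the outer node are delivered.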

Within these methods code will have access to the digester instance from which the rule is being used and can use this digester to examine and manipulate the stack, as illustrated in Listing 12.12.

Listing 12.12. A rule that sets program arguments


package com.awl.toolbook.chapter12;
import org.apache.commons.beanutils.BeanUtils;
import org.apache.commons.digester.Rule;
import org.xml.sax.Attributes;
public class SetProgramArgRule extends Rule {
public void begin(String namespace,
String name,
Attributes attributes)
throws Exception
{
// Find the value attribute
// (should be the only one)
String valueAttribute = null;
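for (int i=0; i<attributes.getLength(); i++) {
String aName = attributes.getLocalName(i);
// Parsers that are not namespace aware may report an
// empty local name, so fall back to the qualified name
if (aName == null || aName.length() == 0) {
aName = attributes.getQName(i);
}
if ("value".equals(aName)) {
valueAttribute = attributes.getValue(i);
}
}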
if(valueAttribute == null) {
return;
}
// Get the object on top of the stack,
// which should be a ProgramHandler
Object top       = digester.peek();
ProgramHandler h = (ProgramHandler) top;
h.addArgument(valueAttribute);
}
}

This is as simple as it looks. The SAX API is used to examine the attributes until a value is found. Then the top object on the stack is obtained with peek(), and the argument is set with a call to addArgument().

To use this rule, just replace the existing rules for program arguments with


digester.addRule(programArg,new SetProgramArgRule());

where programArg is a variable containing the appropriate pattern. Note that there is only one rule here; in particular no ObjectCreate or SetNext rule is used, because no object is created to hold the program argument.

It is almost as easy to set an arbitrary property in a parent object as it is to set a program argument. Recall that special handling was needed for the class handler because the object created may have arbitrary parameters that need to be set, and the DTD cannot possibly specify them all. To set properties from nested nodes instead of attributes, it is just necessary to obtain the name and value from the nested node, then get the object from the stack, and use BeanUtils to do the setting. This is shown in Listing 12.13.

Listing 12.13. A rule that sets arbitrary properties


package com.awl.toolbook.chapter12;
import org.apache.commons.beanutils.BeanUtils;
import org.apache.commons.beanutils.PropertyUtils;
import org.apache.commons.digester.Rule;
import org.xml.sax.Attributes;
public class SetObjectPropertyRule extends Rule {
public void begin(String namespace,
String name,
Attributes attributes)
throws Exception
{
// Find the name and value attributes
String nameAttribute = null;
String valueAttribute = null;
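for (int i=0; i<attributes.getLength(); i++) {
String aName = attributes.getLocalName(i);
// Parsers that are not namespace aware may report an
// empty local name, so fall back to the qualified name
if (aName == null || aName.length() == 0) {
aName = attributes.getQName(i);
}
if ("value".equals(aName)) {
valueAttribute = attributes.getValue(i);
} else if ("name".equals(aName)) {
nameAttribute = attributes.getValue(i);
}
}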
if(nameAttribute == null  || valueAttribute == null) {
return;
}
// Get the object on top of the stack, which should
// be the filter created for the enclosing class node
Object top = digester.peek();
// BeanUtils converts the string to the property's type
BeanUtils.setProperty(top,
nameAttribute,
valueAttribute);
}
}
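What setting a property by name involves can be sketched with plain reflection. The example below is a simplified illustration, not the BeanUtils implementation: PropertySketch, set(), and the Sample bean are hypothetical, and only String and int properties are handled.

```java
import java.lang.reflect.Method;

class PropertySketch {
    // Hypothetical bean used only for this illustration.
    public static class Sample {
        private String format;
        private int width;
        public String getFormat() { return format; }
        public void setFormat(String format) { this.format = format; }
        public int getWidth() { return width; }
        public void setWidth(int width) { this.width = width; }
    }

    // Finds the setter for the named property and calls it,
    // converting the string when the property is an int.
    static void set(Object bean, String name, String value) {
        String setter = "set" + Character.toUpperCase(name.charAt(0))
                      + name.substring(1);
        for (Method m : bean.getClass().getMethods()) {
            if (!m.getName().equals(setter) || m.getParameterCount() != 1) {
                continue;
            }
            Class<?> type = m.getParameterTypes()[0];
            Object converted;
            if (type == String.class) {
                converted = value;
            } else if (type == int.class || type == Integer.class) {
                converted = Integer.valueOf(value);
            } else {
                continue; // type not supported by this sketch
            }
            try {
                m.invoke(bean, converted);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
            return;
        }
        throw new IllegalArgumentException("No writable property " + name);
    }
}
```

The conversion step is the reason a helper library is worth using here: values read from XML are always strings, while the properties they are destined for often are not.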

The configuration to use this new rule is also simple: Just replace the rules with


digester.addRule("cat-config/filters/class/arg",
new SetObjectPropertyRule());

There is just one issue now remaining: how to construct an instance of the desired class instead of the proxy class that was used previously. This can be accomplished through an alternate version of the ObjectCreate rule that looks for an optional attribute and, if found, will use it as the name of the class to construct instead of the provided class name. In this case the rule would be added with


digester.addObjectCreate(
"cat-config/filters/class",
"com.awl.toolbook.chapter12.ClassFilter",
"className");

With these two rules in place, the syntax of the configuration file can remain unchanged.