Best Practices
This chapter has demonstrated various ways to parse and programmatically access the content of XML documents. Each method has different strengths and weaknesses, as discussed in its respective section. The following sections describe some particular parser features and solutions to common problems.
DTD Caching
When parsing a collection or batch of XML documents that share the same DTD, performance improves significantly if you cache the DTD, so that it is not parsed again for every document. DTD caching is not enabled automatically, but the Oracle XML Parser for Java provides it, in both validating and nonvalidating modes, through the setDoctype() function. After you set the DTD using this function, the parser caches it for further XML parsing. This is illustrated in the following code fragment:
// Parse the first document and set the DTD for caching
parser.setValidationMode(DOMParser.DTD_VALIDATION);
parser.setAttribute(DOMParser.USE_DTD_ONLY_FOR_VALIDATION, Boolean.TRUE);
parser.parse("{XML_Document_URL}");
DTD dtd = parser.getDoctype();
parser.setDoctype(dtd);
// loop of XML parsing
for (...) {
    // XML Parsing with DTD Cached
}
Note that, as in the preceding fragment, you should also set the following attribute if the cached DTD object is used only for validation:
parser.setAttribute(DOMParser.USE_DTD_ONLY_FOR_VALIDATION,Boolean.TRUE);
Otherwise, the XML parser copies the DTD object and adds it to the resultant DOM tree. While the preceding example is for an internal DTD, the same method is used for external DTDs.
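If you are working with the JDK's standard JAXP parser instead of the Oracle parser, setDoctype() is not available. A related but weaker technique is to cache the raw bytes of an external DTD in an EntityResolver, so that the DTD is fetched only once even though it is still parsed for every document. The following is a minimal sketch under that assumption; the class name CachingDtdResolver, the loader function, and loadCount() are hypothetical names introduced for illustration and are not part of any Oracle API.

```java
import java.io.ByteArrayInputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/**
 * Hypothetical helper: serves external DTDs from an in-memory cache.
 * This caches the DTD bytes only; the parser still re-parses them per document.
 */
public class CachingDtdResolver implements EntityResolver {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> loader; // assumed: fetches DTD bytes for a system ID
    private final AtomicInteger loads = new AtomicInteger();

    public CachingDtdResolver(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null) {
            return null; // fall back to the parser's default resolution
        }
        // Invoke the loader only the first time a given system ID is requested
        byte[] bytes = cache.computeIfAbsent(systemId, id -> {
            loads.incrementAndGet();
            return loader.apply(id);
        });
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setSystemId(systemId);
        return source;
    }

    /** Number of times the loader was actually invoked. */
    public int loadCount() {
        return loads.get();
    }
}
```

To use it, you would pass one shared instance to DocumentBuilder.setEntityResolver() before parsing the batch, so that every document in the batch resolves its DTD through the same cache.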
Skipping the <!DOCTYPE> Tag
A common problem when parsing an XML document that declares an external DTD is retrieving that DTD. In many cases the DTD is not needed, and firewalls, permissions, and similar restrictions frequently prevent the parser from retrieving it. Fortunately, you can tell the Java parser to skip the retrieval in either of two ways. If you have write access to the document, you can add standalone="yes" to its XML declaration. Alternatively, within the application, you can add
xmlparser.setAttribute(XMLParser.STANDALONE, Boolean.TRUE);
which has the same effect.
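With the JDK's standard JAXP parser, which has no STANDALONE attribute, a comparable result can be obtained by installing an EntityResolver that answers every external entity request with empty content. The sketch below is not the Oracle mechanism described above; the class name SkipExternalDtd, the method parseRootName(), and the example.invalid URL used with it are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper: parses a document without fetching its external DTD. */
public class SkipExternalDtd {

    /** Parses the given XML text and returns the name of its root element. */
    static String parseRootName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // no DTD validation is wanted here
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Hand the parser an empty DTD instead of letting it open the system ID
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE root SYSTEM \"http://example.invalid/missing.dtd\">"
            + "<root/>";
        System.out.println(parseRootName(xml));
    }
}
```

This approach is suitable only when the document does not depend on the DTD's content. If the document references entities or relies on attribute defaults declared in the DTD, skipping it changes the parse result or causes an error.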
Cutting and Pasting Across Documents
Using the DOM parser is the appropriate way to modify an XML document in most cases. However, when a modification spans different documents, as it does when cutting and pasting, the approach is not obvious, because a node belongs to the document that created it. Fortunately, the DOM Level 3 adoptNode() method makes this easy. As distinct from importNode(), which simply copies the node from one document to another, adoptNode() actually removes it from one document and transfers its ownership to the other document, where you can then insert it, as illustrated in the following code fragment:
XMLDocument doc1 = new XMLDocument();
XMLElement element1 = (XMLElement)doc1.createElement("foo");
doc1.appendChild(element1);
XMLDocument doc2 = new XMLDocument();
XMLElement element2 = (XMLElement) doc2.createElement("bar");
doc2.appendChild(element2);
...
// Using adoptNode()
element2 = (XMLElement)doc1.adoptNode(element2);
element1.appendChild(element2);
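The difference between copying and cutting can also be checked with the standard org.w3c.dom interfaces. The following is a minimal sketch that assumes the JDK's built-in JAXP implementation rather than the Oracle XMLDocument class; the class name CutAndPaste and its helper methods are hypothetical.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

/** Hypothetical demo contrasting importNode() (copy) with adoptNode() (cut). */
public class CutAndPaste {

    /** Creates a document that contains a single root element. */
    static Document newDocument(String rootName) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();
            doc.appendChild(doc.createElement(rootName));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    /** Copy and paste: the source document keeps its node. */
    static boolean copyKeepsSource() {
        Document doc1 = newDocument("foo");
        Document doc2 = newDocument("bar");
        Node copy = doc1.importNode(doc2.getDocumentElement(), true);
        doc1.getDocumentElement().appendChild(copy);
        return doc2.getDocumentElement() != null && copy.getOwnerDocument() == doc1;
    }

    /** Cut and paste: the source document loses its node. */
    static boolean cutRemovesSource() {
        Document doc1 = newDocument("foo");
        Document doc2 = newDocument("bar");
        Node moved = doc1.adoptNode(doc2.getDocumentElement());
        if (moved == null) {
            // adoptNode() may return null when the implementation cannot adopt the node
            throw new IllegalStateException("adoptNode() is not supported for this node");
        }
        doc1.getDocumentElement().appendChild(moved);
        return doc2.getDocumentElement() == null && moved.getOwnerDocument() == doc1;
    }

    public static void main(String[] args) {
        System.out.println("importNode keeps source: " + copyKeepsSource());
        System.out.println("adoptNode removes source: " + cutRemovesSource());
    }
}
```

Note that adoptNode() only changes the owner of the node. As in the fragment above, you still have to call appendChild() or a similar method to place the adopted node in the target tree.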
We present further parsing examples in the third part of the book, which walks through actual applications that you can build.