Chapter 7: Putting It All Together with XML Pipeline, JSPs, and XSQL
XML processing has evolved considerably over the last few years. Previously, XML application developers had to write a substantial amount of code to parse an XML file, apply a stylesheet to it, and transmit the results. As XML gained acceptance in modern business applications, the demands on the processing infrastructure to access and exchange business data as XML grew, and with that growth came higher application development and maintenance costs.
To reduce these costs, Oracle XML Developer’s Kit 10g (XDK) extends and supports XML standards and introduces new processing technologies and features that simplify XML creation, access, transformation, and validation. With these mechanisms, XML application developers can more easily process XML within their business-to-business (B2B), business-to-consumer (B2C), and Enterprise Application Integration (EAI) applications. This chapter explains how the XML Pipeline Processor, JSPs, and the XSQL Servlet help Oracle XML application developers greatly reduce the complexity of today’s XML processing.
Introducing the XML Pipeline Processor
The XML Pipeline Processor provides a reusable component framework that supports declarative pipelining of XML resources, so that different processes, such as XML parsing, XML Schema validation, and XSL transformation, can be performed for an application within a single framework. Oracle’s implementation complies with the W3C XML Pipeline Definition Language Version 1.0 Note (http://www.w3.org/TR/2002/NOTE-xml-pipeline-20020228/) and spares developers from dealing with each of these processing interfaces individually; in effect, it “pipelines” the processing of XML in one module.
To begin, you create an XML document that describes the pipeline according to the rules in the W3C Note. For the XML Pipeline Processor to act on it, this document must declare which of the available XML processing components are used, along with the inputs and outputs of each process. In Oracle’s XML Pipeline Processor, the available components include the DOM and SAX parsers for parsing XML documents, the XML Schema Processor for XML Schema validation, the XSL Processor for transforming XML documents, the SAXSerializer for printing XML, and the XML Compressor for compressing XML into a binary format.
Put simply, the XML Pipeline Processor executes the chain of XML processing described in the pipeline document and returns the specified result. The following XML Pipeline document performs an XSLT transformation of book.xml using book.xsl, producing booklistl:
<pipeline xmlns="http://www.w3.org/2002/02/xml-pipeline"
          xml:base="http://example.org/">
  <!-- The target result the pipeline must produce -->
  <param name="target" select="booklistl"/>
  <!-- Bind short process-type names to Oracle's process implementation classes -->
  <processdef name="domparser.p"
              definition="oracle.xml.pipeline.processes.DOMParserProcess"/>
  <processdef name="xslstylesheet.p"
              definition="oracle.xml.pipeline.processes.XSLStylesheetProcess"/>
  <processdef name="xslprocess.p"
              definition="oracle.xml.pipeline.processes.XSLProcess"/>
  <!-- p2: compile book.xsl into a stylesheet object labeled xslstyle -->
  <process id="p2" type="xslstylesheet.p" ignore-errors="false">
    <input name="xsl" label="book.xsl"/>
    <outparam name="stylesheet" label="xslstyle"/>
  </process>
  <!-- p3: apply xslstyle to the parsed document xmldoc, producing the target -->
  <process id="p3" type="xslprocess.p" ignore-errors="false">
    <param name="stylesheet" label="xslstyle"/>
    <input name="document" label="xmldoc"/>
    <output name="result" label="booklistl"/>
  </process>
  <!-- p1: parse book.xml into a DOM labeled xmldoc; on failure, emit the HTML below -->
  <process id="p1" type="domparser.p" ignore-errors="true">
    <input name="xmlsource" label="book.xml"/>
    <output name="dom" label="xmldoc"/>
    <param name="preserveWhitespace" select="true"></param>
    <error name="dom">
      <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
          <title>DOMParser Failure!</title>
        </head>
        <body>
          <h2>Error parsing document</h2>
        </body>
      </html>
    </error>
  </process>
</pipeline>
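Note that the processes need not be listed in execution order: the processor works out the order from the input and output labels, so p1 (parsing) and p2 (stylesheet compilation) run before p3 (transformation), which consumes their outputs. If parsing fails, p1 reports the error through its error element as an HTML document, which is consistent with the output format.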
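As a rough point of comparison, and not Oracle’s Pipeline API, the sketch below performs the same parse, compile, and transform chain imperatively with the standard JAXP classes in the JDK. The contents of book.xml and book.xsl are made up and inlined as strings so the example runs on its own, and the class name PipelineSketch is hypothetical. The comments map each step to the pipeline process it mirrors.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PipelineSketch {
    // Hypothetical stand-ins for book.xml and book.xsl.
    static final String BOOK_XML =
        "<books><book><title>XML Basics</title></book>"
        + "<book><title>XSLT in Practice</title></book></books>";
    static final String BOOK_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/'><ul><xsl:for-each select='books/book'>"
        + "<li><xsl:value-of select='title'/></li></xsl:for-each></ul></xsl:template>"
        + "</xsl:stylesheet>";

    static String run() throws Exception {
        // Like p1 (domparser.p): parse book.xml into a DOM ("xmldoc").
        // JAXP keeps whitespace text nodes by default, similar to preserveWhitespace=true.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        Document xmldoc = builder.parse(new InputSource(new StringReader(BOOK_XML)));

        // Like p2 (xslstylesheet.p): compile book.xsl once into a reusable Templates ("xslstyle").
        TransformerFactory tf = TransformerFactory.newInstance();
        Templates xslstyle = tf.newTemplates(new StreamSource(new StringReader(BOOK_XSL)));

        // Like p3 (xslprocess.p): apply the compiled stylesheet to the DOM.
        Transformer t = xslstyle.newTransformer();
        StringWriter result = new StringWriter();
        t.transform(new DOMSource(xmldoc), new StreamResult(result));
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

The difference is the point of the pipeline document: here the ordering and wiring live in Java code, whereas the Pipeline Processor derives them from the declared labels.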
Multistage XML Processing
Multistage XML processing is straightforward: XML components can be processed in parallel and at different stages, so the output from processing an XML document can immediately serve as one or more inputs to later stages. All of this multistage processing is encapsulated within a single XML Pipeline document.
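To illustrate the fan-out idea outside the Pipeline Processor, here is a small sketch using the JDK’s standard JAXP API rather than Oracle’s pipeline classes. One parsed DOM feeds two independent stages, and the in-memory output (a DOMResult) of one stage becomes the input of a third. The stages run sequentially in this sketch; the sample data, the stylesheets, and the class name MultistageSketch are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class MultistageSketch {
    static final String XSL_HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes' indent='no'/>";

    // Compile a stylesheet whose root template emits the given body.
    static Templates stage(TransformerFactory tf, String body) throws Exception {
        String xsl = XSL_HEAD + "<xsl:template match='/'>" + body
                   + "</xsl:template></xsl:stylesheet>";
        return tf.newTemplates(new StreamSource(new StringReader(xsl)));
    }

    // Run a stage over an in-memory node and serialize its output to a string.
    static String serialize(Templates t, Node input) throws Exception {
        StringWriter out = new StringWriter();
        t.newTransformer().transform(new DOMSource(input), new StreamResult(out));
        return out.toString();
    }

    static String[] run() throws Exception {
        String xml = "<books><book price='30'><title>A</title></book>"
                   + "<book price='55'><title>B</title></book></books>";
        // Stage 1: parse once.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        TransformerFactory tf = TransformerFactory.newInstance();

        // Stage 2a: copy the cheaper books into a new in-memory DOM.
        Templates filter = stage(tf,
            "<cheap><xsl:copy-of select=\"books/book[@price &lt; 50]\"/></cheap>");
        DOMResult cheap = new DOMResult();
        filter.newTransformer().transform(new DOMSource(doc), cheap);

        // Stage 2b: the same parsed DOM also feeds an independent stage.
        Templates count = stage(tf, "<count><xsl:value-of select='count(books/book)'/></count>");
        String total = serialize(count, doc);

        // Stage 3: the output of stage 2a is the input of a further stage.
        Templates titles = stage(tf, "<titles><xsl:value-of select='cheap/book/title'/></titles>");
        String cheapTitles = serialize(titles, cheap.getNode());
        return new String[] { total, cheapTitles };
    }

    public static void main(String[] args) throws Exception {
        String[] r = run();
        System.out.println(r[0] + " " + r[1]);
    }
}
```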
Parsing, then Validation, then Serialization or Transformation
When users need to process and access XML data, the first step is to parse the XML document with an XML parser. XML parsers read XML documents and provide programmatic access to their content and structure. If the document’s structure is governed by a DTD or an XML schema, a validating parser performs the necessary checks during parsing. In addition to its strict validation mode, Oracle’s Java XML Schema Processor offers a lax validation mode, in which metadata and validation status can be retrieved synchronously from the XML Schema Processor during SAX parsing.
Handling the SAX events that XML SQL Utility (XSU) streams out after a SQL query returns rowset data is greatly simplified by a new Java API, oracle.xml.parser.v2.SAXSerializer. It provides output options that specify whether pretty printing is needed, what the XML declaration and encoding information is, which elements (if any) should have their content written as CDATA sections, and what the DTD system ID and public ID are. To use this feature, you simply treat it as another SAX content handler. For example, you can register it with XSU’s SAX output interface as follows:
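OracleXMLQuery qry = new OracleXMLQuery(conn, sqlQuery); // conn: a JDBC Connection; sqlQuery: the SELECT to run
qry.getXMLSAX(sample); // sample: the SAXSerializer, registered as the SAX ContentHandler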
This generates unbounded XML documents from the result sets returned by queries, with warnings or errors reported either as processing occurs or at the very end of processing.
Alternatively, XSL Transformation (XSLT) stylesheets can be applied to either the streaming XML data or the input XML documents to transform them and apply formatting semantics to the text output.
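Oracle’s SAXSerializer is not reproduced here. As a comparable sketch using only the JDK, a JAXP identity TransformerHandler is itself a SAX ContentHandler that serializes the events it receives, and its output properties cover similar options: indentation, encoding, CDATA section elements, and DOCTYPE system and public IDs. The rowset-shaped element names, the rowset.dtd system ID, and the class name SerializerSketch are illustrative assumptions, and the events are fed by hand rather than by XSU.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

class SerializerSketch {
    // Build a SAX ContentHandler that writes whatever events it receives as XML text.
    static TransformerHandler serializer(StringWriter out) throws Exception {
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = stf.newTransformerHandler(); // identity transform
        Transformer t = handler.getTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "no");                   // pretty printing off
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");     // keep the XML declaration
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");              // declared encoding
        t.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "NOTE"); // NOTE text as CDATA
        t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "rowset.dtd");   // hypothetical DTD
        handler.setResult(new StreamResult(out));
        return handler;
    }

    // Emit a simple text-only element.
    static void element(TransformerHandler h, String name, String text) throws Exception {
        h.startElement("", name, name, new AttributesImpl());
        h.characters(text.toCharArray(), 0, text.length());
        h.endElement("", name, name);
    }

    static String run() throws Exception {
        StringWriter out = new StringWriter();
        TransformerHandler h = serializer(out);
        // Hand-written events shaped like a query rowset.
        h.startDocument();
        h.startElement("", "ROWSET", "ROWSET", new AttributesImpl());
        AttributesImpl num = new AttributesImpl();
        num.addAttribute("", "num", "num", "CDATA", "1");
        h.startElement("", "ROW", "ROW", num);
        element(h, "TITLE", "XML Basics");
        element(h, "NOTE", "a < b & c");
        h.endElement("", "ROW", "ROW");
        h.endElement("", "ROWSET", "ROWSET");
        h.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

To apply a stylesheet to streaming events instead of serializing them unchanged, SAXTransformerFactory.newTransformerHandler(Templates) returns a handler that runs the compiled stylesheet over the incoming SAX events; because XSLT generally needs the whole tree, such a handler typically buffers the input internally.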
SAX vs. DOM
SAX parsing is event-based: as the parser encounters particular constructs in the XML document, it invokes the corresponding event handler through a callback. For example, startDocument is called when the parser begins reading the document, and startElement is called at each element start tag, including the root element. Because SAX does not construct an in-memory tree, it is generally faster and much less memory-intensive than DOM parsing. For multistage XML processing, this lightweight parsing is ideal for filtering and searching large XML documents; the drawbacks are that the XML content cannot be modified in place, and random access to the content is not as efficient as with DOM. In the current Pipeline Processor implementation, the SAX parser can be connected to the XML Compressor and the SAXSerializer but not to the XSLT Processor, which requires a DOM.
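The callback model can be sketched with the JDK’s built-in SAX parser (standard JAXP, not Oracle’s parser). The hypothetical handler below filters a book list by a price attribute: it records the startDocument and endDocument callbacks and keeps only the titles of matching books, so it never needs to hold the whole document in memory. The element names, the price attribute, and the class names are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxFilterSketch {
    // Collects titles of books priced below a limit, without building a tree.
    static class CheapTitles extends DefaultHandler {
        final double limit;
        final List<String> titles = new ArrayList<>();
        final List<String> events = new ArrayList<>();
        final StringBuilder text = new StringBuilder();
        boolean cheap;   // is the current <book> below the limit?
        boolean inTitle; // are we inside a <title> we want to keep?

        CheapTitles(double limit) { this.limit = limit; }

        @Override public void startDocument() { events.add("startDocument"); }

        @Override public void startElement(String uri, String local, String qName, Attributes atts) {
            if (qName.equals("book")) {
                String p = atts.getValue("price");
                cheap = p != null && Double.parseDouble(p) < limit;
            } else if (qName.equals("title") && cheap) {
                inTitle = true;
                text.setLength(0);
            }
        }

        @Override public void characters(char[] ch, int start, int len) {
            if (inTitle) text.append(ch, start, len); // text may arrive in several chunks
        }

        @Override public void endElement(String uri, String local, String qName) {
            if (qName.equals("title") && inTitle) {
                titles.add(text.toString());
                inTitle = false;
            }
        }

        @Override public void endDocument() { events.add("endDocument"); }
    }

    static CheapTitles run(String xml, double limit) throws Exception {
        CheapTitles handler = new CheapTitles(limit);
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book price='30'><title>A</title></book>"
                   + "<book price='55'><title>B</title></book></books>";
        CheapTitles h = run(xml, 50);
        System.out.println(h.events + " " + h.titles);
    }
}
```

The handler keeps only a few flags and the current text buffer; a DOM-based version would first materialize every book element before filtering.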