XML Processing Tier Decisions
A distributed or Internet application will be successful only if it can scale to meet its peak demands. For XML-enabled applications and services, you cannot guarantee this simply by throwing more and more hardware at the deployment. Instead, you need to design the system so that XML processing is performed on the appropriate tiers, and you need to choose the right type of processing.
First, realize that XML moving between tiers can require up to ten times the bandwidth of the equivalent binary data. Interoperability comes at a cost because metadata travels along with the data. In many data-interchange applications, the metadata (the element and attribute names) may be an order of magnitude larger than the data being conveyed. This is a current issue: two major analyst groups, Forrester Research and Burton Group, have both warned enterprises to consider how XML traffic will likely affect their networks. Even Cisco reports growing interest in higher-throughput network products because of increasing XML traffic. What does this mean for your distributed applications?
It means that you should consider doing XML processing close to the data whenever remote processing would require shipping large quantities of XML between tiers. For example, many applications that generate reports from large datasets achieve better throughput by performing the XSLT processing in the database, so that only the finished product has to cross the wire. The same is true if you want to extract only a small subset of data from large XML documents stored in the database.
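To make the bandwidth point concrete, the following Java sketch encodes the same records both as XML and as fixed-width binary and compares the byte counts. The element names (`readings`, `reading`, `sensorId`, `value`), the two-int record layout, and the class name are hypothetical. The actual ratio depends entirely on how long the tag names are relative to the data values.

```java
import java.nio.charset.StandardCharsets;

class XmlOverhead {
    // Hypothetical record layout: each reading is a sensor id and a value, both ints.
    static String toXml(int[] ids, int[] values) {
        StringBuilder sb = new StringBuilder("<readings>");
        for (int i = 0; i < ids.length; i++) {
            sb.append("<reading><sensorId>").append(ids[i])
              .append("</sensorId><value>").append(values[i])
              .append("</value></reading>");
        }
        return sb.append("</readings>").toString();
    }

    // Size of the XML encoding in UTF-8 bytes (tags included).
    static int xmlSize(int[] ids, int[] values) {
        return toXml(ids, values).getBytes(StandardCharsets.UTF_8).length;
    }

    // Fixed-width binary encoding: two 4-byte ints per record, no metadata at all.
    static int binarySize(int records) {
        return records * 2 * Integer.BYTES;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] ids = new int[n];
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            ids[i] = i;
            values[i] = i % 100;
        }
        int xml = xmlSize(ids, values);
        int bin = binarySize(n);
        System.out.printf("XML: %d bytes, binary: %d bytes, ratio: %.1fx%n",
                xml, bin, (double) xml / bin);
    }
}
```

With short numeric values the tag overhead dominates: a single record in this layout takes 78 bytes as XML versus 8 bytes as binary.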
On the other hand, where XML messages are small and the number of transactions per second is high, you don’t want your database to do anything but inserts. This is where mid-tier processing has the advantage over database-tier processing, especially if the XML must be validated. For example, the XDK’s Java XML Schema processor performs validation using SAX streaming and does so significantly faster than the database. If you are storing XML in XMLType columns and you already know it is well-formed or valid, you should turn off well-formedness checking on insert, because that checking can significantly reduce throughput.
One area that is often overlooked is making sure the correct parsing technique is used. Because DOM is the more visible standard, DOM parsing is the first choice for most XML document access. In fact, unless the document is being modified and written back, DOM is a poor choice compared to SAX in terms of throughput and scalability: DOM builds the entire document tree in memory, whereas SAX streams parsing events and keeps only what the application chooses to retain. The same holds true for XSLT processing. Because XSLT requires a DOM to be built, processing large documents may be very memory-intensive or may even fail. The Oracle XML DB’s internal XSLT processor uses its lazy DOM, which eliminates these high memory costs.
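The following sketch shows the SAX style of access using the standard JAXP API that ships with the JDK (not the XDK-specific classes). It pulls the text of one element type out of a document without building a tree. The class name, the sample document, and the assumption that matching elements are not nested are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxExtract {
    // Collects the text content of every element with the given local name.
    // Only the matching values are retained; everything else is discarded as it
    // streams past, so memory does not grow with document size as a DOM tree does.
    // Assumes matching elements are not nested inside one another.
    static List<String> extract(InputStream in, String elementName) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        List<String> results = new ArrayList<>();
        parser.parse(in, new DefaultHandler() {
            private StringBuilder current; // non-null only while inside a matching element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (localName.equals(elementName)) current = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text may arrive in several chunks, so append rather than assign.
                if (current != null) current.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (current != null && localName.equals(elementName)) {
                    results.add(current.toString());
                    current = null;
                }
            }
        });
        return results;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order><id>17</id><total>9.50</total></order>"
                   + "<order><id>18</id><total>12.00</total></order></orders>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(extract(in, "total"));
    }
}
```

Because the handler keeps only the values it is asked for, the same code works on a multi-megabyte document with roughly the same memory footprint as on this small sample.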
Because XML has become more prevalent, support for it is being built into many unexpected areas. One such area is the web cache. As more and more demand is placed on web and portal sites to deliver up-to-the-minute or even up-to-the-second data, there is no opportunity to create static pages. XML in combination with XSLT provides this dynamic publishing functionality, but it can be expensive if every request must start the process from scratch. Oracle Web Cache can perform XSLT transformations on cached XML data, thereby significantly improving performance. Because it is written in C and runs entirely in memory, Oracle Web Cache can deliver a much greater performance improvement than Java-based approaches in the middle tier.
One often-overlooked processing tier is the client. Browser support for XML and XSLT can move processing off the servers without dramatically affecting performance. Applets using Oracle XDK’s XML JavaBeans can provide rich XML functionality through a JRE plug-in in the browser. Validating XML input on the client usually degrades the user experience much less than having a server or the database validate hundreds of documents a minute.
The Oracle 10g XML platform is designed to be thread-safe down to the component level. Because some XML operations, such as DOM parsing, XML Schema validation, and XSLT transformation, can take hundreds of milliseconds or even seconds, designing your application to use multithreading may be critical. Both the DOM and SAX parsers can work asynchronously, freeing your application to perform other tasks.
Finally, data security must be a consideration for any tier-to-tier XML serialization. XML is, after all, a text-based format, so serialized XML can be read by anyone who intercepts it unless it is protected. New standards such as XML Digital Signature and XML Encryption are just being rolled out, but they carry a processing and performance cost.
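Two of the points above, avoiding repeated work on every request and using multiple threads for slow XSLT transformations, can be sketched with standard JAXP. The stylesheet is compiled once into a `Templates` object, which JAXP documents as thread-safe, and each worker task obtains its own `Transformer`, which is not thread-safe. This is a minimal mid-tier illustration assuming the JDK's built-in XSLT processor; it is not how Oracle Web Cache or the XDK implement these features, and the stylesheet and class names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ParallelXslt {
    // Hypothetical stylesheet: renders each <item> of a <list> as an <li>.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/list'><ul><xsl:apply-templates select='item'/></ul></xsl:template>"
      + "<xsl:template match='item'><li><xsl:value-of select='.'/></li></xsl:template>"
      + "</xsl:stylesheet>";

    static List<String> transformAll(List<String> docs, int threads) throws Exception {
        // Parse and compile the stylesheet once; the Templates object is reused by all tasks.
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSL)));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : docs) {
                futures.add(pool.submit(() -> {
                    // Each task gets its own Transformer because Transformer is not thread-safe.
                    Transformer t = templates.newTransformer();
                    StringWriter out = new StringWriter();
                    t.transform(new StreamSource(new StringReader(doc)), new StreamResult(out));
                    return out.toString();
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) results.add(f.get()); // keeps input order
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = Arrays.asList(
                "<list><item>a</item></list>",
                "<list><item>b</item><item>c</item></list>");
        transformAll(docs, 2).forEach(System.out::println);
    }
}
```

Compiling once and transforming many times is the same idea that makes caching pay off: the expensive parse-and-compile step for the stylesheet is not repeated for each request, and only the per-document transformation runs in parallel.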
The best security strategy is not to transmit data unless it really needs to be transmitted, and therefore to perform processing close to, or on, the tier on which the data resides. Fortunately, Oracle provides several strategies, including the Oracle JVM, Oracle XDK XMLType support, and external tables, to minimize the exposure of data as clear-text serialized XML.
Whenever possible, build functional software blocks, or even whole applications, that can run on any tier. Doing so improves the likelihood of a successful deployment because you can then partition the processing to meet load demands.