Designing the XML Database
When you start to design XML applications using Oracle XML DB, you need to make several decisions, including how to store XML data in a database, what is the strategy to retrieve or generate XML, and how to create proper indexes for searching the content in the XML documents. In this section, we will discuss some helpful approaches to these decisions.
How to Store XML Data
As you have learned, there are different ways to store XML documents inside an Oracle XML database, and each of them offers different trade-offs in both performance and functionality. From experience, we find that the design flow depicted in Figure 8-5 is helpful in simplifying the selections.
Figure 8-5: Decision flowchart for storing XML in Oracle XML DB
By walking through this flowchart, you can explore the design strategy for using different storage types. First, you can start by asking whether you need to store XML documents in relational tables or as XMLType objects. This greatly affects the application development process, the technologies you use, and the performance of your application.
XMLType or Relational Tables?
This initial decision can generally be based on the format of the XML data and the DOM fidelity requirement on XML content. For format, the XML documents generally can be categorized as the data-centric or the document-centric document.
Data-centric XML documents are characterized by the regular structure for the data, in which the smallest data unit is either an XML element with simple content or an XML attribute. In such XML document, there is little or no mixing content (i.e., tags within an XML element’s string content). Additionally, the DOM Fidelity for the document is not required to preserve. The following XML document is an example data-centric XML document:
<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
<shipTo country="US">…</shipTo>
<billTo country="US">…</billTo>
<items>
<item partNum="872-AA">
<productName>Lawnmower</productName>
<quantity>1</quantity>
<USPrice>148.95</USPrice>
<comment>Confirm this is electric</comment>
</item> …
</items>
</purchaseOrder>
All the data in the above purchase order document are represented either as XML elements or as the attributes for XML elements.Most of the data-centric XML documents are created based on the relational data for exchanging and sharing data between applications. You can think of XML as the boxes used for moving things around. When relocating them to another place, you need to put items in boxes in order to protect them and organize them for easy delivery. However, when the items reach the destination, there is no need to keep them in the box.Likewise, for data-centric XML documents, there is no need to keep data in XML once it reaches the database. For a data-centric XML document, if XML elements and attributes are properly stored in relational table columns, it has the optimized storage for further processing, which avoids the overhead of keeping and managing the XML structure in the database. Whenever you need the data in XML for data exchange and Web publishing, you can build up XMLType views to wrap up data in XML.On the other hand, there are document-centric XML documents, which are characterized by less regular or irregular data structure with lots of mixing content. For example, the following faq.xml is a document-centric XML document, in which the <ANSWER> element contains the mixing content of both text and <KEY> and <CODE> elements.
<FAQ>
<TITLE>What is wrong if I got "ClassNotFoundException"? </TITLE>
<ANSWER> If you get this kind of error, you need to check if the
<KEY>CLASSPATH</KEY> is correctly set. Basically you need to add the
following <KEY>jar files</KEY> in your Java CLASSPATH environment
variable:
<CODE> xmlparserv2.jar: XML Parser V2 for Java
xschema.jar: XML Schema Processor for Java
xsu12.jar(xsu12.jar): XML SQL Utility for Java
…
</CODE>
</ANSWER>
<CATEGORY>xdk</CATEGORY>
<LANGUAGE>java</LANGUAGE>
</FAQ>
Dealing with the document-centric XML documents, you can store them in the native XMLTypes that well preserves the original XML data structure. If there are many queries on the XML content, the XML Schema–based XMLType is suggested as it well maintains the XML structure in the database and provides high-performance data retrieval and updates using the XML metadata.
However, sometimes no clear line exists between these two categories. A document may be highly structured without mixed content, but still cannot be shredded into tables if it has some XML-specific information—such as comments, processing instructions, and document order— that needs to be preserved. The following document shows a variation of the previous purchase order that now contains processing instructions (PIs) in the XML content. To keep the information, such as namespace prefixes, processing instructions, and even the order of the elements, in the XML document, you can choose to use the native XMLType to preserve the DOM byte for byte.
<?xml version="1.0"?>
<?dml name="Welcome"?>
<purchaseOrder orderDate="1999-10-20">
<shipTo country="US">…</shipTo>
<?dml name="Welcome"?>
<billTo country="US">…</billTo>
<items>
<item partNum="872-AA">
<productName>Lawnmower</productName>
<quantity>1</quantity>
<USPrice>148.95</USPrice>
<comment>Confirm this is electric</comment>
</item> …
</items>
</purchaseOrder>
In general, when selecting between relational and native XML storage, you need to ask yourself whether your application really needs the full XML document preserved. Otherwise, you should leverage the database’s ability to generate XML from relational tables using XMLType views to void the overhead for preserving the XML markups, the whitespaces for XML indentations, and so on.Note that if you store XML in relational tables but construct XML interfaces with XMLType views that are deeply nested, the data queries and operations with many table joins may degrade the performance. In this case, you should choose the native XMLType data type because it well preserves the hierarchical structure of the XML documents and thus optimizes the related XML queries or operations.Finally, the native XML Schema–based XMLType storage binds the underlying database schema tightly to the XML schemas. Currently, there are limitations on how much the schema can evolve before an export and import of the data set would be needed.
XML Schema–Based Types or CLOB XMLTypes?
If you decide to use XMLType, you need to decide whether you want to store XML in CLOBs or use the XML Schema–based XMLTypes by shredding the XML content and storing it in a set of SQL objects.If you store XML in CLOBs, you keep the document in the original format byte for byte and can use the XMLType functions and the SQL commands to update and query the XML data. This type of storage is optimum for DTD-based documents and XML Schema–based documents where the schemas are changing or varied. Since the storage is not based upon the XML structure, documents of any size and hierarchical depth can be accommodated equally. However, since the XML data is not stored in a preparsed structure, it takes time to parse the documents before accessing the XML content. If your application requires intensive data retrievals or updates on XML element or attributes, or you want to extract the data out of XML documents for different use, you should not use the CLOB storage.In the XML Schema–based XMLTypes, the XML documents are shredded and stored as a set of SQL objects. The XML content is validated and stored in a preparsed format. This storage is optimized in database for the fine-grained data queries and retrievals. Because this kind of XMLType is stored using offsets based on its XML Schema, called XML Objects (XOBs), and not materialized into the DOM tree until needed, it saves system resources. In addition, you have the option to preserve the byte-for-byte DOM fidelity, such as the order of XML elements and attributes, the whitespace between elements and attributes, XML comments, XML PIs, namespace declarations, and so on. However, XML Schema–based XMLTypes have high cost for schema evolutions.
Note | Because extra information is needed to preserve the DOM fidelity, you should avoid using it unless necessary. |
After discussing all the storage models for XML documents in the Oracle Database 10g database, Table 8-4 compares the three types of XML storage and show the pros (+) and cons (-) based on the different application design considerations.
Category | Relational Storage with XMLType Views | CLOB XMLType | Schema-Based XMLType |
---|---|---|---|
Modeling | (+) Relational modeling with data normalization(+) Relational storage that handles the schema evolution well(+) No XML overhead after data is stored in database(+) Easy reuse of the data | (–) No data normalization(+)Flexible storage when XML Schemas evolve or document size varies(–) Document view of the XML data prevents reuse as SQL data | (–) No data normalization(–) Limited XML Schema evolution support(+) Data reused by building up multiple views on XML data |
Loading | (–) Low throughput for data uploading | (+) Fast XML uploading | (–) Low throughput for data uploading |
Query | (+) High performance for SQL queries(–) Requires table joins for XML queries | (–) Low performance for XPath queries with run-time DOM building and DOM tree transversal. | (+) XPath queries are rewritten to SQL queries for high-performance data retrieval |
XML fidelity | (+) No DOM fidelity | (+) Maintains the original XML byte for byte | (+) DOM fidelity as an option when setting the xdb:maintainDOM= “true” |
Data replication | Full support | Full support | Limited support |
In summary, using the relational tables to create XMLType views or simply deliver XML encapsulated output using SQL/XML functions or the DBMS_XMLGEN package, you have the normalized data storage with high-performance SQL queries. However, the DOM fidelity of the XML documents is lost.The CLOB storage best preserves the original XML document and has the flexibility to handle XML Schema evolution. However, the run-time DOM building and DOM tree transversal slows down the SQL queries.The XML Schema–based XMLTypes store XML in Object-Relational tables preserving the XML document with high-performance SQL queries resulting from query rewrite support. However, it has limited support of XML schema evolution and limited data replications support. You should analyze your application requirements and pick the best storage that meets your needs.
XMLType Object Tables or XMLType Columns?
Creating XMLType as a table column or as an object table is determined by whether you want to store relational data along with the XML documents. If there is relational data associated with the XML documents, such as the document create time and the current owner of the XML document, you can create a set of columns in relational tables to store such information and use XMLType columns to store XML documents.On the other hand, there are options on how to create XMLType object tables. You can create an XMLType object table using the “CREATE TABLE … OF XMLType” command or create the XMLType object table when registering XML schemas to the Oracle XML DB. However, the latter option allows you to use the Oracle XML DB Repository to manage the XML documents. By looking at what the Oracle XML DB Repository offers and what you need or how you can take advantage of the functionality, you can decide how to create the XMLType object tables.
Use Oracle XML DB Repository?
To use the Oracle XML DB Repository, you need to create XMLType tables by specifying the xdb:defaultTable attribute for the root element of the XML document in the XML schema. After you do so, Oracle XML DB will create XMLType Object tables when registering the XML schemas. Using XML DB Repository, you have the high-performance hierarchical document navigation and easy-to-use protocol interfaces for document management.
How to Retrieve and Generate XML
After you decide how to store XML documents, you need to figure out how to build the indexes and views for querying XML data, generating new XML based on the existing XML/non-XML data, and securing the XML data by controlling user access. We will discuss the details of these XML features in Chapter 11 and explore several solutions in the application development chapters.
How to Search XML Data
In Oracle Database 10g, there are two ways to search XML data using the XPath-based queries: Oracle Text searches using the CONTAINS() function and the XPath search using the XMLType.existsNode() function. In Oracle Database 10g, you can produce efficient searches by creating the CTXXPATH index for the XMLType.existsNode() queries. We will discuss this in Chapter 11.
How to Design XML Database for Web Applications
XML is widely used in content management and Web publishing systems. One of the reasons is that data in XML format can be easily transformed using XSLT to various presentation formats, such as HTML, WML (Wireless Markup Language), SVG, or any other Web publishing format that clients request. These kinds of applications normally have the structure shown in Figure 8-6.
Figure 8-6: XML-based web application architecture
In this type of application, an XML document can be stored in XML Schema–based XMLTypes so that you can extract the data for different content-publishing purposes and uses the Oracle XML DB Repository for easy document management. To reuse the XML data, XMLType views can be created, which provide “pretransformed” XML documents. Finally, indexes are usually created for frequently executed queries to speed XML retrieval from the database.
How to Design XML Database for Messaging Application
XML is widely used for application integration where applications run on different platforms or are from different vendors that need to exchange the application data. Figure 8-7 illustrates the architecture of an example XML-based messaging system. In this kind of application, you should think about using precompiled XSLT or XMLType views to transform the XML data into appropriate formats for the application receiving the XML messages.
Figure 8-7: XML messaging application