Storing XML Documents in Relational Tables
Relational tables are normally designed without considering XML storage. However, in many cases these tables can be used to store shredded XML documents and produce a useful XML representation by creating XMLType views using Oracle Database 10g XML features or generating XML using the Oracle XDK.Storing XML data in relational tables is useful if your application needs to avoid the limitations of the XMLType storage, such as limited XML schema evolution and data replication. Relational storage is also widely used by applications that require fine-grained access to the data in XML documents while not needing to preserve the complete hierarchical structure of XML.Oracle Database 10g provides extensive support for loading, exporting, and processing XML data into relational tables. To load XML data, you can use the XML SQL Utility (XSU). XSU provides both Java and PL/SQL program interfaces and command-line utilities. Built on XSU, the TransX Utility (XML Translation) further simplifies the character set conversion during data loading, and the XSQL Servlet provides the HTTP interfaces. If the functionality provided by these utilities is not sufficient for your application, you can always use their programmatic APIs in conjunction with other XDK libraries to build your own custom solution.
XML SQL Utility
XSU provides Java-based APIs, command-line utilities, and PL/SQL packages that support loading XML data into relational tables including tables with XMLType columns. We will discuss how you can use its functionality in the following sections.
Canonical Mapping
The first thing you need to understand before using XSU is the canonical mapping used by XSU to map XML to relational tables and render the results of SQL queries in XML. In this canonical mapping, the <ROWSET> element is the root element of the XML document, and its child <ROW> elements map to rows of data in tables. The names of the child elements for each <ROW> element map to the table column names or object names from which the results are returned. The num attributes of the <ROW> elements are numbers that provide the order information. The following is an XML Schema representation of this metadata structure:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:element name="ROWSET">
<xs:complexType>
<xs:sequence>
<xs:element name="ROW">
<xs:complexType>
<xs:sequence>
<xs:any/>
</xs:sequence>
<xs:attribute name="num" type="xs:string" use="optional"/>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
XSU provides ways to change the names of the <ROWSET> and <ROW> elements. For example, a CUSTOMER_TBL table is defined as follows:
CREATE TABLE CUSTOMER_TBL (
NAME VARCHAR2(100),
ADDRESS VARCHAR2(200),
EMAIL VARCHAR2(200),
PHONE VARCHAR2(50),
DESCRIPTION VARCHAR2(4000));
The XML document mapping to this table with the canonical mapping is shown as follows:
<ROWSET>
<ROW>
<NAME>Steve Jones</NAME>
<EMAIL>Steve.Jones@example.com</EMAIL>
<ADDRESS>Someroad, Somecity, Redwood Shores, CA 94065, U.S.A</ADDRESS>
<PHONE>6505723456</PHONE>
<DESCRIPTION>Very Important US Customer</DESCRIPTION>
</ROW>
</ROWSET>
Note | By default, all table, column, and object names are uppercase; therefore, if you want to have mixed-case XML documents successfully inserted you need to specify the ignoreCase options when using XSU. |
In order to run the XSU command-line utility, you need to set the following Java packages in your Java CLASSPATH:
xmlparserv2.jar Oracle XML Parser for Java
classes12.jar Oracle JDBC Drivers
xsu12.jar Oracle XML SQL Utility
Note | You may need to add the orai18n.jar package to your Java CLASSPATH when handling XML with different character sets. Or, you might get an oracle.xml.sql.OracleXMLSQLException: ‘java.sql.SQLException: Non supported character set…’ |
XSU depends on the XML parser to build a DOM and depends on the JDBC drivers to connect to the Oracle database and retrieve the table metadata. After the Java CLASSPATH is properly set, you can run the XSU command-line utility with java OracleXML, which has two options, getXML for querying the database and putXML for inserting data into the database.
Note | Update and delete operations are not included in the XSU command-line utility but they are supported by XSU through the Java and PL/SQL APIs. |
For example, to insert the XML data in contact01.xml, you can run the following command:
java OracleXML putXML -conn "jdbc:oracle:thin:@localhost:1521:orclX"
-user "demo/demo" -fileName "customer1_xsu.xml" "customr_tbl"
The data is inserted into the CUSTOMER_TBL table in the demo schema. To query the content in the table and return the results in XML, you can run the following XSU command:
java OracleXML getXML -conn "jdbc:oracle:thin:@localhost:1521:orclX"
-user "demo/demo" "SELECT * FROM customer_tbl"
The following XML document is returned:
<?xml version = '1.0'?>
<ROWSET>
<ROW num="1">
<NAME>Steve Jones</NAME>
<ADDRESS>Someroad, Somecity, Redwood Shores, CA 94065, U.S.A</ADDRESS>
<EMAIL>Steve.Jones@example.com</EMAIL>
<PHONE>6505723456</PHONE>
<DESCRIPTION>Very Important US Customer</DESCRIPTION>
</ROW>
</ROWSET>
At this point, the XML document’s data has been successfully loaded into the database. However, input XML documents are not always in the canonical format. How can you deal with these XML documents? The usual approach is to use an XSLT stylesheet to transform the XML document into the canonical format. On the other hand, you can create database object views mapping to the incoming XML format.
Object Views
If an XML document is not in the canonical format, you can create object views or XMLType views, to allow XSU to map XML documents to database tables. In the following contact.xml XML document, the contact information is stored as follows:
<Contact_List>
<Contact>
<User_id>userid</User_id>
<First_Name>Steve</First_Name>
<Last_Name>Jones</Last_Name>
<Business>
<Email>Steve.Jones@oracle.com</Email>
<Phone>(650)5769801</Phone>
<Address>
<Street1>4op11</Street1>
<Street2>500 Oracle Parkway</Street2>
<City>Redwood Shores</City>
<State>CA</State>
<Zipcode>94065</Zipcode>
<Country>USA</Country>
</Address>
</Business>
</Contact>
</Contact_List>
The database schema is defined as follows:
CREATE TYPE address_typ AS OBJECT(
street1 VARCHAR2(200),
street2 VARCHAR2(200),
city VARCHAR2(100),
state VARCHAR2(20),
zipcode VARCHAR2(20),
country VARCHAR2(20));
/
CREATE TABLE contact_tbl(
contactid VARCHAR2(15) PRIMARY KEY,
firstname VARCHAR2(100),
lastname VARCHAR2(200),
midname VARCHAR2(50),
business_phone VARCHAR2(20),
home_phone VARCHAR2(10),
cell_phone VARCHAR2(20),
business_addr address_typ,
business_email VARCHAR2(150));
Using the canonical mapping, the XML document cannot directly map to the table columns as the document contains multiple levels; thus, to insert this XML document, you need to create the following object view:
CREATE TYPE contactinfo_type AS OBJECT(
phone VARCHAR2(20),
email VARCHAR2(150),
address address_typ);
/
-- Create Object View
CREATE VIEW contact_view AS
SELECT contactid AS user_id, firstname AS first_name, lastname AS
last_name,midname AS mid_name,
contactinfo_type(business_phone,business_email,
business_addr) AS business
FROM contact_tbl;
Then, you can run a similar command to load the XML file into CUSTOMER_VIEW:
java OracleXML putXML -conn "jdbc:oracle:thin:@localhost:1521:orclX"
-user "demo/demo" -fileName "contacts.xml" "contact_view"
In this example, the contact_view is used by the Oracle database to map the XML data into underlying tables. However, in many cases these types of views are not updateable, when they include multiple table joins or object type inheritance. You then must create an INSTEAD-OF TRIGGER on the views to handle data population for these tables or objects.
Dividing XML Documents into Fragments
When storing XML documents, you sometimes do not want to map every XML element to relational table columns. Instead, you might want to store some of the XML fragments in XML into CLOBs or XMLTypes. The following example illustrates an approach using XSLT to create such XML fragments and insert those XML fragments into one XMLType table column using XSU. In the example, the input XML document is shown as follows:
<Contact_List>
<Contact>
<User_id>jwang</User_id>
<First_Name>Jinyu</First_Name>
<Last_Name>Wang</Last_Name>
<Title>Senior Product Manager</Title>
<Description>Jinyu manages the <PRODUCT>Oracle XML Developer's
Kit</PRODUCT> product.</Description>
</Contact>
</Contact_List>
The <Description> element contains mixed content, which we do not want to map to multiple table columns. A contact_tbl table is defined as follows:
CREATE TABLE contact_tbl(
contactid VARCHAR2(15) PRIMARY KEY,
firstname VARCHAR2(100),
lastname VARCHAR2(200),
midname VARCHAR2(50),
description CLOB);
To map the <Description> element to the description column using XSU, you need to apply the following setCDATA.xsl XSL stylesheet:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output cdata-section-elements="CODE"/>
<!-- Identity transformation -->
<xsl:template match="*|@*|comment()|processing-instruction()|
text()">
<xsl:copy>
<xsl:apply-templates select="*|@*|comment()|processing-
instruction()|text()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Description">
<xsl:element name="Description">
<xsl:copy-of select="@*|comment()|processing-instruction()"/>
<xsl:text disable-output-escaping="yes"><![CDATA[</xsl:text>
<xsl:apply-templates select="*|./*/@*|./*/comment()|./*/
processing-instruction()|text()"/>
<xsl:text disable-output-escaping="yes">]]></xsl:text>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
This XSLT transformation will transform the input XML document and include all the child elements of <Description> into one CDATA section so that each CDATA section can be stored to the description column by XSU.You can modify the XSL stylesheet for your application by specifying different match attributes for the following template:
<xsl:template match="Description">… </ xsl:template>
The XPath in the match attribute specifies the root element of the XML fragment to be stored.
Note | Because XSLT requires a DOM to be built in memory, you may need to split the large documents before the transformation. |
TransX Utility
When populating the Oracle database with multilingual data or data translations, or when encoding, validation is needed for each XML file. The traditional way is to switch the NLS_LANG setting as you switch loading files with different encoding information. The NLS_LANG setting has to reflect the character set of the file loaded into the database. This approach is error-prone because the encoding information is maintained separately from the data itself. Setting the NLS_LANG environment variable is also tedious work.With the TransX Utility provided in the XDK, the encoding information is kept along with the data in an XML document of a predefined format so that multilingual data can be transferred without having to switch NLS_LANG settings. TransX Utility maintains the correct character set throughout the process of translating the data and successfully loading it into the database. We will not discuss the details of how to use the TransX Utility. However, we include some samples with the chapter source code for you to try its functionality.
DBMS_XMLSTORE
DBMS_XMLSTORE is a supplied PL/SQL package that supports inserting XML data into database tables. This C-based implementation provides better performance and system manageability than the Java-based DBMS_XMLSave package. This package eliminates the overhead of starting up Oracle JVM as well as translating Java class names for each method call. In addition, DBMS_XMLSTORE is built based on SAX parsing instead of DOM parsing. Therefore, it scales well for large XML documents. You can see this in the following comparison using the SH sample schema:
SQL> SELECT count(1) FROM sales;
COUNT(1)
----------
1136945
SQL> CREATE TABLE test AS SELECT * FROM sales;
Table created.
SQL> CREATE TABLE result AS SELECT * FROM sales WHERE 0=1;
Table created.
SQL> SELECT count(1) FROM test;
COUNT(1)
----------
1136945
SQL> SELECT count(1) FROM result;
COUNT(1)
----------
0
SQL> SET timing ON
SQL> DECLARE
2 qryCtx DBMS_XMLQuery.ctxHandle;
3 v_clob CLOB;
4 savCtx DBMS_XMLSave.ctxType;
5 v_rows NUMBER;
6 BEGIN
7 -- Query out the content
8 qryCtx := DBMS_XMLQuery.newContext('SELECT * FROM test');
9 v_clob := DBMS_XMLQuery.getXml(qryCtx);
10 DBMS_OUTPUT.PUT_LINE('CLOB size = '||DBMS_LOB.GETLENGTH(v_clob));
11 -- Save the content
12 savCtx := DBMS_XMLSave.newContext('RESULT');
13 v_rows := DBMS_XMLSave.insertxml(savCtx,v_clob);
14 DBMS_XMLSave.closeContext(savCtx);
15 DBMS_OUTPUT.PUT_LINE(v_rows || ' rows inserted...');
16 END;
17 /
DECLARE
*
ERROR at line 1:
ORA-29532: Java call terminated by uncaught Java exception:
java.lang.OutOfMemoryError
ORA-06512: at "SYS.DBMS_XMLSAVE", line 114
ORA-06512: at line 13
Elapsed: 00:11:57.05
In the preceding example, the sales table in the SH sample schema described in Chapter 8 is used to generate a large XML document that, when parsed, is too large for the configured Oracle JVM memory. You can increase the JAVA_POOL_SIZE to give more memory for processing; however, this may not be sufficient, especially when this memory takes away from the database memory pool. In Oracle Database 10g, you can use DBMS_XMLSTORE to resolve this issue as follows:
DECLARE
v_clob CLOB;
savCtx DBMS_XMLSTORE.ctxType;
v_rows NUMBER;
BEGIN
-- Query out the content
SELECT doc INTO v_clob FROM temp_clob;
-- Save the content
savCtx := DBMS_XMLSTORE.newContext('RESULT');
-- Set the update columns to improve performance
DBMS_XMLSTORE.SetUpdateColumn (savCtx, 'PROD_ID');
DBMS_XMLSTORE.SetUpdateColumn (savCtx, 'CUST_ID');
DBMS_XMLSTORE.SetUpdateColumn (savCtx, 'TIME_ID');
DBMS_XMLSTORE.SetUpdateColumn (savCtx, 'CHANNEL_ID');
DBMS_XMLSTORE.SetUpdateColumn (savCtx, 'PROMO_ID');
DBMS_XMLSTORE.SetUpdateColumn (savCtx, 'QUANTITY_SOLD');
DBMS_XMLSTORE.SetUpdateColumn (savCtx, 'AMOUNT_SOLD');
-- Insert the document
v_rows := DBMS_XMLSTORE.insertxml(savCtx,v_clob);
DBMS_XMLSTORE.closeContext(savCtx);
DBMS_OUTPUT.PUT_LINE(v_rows || ' rows inserted...');
END;
It is recommended to use the DBMS_XMLSTORE SetUpdateColumn() function where applicable, as shown in the preceding example, because this allows the DBMS_XMLSTORE program to know the list of columns that need to be updated so as to use explicit SQL binding to the XML data. The previous example uses the following SQL statement when preparing the data insertion:
INSERT INTO sales(prod_id, cust_id, ..., amount_sold) values
(:1, :2, ..., :6);
This speeds up the data-insertion process by eliminating the overhead of parsing of the SQL statements in the database.