Perl Cd Bookshelf [Electronic resources]

Mark V. Scardina, Ben ChangandJinyu Wang

نسخه متنی -صفحه : 218/ 179

Processing XML in the Oracle JVM

Having introduced the basic steps for creating Java stored procedures, we will present two useful examples for processing XML data. The first example shows how to resolve URL references when parsing XML, XSL stylesheets, and XML Schema documents. The second example illustrates how to perform SAX XML processing in Oracle JVM.

Resolving URL References

URL references are widely used in XML, XSL, and XML Schema documents to include external data, such as external entities in XML, external XSL documents included in XSL stylesheets by <xsl:include> and document(), and external XML Schema documents included in XML schemas by <xsd:include> and <xsd:import>.

When processing XML data, the URL references are resolved by a Java class called EntityResolver. You can set a base URL by using the setBaseURL() method, and this base URL is used by the EntityResolver to update all the relative URLs in XML documents to absolute URLs. Then, the EntityResolver will use these absolute URLs, and use the Java URL support, to retrieve the data.

In Oracle JVM, if external data is stored in a file directory, you need to grant the current database user read permission on that directory:

 DBMS_JAVA.GRANT_PERMISSION(
grantee=>'DEMO',
permission_type=>'SYS:java.io.FilePermission',
permission_name=>' <DirectoryName>',
permission_action=>'read');

If external data is stored on a web server, you need to grant socket reading permission to the user:

DBMS_JAVA.GRANT_PERMISSION (
grantee=> 'DEMO',
permission_type=>'SYS:java.net.SocketPermission',
permission_name=>'<WebServerorDomainName>',
permission_action=>'connect,resolve');

However, when the referenced data is stored in database tables, which is common when processing XML data inside Oracle databases, the URL references cannot be resolved without additional help. In other words, in this case, instead of using the default EntityResolver, you need to implement a customized EntityResolver and register it to an XML parser, XSL processor, or XML schema processor. The customized EntityResolver is responsible for resolving the URL references and retrieving the data for the XML processors.

For example, in the following company.xsd XML schema file, there are <xsd:include> elements that include two additional XML schema files, Person.xsd and Product.xsd:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.company.org"
xmlns="http://www.company.org"
elementFormDefault="qualified">
<xsd:include schemaLocation="Person.xsd"/>
<xsd:include schemaLocation="Product.xsd"/>
<xsd:element name="Company">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Person" type="PersonType"
 maxOccurs="unbounded"/>
<xsd:element name="Product" type="ProductType"
 maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>

The Person.xsd schema defines the PersonType:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">
<xsd:complexType name="PersonType">
<xsd:sequence>
<xsd:element name="Name" type="xsd:string"/>
<xsd:element name="SSN" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>

The Product.xsd schema defines the ProductType:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">
<xsd:complexType name="ProductType">
<xsd:sequence>
<xsd:element name="Type" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>

All the XML schema files are stored in the created xsd_tbl table, which uses the filename as the primary key and stores the content in a VARCHAR2 column.

If you use the previously created xsd_build() procedure to parse the XML schema document, you will get a No Protocol error. To solve this problem, you need to set a customized EntityResolver to the XML schema builder:

 XSDBuilder builder = new XSDBuilder();
DBEntityResolver resolver = new DBEntityResolver();
  builder.setEntityResolver(resolver);
byte [] docbytes = xsd.getBytes();
ByteArrayInputStream in = new ByteArrayInputStream(docbytes);
XMLSchema schemadoc = (XMLSchema)builder.build(in,null);

DBEntityResolver is a class that implements the org.xml.sax.EntityResolver interface as follows:

import java.io.IOException;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import java.sql.Driver;
//Define new custom EntityResolver class using the
// internal Oracle JVM KPRB JDBC driver to connect to
public class DBEntityResolver implements EntityResolver {
Connection conn;
DBEntityResolver() {
try {
Driver d = new oracle.jdbc.driver.OracleDriver();
conn = DriverManager.getConnection("jdbc:oracle:kprb:");
} catch(Exception ex) {
ex.printStackTrace();
}
}
//Define a method to
 resolve entities by retrieving schema files
//from the xsd_tbl table.
public InputSource 
resolveEntity (String publicId, String systemId)
throws SAXException, IOException{
InputSource mySource = null;
String location = systemId;
InputStream xsd= null;
try {
PreparedStatement st =
conn.prepareStatement("select doc from xsd_tbl where name=?");
st.setString(1,location);
ResultSet rset= st.executeQuery();
while(rset.next()) {
xsd = rset.getBinaryStream(1);
}
rset.close();
st.close();
} catch(Exception ex) {
ex.printStackTrace();
}
// Create the inputSource with an InputStream as input
mySource = new InputSource(xsd);
return mySource;
}
}

Once this code has been written, you need to redeploy your Java stored procedure, after modifying your deployment profile so that it includes this new class. The signature of the method to expose as well as the signature of the matching PL/SQL procedure remain unchanged.

This class is used by the XML schema processor to resolve the URL references and retrieve referenced data from Person.xsd and Product.xsd. Since this is an important issue, let’s discuss the code in greater detail.

First, in order to retrieve data from tables, you need to set up a JDBC connection to connect the Oracle database. Therefore, in the constructor function of the DBEntityResolver class, a server-side internal JDBC driver is initialized. In Oracle Database 10g, there are three JDBC drivers provided:

JDBC Oracle Call Interface (OCI) driver (also known as thick) with the JDBC link: jdbc:oracle:oci8:<UserName>/<Password>

JDBC thin driver with the JDBC link: jdbc:oracle:thin:<UserName>/<Password>@<HostName>:<ListenerPort>:<DBSID>

JDBC server-side internal driver with the JDBC link: jdbc.oracle.kprb

Unlike the OCI and thin drivers, which are used primarily by the client applications to connect to the Oracle database, the internal JDBC driver is used for Java code running inside Oracle JVM. Because it will connect the currently connected database user running the Java program, it does not require specifying the username or password:

Driver d = new oracle.jdbc.driver.OracleDriver();
conn = DriverManager.getConnection("jdbc:oracle:kprb:");

This is an optimally tuned JDBC driver that runs directly inside the Oracle database, thereby providing the fastest access to Oracle data from Java stored procedures.

Next, you need to retrieve data from tables when the XML schema processor needs help to resolve the URL references. In this case, the resolveEntity() method will be called with the URLs passed in as the systemId. You need to overwrite the method by querying the xsd_tbl table using the URLs, which are filenames for the XML schemas, and return the data as a Java InputSource.

Finally, you need to follow the steps discussed in the previous sections to deploy the Java code and create a new Java stored procedure specification. This time, the XML schema can be parsed without errors. You can apply the same approach to resolve URLs for XML document parsing and XSLT transformation.

SAX XML Processing

Due to the nature of PL/SQL, it does not support event-based processing such as is required for SAX XML processing. However, in many cases, the large size of your XML documents requires you to use SAX to streamline the XML processing and reduce the strain on memory resources. Additionally, batch-mode SAX XML processing is more efficient than DOM XML processing when your only requirement is to read XML data. In this section, we discuss how to leverage this SAX functionality in the Oracle database.

First, whereas the previous section described how to create a Java stored procedure to parse XML Schema documents, this section discusses how to use the Java stored procedure to validate XML documents in SAX. The Java stored procedure is created after the Java method, defined as follows:

 public static String validation(CLOB xml, CHAR xsd) throws Exception
{
//Build Schema Object
XSDBuilder builder = new XSDBuilder();
EntityResolver resolver = new DBEntityResolver();
builder.setEntityResolver(resolver);
byte [] docbytes = xsd.getBytes();
ByteArrayInputStream in = new ByteArrayInputStream(docbytes);
XMLSchema schemadoc = (XMLSchema)builder.build(in,null);
//Parse the input XML document with Schema Validation
SAXParser parser = new SAXParser();
parser.setXMLSchema(schemadoc);
parser.setValidationMode(XMLParser.SCHEMA_VALIDATION);
StringWriter sw = new StringWriter();
try {
parser.parse(xml.getCharacterStream());
sw.write("The input XML parsed without errors.\n");
} catch (XMLParseException pe) {
sw.write("Parser Exception: " + pe.getMessage());
} catch (Exception e) {
sw.write("NonParserException: " + e.getMessage());
}
return sw.toString();
}

To validate the XML document, the SCHEMA_VALIDATION option for the SAX XML parser is turned on, and an XML Schema object is created and set to the SAX XML parser as follows:

parser.setXMLSchema(schemadoc);
parser.setValidationMode(XMLParser.SCHEMA_VALIDATION);

Then, the XML document is retrieved using oracle.sql.CLOB.getCharacterStream() and parsed by the SAX parser:

parser.parse(xml.getCharacterStream());

Note

The oracle.sql.CLOB provides two methods to retrieve data: getAsciiStream() and getCharacterStream(). The getAsciiStream() method only supports ASCII characters and returns an InputStream. The getCharacterStream() method supports all characters and returns a Reader. Using getCharacterStream() is always recommended, unless you know there are only ASCII characters stored in the CLOB.

After the Java code is deployed, a Java stored procedure called xsd_validation() can be created by the following statement:

CREATE OR REPLACE FUNCTION xsd_validation(xml IN CLOB, xsd IN VARCHAR2)
RETURN VARCHAR2 AUTHID CURRENT_USER AS LANGUAGE JAVA NAME
'XSDDBBuilder.validation(oracle.sql.CLOB, oracle.sql.CHAR) return
java.lang.String';

You can use it to validate XML documents in PL/SQL as follows:

SET SERVEROUTPUT ON
DECLARE
out VARCHAR2(2000);
xsd VARCHAR2(4000);
xml CLOB;
BEGIN
SELECT doc INTO xsd FROM xsd_tbl WHERE name='company.xsd';
SELECT doc INTO xml FROM xml_tbl WHERE name='company.xml';
out := xsd_validation(xml, xsd);
DBMS_OUTPUT.PUT_LINE(out);
END;

For this example, the XML documents are stored in the xml_tbl table, created as follows:

CREATE TABLE xml_tbl(
name VARCHAR2(30) PRIMARY KEY,
docCLOB);

The company.xml can then be inserted into the Oracle database with the following command:

INSERT INTO xml_tbl VALUES('company.xml','<?xml version="1.0"?>
<Company xmlns="http://www.company.org">
<Person>
<Name>Foo</Name>
<SSN>123-45-6789</SSN>
</Person>
<Product>
<Type>Good Type</Type>
</Product>
</Company>');
}

The XML Schema validation processing is performed in SAX streams. If any error occurs, the error messages will be printed on the screen. Otherwise, the processes will be successfully completed with an Input XML Parsed Without Errors message.

Second, if you need to manipulate the XML data, you can also create content handlers and register them to SAX parsers. For example, the following content handler is created to remove the white spaces in XML documents:

public class MyDocumentHandler extends XMLSAXSerializer {
public MyDocumentHandler(OutputStream out) {
super(out);
}
public void characters(char ch[], int start, int length)
throws SAXException
{
String str = new String(ch, start, length);
//Replace
str = str.replace('\t',' ');
str = str.replace('\n',' ');
str = str.replace('\r',' ');
// Collapse
str = str.trim();
char[] ca = str.toCharArray();
int i, j;
boolean seenWS = false;
for(i=0,j=0; j< str.length(); j++) {
if(ca[j] != ' ' || !seenWS) {
ca[i++] = ca[j];
if(ca[j] == ' ')
seenWS = true;
else
seenWS = false;
}
}
super.characters(ca,0,i);
}
}

XMLSAXSerializer is a new class in Oracle XDK 10g, which is defined as follows:

public class XMLSAXSerializer extends
org.xml.sax.helpers.DefaultHandler implements
oracle.xml.parser.v2.XMLConstants, org.xml.sax.ext.LexicalHandler
{…}

This Java class in Oracle XDK 10g provides serialization for SAX processing. It prints out XML data to an OutputStream or a PrintWriter with printing formats that can be customized by the users. After implementing the document handler that is handling the SAX events, you can register it to the SAX parser with the following code:

public class XMLNormalize {
public static String normalized(XMLType xml, CLOB[] result)
{
StringWriter sw = new StringWriter();
SAXParser saxParser = new SAXParser();
try {
Writer out =result[0].getCharacterOutputStream();
MyDocumentHandler docHandler =
new MyDocumentHandler(new PrintWriter(out));
saxParser.setContentHandler(docHandler);
saxParser.parse(xml.getInputStream());
out.close();
} catch(XMLParseException ex) {
sw.write("XSD Error:"+ex.getMessage());
}catch(Exception ex) {
sw.write("Error:"+ex.getMessage());
}
return sw.toString();
}
}

In this example, we create a Java method that reads the XML document from an XMLType column and uses SAX to parse and normalize the document. If the original XML document is as follows:

<rootElement> This is the test
<childElement test="true">
Value
</childElement>
</rootElement>

After the normalization, the resulting XML document will be as follows:

<rootElement>This is the test<childElement test="true"
>Value</childElement></rootElement>

Because Oracle JDeveloper 10g doesn’t support XMLType when defining Java stored procedures, you need to write your own as follows:

CREATE OR REPLACE FUNCTION normalize(p1 IN XMLType,
p2 IN OUT NOCOPY CLOB)
RETURN VARCHAR2
AUTHID CURRENT_USER
AS LANGUAGE JAVA NAME
'oracle.xml.sample.parser.XMLNormalize.normalized(oracle.xdb.XMLType,
oracle.sql.CLOB[]) return java.lang.String';

This time, you also need to add the $ORACLE_HOME/rdbms/jlib directory in the Files Groups | Project Output | Contributor option to load the xdb.jar needed for XMLType processing.

By working through the examples, you should have a better understanding of how to process XML data in SAX using Java stored procedures. Because the processing is in SAX streams, you have the advantage of using less memory, which is especially important when processing large XML documents.