<a name="341"></a><a name="wbp10ch09P1"></a>Chapter 9: Storing XML Data - Perl Cd Bookshelf [Electronic resources] (text version)

Mark V. Scardina, Ben Chang, and Jinyu Wang

Chapter 9: Storing XML Data

In Oracle Database 10g, you have a number of choices for storing XML data. You can shred the XML documents and store the data in one or more relational tables, put them intact in CLOB XMLTypes, or register an XML schema and store them in an XML Schema–based XMLType with object-relational storage. If there is no requirement for updating the XML content, you can also store the XML documents externally by creating External Tables.

This chapter gives an overview of the XML storage options available in Oracle Database 10g and walks through examples of each. You will also learn how to use Oracle utilities, including SQL*Loader and the XML SQL Utility (XSU), to load XML documents into either XMLType tables or relational tables. We start with the simplest storage format: the CLOB XMLType.


Storing XML Documents in CLOB XMLTypes


With CLOB XMLType storage, XML documents are stored as CLOBs and accessed through the XML interfaces that XMLType provides. You can optionally perform additional XML processing during ingestion, such as validating the input against an XML schema or a DTD, but the only processing that CLOB XMLType storage requires is well-formedness checking and entity resolution.
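To make the distinction concrete, here is a minimal Python sketch of a well-formedness check, the one step that cannot be skipped. It uses Python's standard ElementTree parser as a stand-in and is not how Oracle implements the check; the function name and sample documents are hypothetical, and the sketch does not perform entity resolution or schema validation.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML (no schema or DTD validation)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

good = "<order id='1'><item>pen</item></order>"
bad = "<order id='1'><item>pen</order>"  # <item> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document can pass this check and still be invalid against its schema, which is why validation remains a separate, optional step.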


Updating and Querying CLOB XMLTypes


CLOB XMLType storage best preserves the original format of XML documents and gives the most flexibility for XML schema evolution. However, querying the XML content, for example with the XMLType.Extract() or XMLType.ExistsNode() functions, is expensive, because each such operation must build an XML DOM tree in memory at run time and evaluate the XPath functionally. In addition, updates can be performed only at the document level: even a small change to one XML element requires rewriting the entire XML document. You should therefore normally avoid using XMLType functions for fine-grained XML updates or XPath-based queries on CLOB XMLTypes.
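The cost of a document-level update can be illustrated outside the database. The following Python sketch is only an analogy, assuming a document held as one text value: changing a single element means parsing the whole text into a tree, editing the node, and serializing the whole document again. The function name and the sample order document are hypothetical.

```python
import xml.etree.ElementTree as ET

def update_status(stored_doc: str, new_status: str) -> str:
    """Change one element and return the entire rewritten document."""
    root = ET.fromstring(stored_doc)            # build the full tree in memory
    root.find("status").text = new_status       # path lookup evaluated on that tree
    return ET.tostring(root, encoding="unicode")  # re-serialize everything

stored = "<order><id>42</id><status>open</status></order>"
updated = update_status(stored, "shipped")
print(updated)  # <order><id>42</id><status>shipped</status></order>
```

The parse and the full rewrite happen no matter how small the change is, which is the overhead the paragraph above warns about.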

Instead, for XPath-based queries on CLOB XMLTypes, use Oracle Text, which provides full-text search with support for a limited set of XPath expressions. Its CONTEXT index lets you run XPath queries against CLOB XMLTypes, an approach that has proven very useful and scalable for enterprise applications. Chapter 11 discusses this functionality.


Dealing with Character Encoding for CLOB XMLTypes


When storing XML documents in an Oracle database, be aware that a character set conversion is performed automatically during insertion. It converts all text data, including XML documents, to the database character set, unless the data is stored as a BLOB, NCHAR, or NCLOB data type.

Because of this implicit character set conversion, the actual encoding of the XML data and the encoding declaration in the <?xml?> prolog may differ. In the current Oracle Database 10g release, the XMLType APIs ignore the encoding declaration in the <?xml?> prolog and assume that XML data in CLOB XMLTypes is stored in the database character set. Therefore, when loading XML data from the client side, you need to make sure that this conversion is performed properly.

To ensure proper conversion from the client character set to the database character set, you must set the NLS_LANG environment variable to reflect the client character set whenever the XML document is stored on the client in a character set different from the database character set. If NLS_LANG is instead set to the database character set, the original text is stored as-is in the database, without character validation or conversion.

In other words, if NLS_LANG is not set or is set incorrectly, and the XML document's encoding differs from the database character set, garbage data will be stored in the database.
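The effect of a wrong client character set declaration can be reproduced with plain byte strings. This Python sketch is an analogy rather than Oracle behavior: it assumes a hypothetical client file encoded in Shift_JIS and a hypothetical Latin-1 database character set, and shows what results when the bytes are accepted as-is versus converted from their true encoding.

```python
text = "<name>日本語</name>"
client_bytes = text.encode("shift_jis")  # how the file is really encoded on the client

# Misdeclared: the bytes are taken as if they were already in the
# (assumed) database character set, so no conversion happens.
garbled = client_bytes.decode("latin-1")

# Declared correctly: the true client encoding is used for the conversion.
converted = client_bytes.decode("shift_jis")

print(garbled == text)    # False
print(converted == text)  # True
```

No error is raised in the misdeclared case; the damage is silent, which is why the setting needs to be verified before loading.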





Note

If the XML document contains characters that are invalid in the database character set, you will get an Invalid Character error when inserting the data into CLOB XMLTypes. The current solution is to store the data in the database as NCLOB or BLOB and to build mid-tier XML applications or PL/SQL external procedures that use the XDK APIs to process the XML data.
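One way to anticipate this error is to test, before loading, whether every character in the document can be represented in the target character set. The Python sketch below is a hypothetical client-side pre-check, not an Oracle utility; the function name is invented, and the charset argument is a Python codec name standing in for the database character set.

```python
def fits_charset(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)  # strict error handling is the default
        return True
    except UnicodeEncodeError:
        return False

print(fits_charset("<p>café</p>", "latin-1"))    # True
print(fits_charset("<p>日本語</p>", "latin-1"))  # False
print(fits_charset("<p>日本語</p>", "utf-8"))    # True
```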


Because the character set conversion can leave the actual encoding in conflict with the encoding declaration in the <?xml?> prolog, when you read XML data out of CLOB XMLTypes you must either perform the reverse character set conversion or update the encoding declaration so that the two are consistent. This matters because, although an XML parser can use the first 4 bytes of the <?xml?> prolog to detect the encoding of an XML document, it can determine only whether the encoding is ASCII-based or EBCDIC. For an ASCII-based encoding, the parser can detect only whether it is UTF-8 or UTF-16; for anything else, it depends on the encoding declaration in the <?xml?> prolog. Therefore, if your XML documents are not in UTF-8 or UTF-16, you must include a correct XML encoding declaration indicating which character encoding is in use, as follows:

<?xml version="1.0" encoding="Shift_JIS"?>
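The detection logic described above can be sketched in Python. This is a simplified illustration, not the behavior of any particular XML parser: it ignores UCS-4 byte patterns, and all three function names are hypothetical. The first function classifies a document from its first 4 bytes, the second reads the encoding declaration that an ASCII-compatible document must rely on, and the third re-encodes a document on the way out while rewriting the declaration to match.

```python
import re

def sniff_encoding_family(data: bytes) -> str:
    """Guess the encoding family from the first 4 bytes (UCS-4 is not handled)."""
    head = data[:4]
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8"                       # UTF-8 byte order mark
    if head.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"                      # UTF-16 byte order mark
    if head in (b"\x00<\x00?", b"<\x00?\x00"):
        return "utf-16"                      # "<?" in UTF-16 without a BOM
    if head == b"<?xm":
        return "ascii-compatible"            # must read the declaration
    if head == b"\x4c\x6f\xa7\x94":
        return "ebcdic"                      # "<?xm" in EBCDIC
    return "unknown"

DECL = re.compile(rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")

def declared_encoding(data: bytes):
    """Return the encoding named in the prolog, or None if there is none."""
    m = DECL.match(data)
    return m.group(1).decode("ascii") if m else None

def export_with_declaration(text: str, target: str) -> bytes:
    """Encode text as target and make the prolog's declaration agree with it."""
    fixed = re.sub(r"""(<\?xml[^>]*?encoding\s*=\s*["'])[^"']*""",
                   r"\g<1>" + target, text, count=1)
    return fixed.encode(target)

raw = '<?xml version="1.0" encoding="Shift_JIS"?><name>日本語</name>'.encode("shift_jis")
family = sniff_encoding_family(raw)   # "ascii-compatible": the bytes alone are not enough
name = declared_encoding(raw)         # "Shift_JIS"
text = raw.decode(name)
utf8_copy = export_with_declaration(text, "UTF-8")
```

The Shift_JIS sample shows why the declaration matters: its first 4 bytes are identical to those of a UTF-8 document, so only the declared name tells a reader how to decode the rest.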
