Perl Cd Bookshelf [Electronic resources] نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

Perl Cd Bookshelf [Electronic resources] - نسخه متنی

Mark V. Scardina, Ben ChangandJinyu Wang

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید






XPath-Based Searches

Using XPath-based queries is straightforward. You simply use the XPath expressions in XMLType functions to specify the content retrieval from XML documents. The following is an example product document in XML:

<Product>
<Name>XSLT Processor</Name>
<Description lastupdate="03-07-2003">
<Whatis>XSLT Processor makes use of
<Standard type="W3C">XSLT</Standard> language defined by the World
Wide Web Consortium (<KEYWORD>W3C</KEYWORD>) to
<Function>transform</Function> XML documents into other formats.
</Whatis>
<OracleSupport>Oracle supports XSLT in SQL, PL/SQL, JAVA and C/C++.
</OracleSupport>
</Description>
</Product>

To query the XML content, you can use the following XPath expressions:



What is the name of product?

XPath: /Product/Name



What are the W3C standards supported by the product?

XPath: /Product//Standard[@type = “W3C”]



What is the description of the product, which is updated after March 1, 2003?

XPath: /Product/Description[@lastupdate >“03-01-2003”]



In Oracle Database 10g, these kinds of XPath expressions can be used within EXISTSNODE() to search the XML documents in XMLTypes.


Searching the XML Document Using XPath


To illustrate how to use XPath in EXISTSNODE(), a table called product is created, which includes the product_id, the product name, and the description column:

CREATE TABLE product(
product_id NUMBER PRIMARY KEY,
name VARCHAR2(200),
description XMLType);

The description is defined as an XMLType column to store the product descriptions in XML. The description column uses CLOB as its storage, which by default holds up to 4GB of text. After the table is created, you can insert new product records as follows:

INSERT INTO product(product_id, name, description)
VALUES(1, 'XSLT Processor', XMLType('<Description>
<Whatis>Based on the <Standard type="W3C">XSLT</Standard> standard
defined by the World Wide Web Consortium (<KEYWORD>W3C</KEYWORD>),
XSLT Processors <Function>transform</Function> XML documents into
other formats.</Whatis>
<OracleSupport>Oracle supports XSLT in SQL, PL/SQL, JAVA and
C/C++.</OracleSupport>
<ProductDetails>
<Product name="XSLT for C">
<Description>XSLT for C in Oracle XDK 10g provides the high-performance
XSLT Virtual Machine, which compiles the XSL Stylesheets and performs the
XSLT transformation using the fixed memory stack.</Description>
<Developer>
<First_Name>John</First_Name>
<Last_Name>Smith</Last_Name>
<Email>John.Smith@oracle.com</Email>
</Developer>
</Product>
</ProductDetails>
</Description>'));

INSERT INTO product(product_id, name, description)
VALUES(2, 'XSQL', XMLType('<Description>
<Whatis>XSQL produces dynamic XML documents based on one or more SQL
queries and can optionally apply <Standard type="W3C">XSL</Standard>
Stylesheets to <Function>transform</Function> XML documents. Its Java
Servlet interface provides rich HTTP management functionality, such as
Cookies, <Standard type="W3C">HTTP</Standard> session parameters
etc.</Whatis>
<OracleSupport>Java command-line utility, run-time Servlet engine and
extensible Java development framework are provided.</OracleSupport>
<ProductDetails>
<Product name="XSQL for Java">
<Description>XSQL for Java support FOP serialization</Description>
<Developer>
<First_Name>Richard</First_Name>
<Last_Name>Lee</Last_Name>
<Email>Richard.Lee@oracle.com</Email>
</Developer>
</Product>
</ProductDetails>
</Description>'));

You can use the EXISTSNODE() function to search the product descriptions as follows:

SQL> SELECT product_id, name
2 FROM product
3 WHERE EXISTSNODE(description,
'//*[contains(.,"HTTP") and contains(.,"SQL")]')>0;
PRODUCT_ID NAME
---------- -----------------------------------
2 XSQL

The EXISTSNODE() function takes an XMLType as the first parameter and an XPath expression as the second parameter. In the XPath expression, the XPath function contains(string1,string2) is used to handle the string matching. The contains() function returns TRUE if the first string argument (string1) contains the substring equal to the second argument (string2). According to the XPath standard, you can add and or or logic operators between two contains() functions. In the preceding example, the XPath-based query returns all products that contain both HTTP and SQL in the product description.

Note that all the text in XPath expressions is case sensitive and that all the logic predicates in XPath, such as and and or, have to be in lowercase. For example, the following query produces an ORA-31013: Invalid XPath Expression error because AND is used instead of and in the logic predicates:

SQL> SELECT product_id, name
2 FROM product
3 WHERE EXISTSNODE(description,
'//*[contains(.,"HTTP") AND contains(.,"SQL")]')>0;
FROM product
*
ERROR at line 2:
ORA-31013: Invalid XPath expression

Because the XPath expressions are case sensitive, sql and SQL are treated as different strings. As a result, the following query will not return the products containing the word SQL in their description:

SELECT product_id, name
FROM product
WHERE EXISTSNODE(description,'//*[contains(.,"HTTP")
and contains(.,"sql")]')>0;

To make the search case insensitive, you can enumerate all the possible combinations of the characters in a word. For example, the following query will return all the products whose description contains either SQL or sql:

SELECT product_id, name
FROM product
WHERE EXISTSNODE (description,
'//*[contains(.,"HTTP") and ( contains(.,"SQL")
or contains(.,"sql"))]')>0;

However, covering all the possible combinations is annoying. In Oracle Database 10g, you can instead use the ora:contains() function, which provides case-insensitive text searches. Another reason to use ora:contains() is that the contains() XPath function does not have word semantics. It performs only consecutive character-by-character comparisons. This may result in unexpected query results, as shown in the following example. After the <OracleSupport> element in the product description of the XSQL Servlet has been updated, as follows:

SQL> UPDATE product SET description=UPDATEXML(description,
'/Description/OracleSupport',
'<OracleSupport>Oracle XSQL provides Java command-line utility,
run-time Servlet engine and an extensible Java development framework.
</OracleSupport>')
WHERE product_id=2;

a query on the updated product table will return the following result:

SQL> SELECT product_id, name
FROM product
WHERE EXISTSNODE(description,
'//OracleSupport[contains(.,"SQL")]')>0;
PRODUCT_ID NAME
---------- -----------------------------------
1 XSLT Processor
2 XSQL

This query result is not anticipated! The reason that the contains() function returns TRUE is that in the <OracleSupport> element of the product description for XSQL Servlet, the string XSQL contains SQL as a substring. Using ora:contains() instead, with its support for word semantics, gives an accurate text search result, as shown in the following example:

SQL> SELECT product_id, name
FROM product
WHERE EXISTSNODE(description,
'//OracleSupport[ora:contains(.,"SQL")>0]',
'xmlns:ora="http://xmlns.oracle.com/xdb"')>0;
PRODUCT_ID NAME
---------- -----------------------------------
1 XSLT Processor

In Oracle Database 10g, the ora:contains() function can be used at any place in the XPath expressions and can leverage the full text search functionality in Oracle Text, such as word stemming, fuzzy matching, proximity searching, specifying searching policies, and so on. For example, the following query uses word stemming to query the product descriptions:

SQL> SELECT product_id, name
FROM product
WHERE EXISTSNODE(description, '//*[ora:contains(.,"$base")>0]',
'xmlns:ora="http://xmlns.oracle.com/xdb"')>0;
PRODUCT_ID NAME
---------- -----------------------------------
1 XSLT Processor
2 XSQL

Though the word base is not shown in the XML document, the product is returned because the word based has the same word stem.

In Oracle Database 10g, the EXISTSNODE() function can be used in the WHERE clause of SQL queries along with any other SQL predicates or combinations of itself. Here is one example:

SELECT product_id, name
FROM product
WHERE EXISTSNODE(description, '//OracleSupport[ora:contains(.,"sql")>0]',
'xmlns:ora="http://xmlns.oracle.com/xdb"')>0
AND EXISTSNODE(description,'//Standard[@type="W3C"]')>0;

This example specifies the query for What are the Oracle products that support W3C standards and provide SQL interfaces?


How XPath-Based Searching Works


When you specify an XPath expression in EXISTSNODE(), the Oracle XML DB evaluates the XPath expressions and executes queries based on the XMLType storage.

If XML is stored in CLOB XMLTypes, the Oracle XML DB functionally evaluates the XPath expressions by building the DOM tree of the XML document in memory and resolves the XPath programmatically using the methods provided by DOM. If a CTXXPATH index is created, the XML DB engine first uses the CTXXPATH index to get the superset of the result data set and then performs DOM-based functional evaluation. If the ora:contains() function is used in the XPath expressions, the selected XML data will be loaded into memory for additional full text-search analysis.

If XML is stored in XML Schema-based XMLTypes, the Oracle XML DB first rewrites the XPath expressions into equivalent SQL statements. Then, based on the object-relational (O-R) data structures, it utilizes whichever index is available to access the data in the object tables. This query-translation process is called the query rewrite of XPath-based queries. Because of the query rewrite process, the XML data retrieval can be as fast as accessing the SQL data.

In the Oracle XML DB, query rewrite is a different process than the query rewrite widely used in data warehouse applications, where users do not need the query rewrite privileges and do not need to create materialized views or summary tables. To make sure you understand the execution process, the following example creates another table that stores the product descriptions in XML Schema-based XMLTypes. First, you need to register an XML schema:

BEGIN
DBMS_XMLSCHEMA.REGISTERSCHEMA('product.xsd',
'<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xdb="http://xmlns.oracle.com/xdb"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">
<xs:element name="Description">
<xs:complexType xdb:SQLType="DESCRIPTION_TYPE">
<xs:sequence>
<xs:element name="Whatis" xdb:SQLName="WHATIS"
xdb:SQLType="WHATIS_TYPE">
<xs:complexType mixed="true">
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element name="Function" type="xs:string"/>
<xs:element name="KEYWORD" type="xs:string"/>
<xs:element name="Standard">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="type" type="xs:string" use="required"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
</xs:element>
<xs:element name="OracleSupport" type="xs:string"/>
</xs:sequence>
<xs:attribute name="name" type="xs:string" use="required"/>
</xs:complexType>
</xs:element>
</xs:schema>', TRUE, TRUE, FALSE,FALSE);
END;

After you have registered the XML schema, you can create a product table by specifying the structured XMLType storage for the description column:

CREATE TABLE product(
product_id NUMBER PRIMARY KEY,
name VARCHAR2(200),
description XMLType)
XMLTYPE COLUMN description ELEMENT "product.xsd#Description";

You then can insert the same set of data into the table as shown in the previous examples, except that you need to use the XMLType.CreateSchemaBasedXML() function to specify the registered XML schema URLs during data insertions:

INSERT INTO product(product_id, name, description) VALUES(1, 'XSLT
Processor', XMLType('…').CreateSchemaBasedXML('product.xsd'));

Looking at the following XPath query:

SELECT product_id, name
FROM product
WHERE EXISTSNODE(description, '/Description[@name="XSQL"]')>0;

The Oracle XML DB parses and rewrites the XPath query to a SQL query that looks like the following:

SELECT VALUE(description)
FROM product p
WHERE description.XMLDATA.name='XSLT';





Note

In the object-relational storage of XMLTypes, XMLDATA refers to the root element of the XML document in the XMLType, and SYS_NC_ROWINFO$ represents the XMLType object.


However, not all XPath-based queries can be rewritten. In Oracle Database 10g, if the XPath expression contains the following items, the query cannot be rewritten:



All XPath functions except not(), floor(), ceiling(), substring(), string-length(), and translate()



XPath variable references



Any axes except the child and attribute axes



Recursive type definitions with descendent axis



UNION operators



If the registered XML schema of an XMLType contains the following elements and the XPath expression includes nodes under these elements, the query also cannot be successfully rewritten:



Elements containing open content, namely ANY content



Elements mapped to SQL CLOBs



The advantage of query rewrite is that it enables the use of B*Tree or other indexes on the XML data to speed up the query response. Without query rewrite, the XPath queries require in-memory construction of the XML object tree and programmatic evaluation through the object tree traversal.





Note

Like the other Oracle-provided extension functions that are under the Oracle XML DB namespace (http://xmlns.oracle.com/xdb), such as ora:upper(), ora:lower(), ora:to_date(), ora:to_number() and ora:like(), the ora:contains() function is rewritten to a SQL query. Additionally, it uses the CONTEXT index if available, and the XML element or attribute used by ora:contains() is an object in the O-R tables of the XMLType.



Optimizing XPath-Based Queries Using Indexes


To speed up XPath-based queries, you can create indexes on XML documents. There are four main types of indexes available in Oracle Database 10g for XMLTypes: B*Tree indexes, bitmap indexes, function-based indexes, and CTXXPATH indexes. The following sections discuss how to use them.

B*Tree Indexes


If an XML document is stored in O-R XMLTypes, you can create B*Tree indexes on the XML content. Since XPath-based queries are rewritten to SQL queries, B*Tree indexes can speed up all rewritten SQL queries. In the following example, a B*Tree index is created on the name attribute of the <Description> element, and the EXPLAIN PLAN command is run to examine the query execution plan:

SQL> CREATE UNIQUE INDEX name_idx ON product(description.xmldata."name");
Index created.
SQL> EXPLAIN PLAN FOR
SELECT count(*)
FROM product x
WHERE x.description.EXISTSNODE( '/Description[@name="XSLT"]')=1;
Explained.
SQL> @d:\oracle\rdbms\admin\utlxpls
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------
|Id | Operation | Name |Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------
| 0 |SELECT STATEMENT | | 1 | 496 | 3 (34) |00:00:01|
| 1 | SORT AGGREGATE | | 1 | 496 | | |
| 2 | TABLE ACCESS | | | | | |
| | BY INDEX ROWID | PRODUCT| 1 | 496 | 3 (34) |00:00:01|
|*3 | INDEX UNIQUE SCAN|NAME_IDX| 1 | | 2 (50)|00:00:01|
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("X"."SYS_NC00010$"='XSLT')
14 rows selected.

The INDEX UNIQUE SCAN that uses the NAME_IDX shows that the created B*Tree index is used when executing the XPath-based query.

Bitmap Indexes


Bitmap indexes store all the index keys in a bitmap instead of the list of ROWIDs, as in B*Tree indexes. Each bit in the bitmap corresponds to a possible ROWID. If the bit is set, it means that the row with the corresponding ROWIDs contains the key value. A mapping function converts the bit position to the actual ROWID, so the bitmap index provides the same functionality as a regular index even though it internally uses a different representation.

If the number of different key values is small, bitmap indexes are space efficient. For XML documents, you can create bitmap indexes on those XML elements containing a high duplication of data. In the following example, a bitmap index is created for the types attribute in the <Standard> element:

SQL> CREATE BITMAP INDEX bitmap_idx ON
product(description.existsNode('/Description/Whatis/Standard@type'));
Index created.

Then, you can run the following EXPLAIN PLAN command to examine the query execution details:

SQL> EXPLAIN PLAN FOR
SELECT /*+ index(x bitmap_idx) */ count(*) FROM product x
WHERE x.description.EXISTSNODE('/Description/Whatis/Standard@type')=1;
Explained.
SQL> @d:\oracle\rdbms\admin\utlxpls
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------
| Id | Operation | Name |Rows|Bytes|Cost (%CPU)| Time |
------------------------------------------------------------------------
| 0 |SELECT STATEMENT | | 1 | 496 | 69 |00:00:01|
| 1 | SORT AGGREGATE | | 1 | 496 | | |
|*2 | TABLE ACCESS BY | | | | | |
| | INDEX ROWID | PRODUCT | 4 | 496 | 69 |00:00:01|
| 3 | BITMAP CONVERSION | | | | | |
| | TO ROWIDS | | | | | |
| 4 |BITMAP INDEX FULL SCAN|BITMAP_IDX| | | | |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(EXISTSNODE(SYS_MAKEXML('0602BCF5CB2048E2BBA19796753F5D49',
2812,X."SYS_NC00004$",X."SYS_NC00007$"),
'/Description/Whatis/Standard@type')=1)
16 rows selected.

The BITMAP INDEX FULL SCAN that uses the BITMAP_IDX shows that the created bitmap index is used when executing the XPath-based query.

Function-Based Indexes


A function-based index is created based on the values returned by function expressions. After the values are computed and stored, SQL queries using the functions in their WHERE clauses will be sped up by use of the index. The function used for building the index can be either an arithmetic expression or an expression that contains a PL/SQL function, a C callout, or a SQL function.

In the Oracle XML DB, when query rewrite cannot be used on O-R XMLTypes because of the previously discussed limitations, you can create a function-based index using XMLType functions. The following example creates a function-based index and the EXPLAIN PLAN command is run to examine the query execution plan:

SQL> CREATE INDEX fidx_existnode ON PRODUCT(
EXTRACT(description,'/Description/Whatis/Function').getStringVal());
Index created.
SQL> EXPLAIN PLAN FOR
SELECT count(*)
FROM product x
WHERE EXTRACT(description,
'/Description/Whatis/Function').getStringVal()='transform';
Explained.
SQL> @d:\oracle\rdbms\admin\utlxpls
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------
| Id | Operation | Name |Rows|Bytes|Cost (%CPU)| Time |
------------------------------------------------------------------------
| 0 |SELECT STATEMENT | | 1 | 496 | 2 (50) | 00:00:01|
| 1 | SORT AGGREGATE | | 1 | 496 | | |
|* 2| INDEX RANGE SCAN| FIDX_EXISTNODE | 1 |496 | 2 (50)| 00:00:01 |
------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access((EXTRACT(SYS_MAKEXML('1279EA6100D34050A95214274890C924',
3065,X."SYS_NC00004$",X."SYS_NC00007$"),'/Description/Whatis/Function'))
='transform')
15 rows selected.

The INDEX RANGE SCAN that uses the FIDX_EXISTNODE shows that the created function-based index is used when executing the XPath-based query.

CTXXPATH Indexes


To improve the performance of XPath-based searches, CTXXPATH index was introduced in Oracle9i R2 for XMLTypes, which serves primarily as a content filter for EXISTSNODE(). When CTXXPATH index is used by EXISTSNODE(), a superset of the results of the XPath expression is returned, as in the following query:

SELECT product_id, name
FROM product
WHERE EXISTSNODE(description,
'/Description[@name="XSLT"]/OracleSupport[ora:contains(.,"SQL")>0]',
'xmlns:ora="http://xmlns.oracle.com/xdb"')>0;

The CTXXPATH index can’t return the results for the complete XPath query since it can’t process ora:contains(), but it can filter the content and return all the documents containing the path /Description/[@name=“XSLT”]. Based on the superset returned by the CTXXPATH index, EXISTSNODE() then processes a significantly reduced number of XML documents.

Creating the CTXXPATH index is similar to creating the CONTEXT index provided by Oracle Text. For example, you can create a CTXXPATH index on the description column of the product table as follows:

CREATE INDEX ctxxpath_idx ON product(description)
INDEXTYPE IS CTXSYS.CTXXPATH;

As with the CONTEXT index, the CTXXPATH index requires index synchronizations after insert and update DML operations. Additionally, it supports only XPath without XML namespaces. If the XPath expression contains namespaces, the CTXXPATH index will not be used for XPath functions, numerical range operators, arithmetic operators, or XPath axes. The use of the CTXXPATH index also depends on the database optimizers. For example, if the Cost Based Optimizer (CBO) decides that the CTXXPATH index is too expensive to use, it also will not be used by the EXISTSNODE() function. In order to let the database optimizers properly estimate the cost, you can use the ANALYZE command or the DBMS_STATS package.

/ 218