Introducing XQuery
XQuery is described at length in the following W3C Last Call Working Draft specifications per www.w3.org/TR/xquery/ (referenced in a prior chapter):
XQuery 1.0: An XML Query Language. Describes how the language is an extension of XPath 2.0 and describes in depth the grammar and how it is processed. It describes the different expressions that are allowed (such as primary, path, sequence, arithmetic, comparison, and logical expressions), how the expressions can be nested, and the possible data types for the various expressions, operators, and functions. The module section describes the main module and the prolog that can be type-checked on a stand-alone basis.
XQuery 1.0 and XPath 2.0 Data Model. Describes the data model of the XML document on which the XQuery operators function. The data model is simply a representation of the XML document (e.g., elements, attributes, namespaces, processing instructions, and so on) that is provided as input to an XML processor, along with how that information is qualified in terms of data types and allowed values. This specification also describes the different accessor functions to this information. Users do not need to concern themselves with this specification because it is aimed at implementers of XQuery processors.
XQuery 1.0 and XPath 2.0 Formal Semantics. Describes the formal semantics of the language with a formal notation governed by grammar productions. It complements the first specification in this list and the XML Path Language (XPath) 2.0 specification by strictly defining the meaning of the language’s objects, such as expressions, values, and data types, with formal notations. With this specification’s formal semantics, reference implementations can easily be prototyped and problems with the language can thereby be eliminated. Like the preceding specification in this list, this specification is not intended for users.
XQuery 1.0 and XPath 2.0 Functions and Operators. Introduces new functions and operators to the XPath 2.0 language and includes error, trace, constructor, string, qualified name, context, and casting functions. It also introduces functions and operators on the XML Schema simple data types, such as numeric types and values, Booleans, durations, dates, times, anyURI, base64Binary, hexBinary, and NOTATION, as well as on nodes and on sequences.
XML Syntax for XQuery 1.0 (XQueryX). Maps the grammar productions outlined in the “XQuery 1.0 and XPath 2.0 Formal Semantics” specification into XML. Thus, XML parsers and XSLT stylesheets can process, query, generate, modify, and reuse the XQuery operations. It makes the productions easier for humans to read and the queries themselves easier to deal with.
Basics
XPath is used quite extensively in the XQuery specification, along with regular expressions, which explains the joint data model. As mentioned in previous chapters, XPath is simply a way to select or create portions of an XML document, and XQuery makes extensive use of this syntax in its queries and expressions. The joint data model defines the input and output of the XQuery operators. This data model relies on the concept of a sequence, which is an ordered collection (e.g., in document order) of zero or more items. Each item is either a node (e.g., an element, attribute, text, document, comment, processing instruction, or namespace node) or an atomic value that is typed via XML Schema data types (one special value is an error value). The language itself is simply composed of keywords and operators that are combined into expressions and constructors. For example, variables, which are prefixed by $ signs, can be used in various expressions, such as loops or assignments, or in function calls or constructors.
Expressions
A number of expressions exist within the XQuery language, so we will concentrate on just the three that we think will be the most heavily used:
FLWOR expression
Path expression
Predicate expression
For these expressions, the following variation of the book.xml file will be used:
<book xmlns:bk="http://www.mcgraw-hill.com" bk:ISBN="99999-99999">
  <title>Oracle Database 10g XML and SQL</title>
  <author>
    <name bk:type="person">
      <first>Mark</first>
      <last>Scardina</last>
    </name>
    <name bk:type="person">
      <first>Ben</first>
      <last>Chang</last>
    </name>
    <name bk:type="person">
      <first>Jinyu</first>
      <last>Wang</last>
    </name>
  </author>
  <publisher>Oracle Press</publisher>
  <publishyear>2003</publishyear>
  <price type="US">10.00</price>
</book>
FLWOR Expression
The FLWOR expression consists of the for, let, where, order by, and return clauses (XQuery keywords are written in lowercase). For example, the following query loops over all book instances, finds the respective authors, and returns result nodes containing the book's title and the authors for each book:
<result>
{
  (: iterate over each book in the document :)
  for $book in fn:doc("book.xml")/book
  (: bind the book's author element :)
  let $author := $book/author
  where count($author) > 0
  return (
    <title>{$book/title/text()}</title>,
    <author>
      {fn:string-join(
         (: list the names sorted by last name :)
         for $name in $author/name
         order by $name/last
         return fn:concat($name/first, " ", $name/last),
         ", ")}
    </author>
  )
}
</result>
The following is the result of this XQuery:
<result>
<title>Oracle Database 10g XML and SQL</title>
<author>Ben Chang, Mark Scardina, Jinyu Wang</author>
</result>
Examining the operation of this XQuery entails the following steps:
The for clause sets the iteration over the <book> elements and binds each one to the $book variable.
The let clause binds the $author variable to the <author> children of each input book (/book/author).
The where clause ensures that only books with authors are selected.
The order by clause sorts the results by the author's last name.
The return clause defines the actual format of the result.
Note that in this case an XML document is constructed as the result, in a similar fashion to the result from an XSLT template. XQuery versus XSLT is discussed in the “Best Practices” section.
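To make the five steps concrete, the following is a minimal sketch that mimics them in Python with the standard-library ElementTree module. It is not an XQuery processor: the function name flwor_result and the inlined copy of book.xml are assumptions made for illustration, and each statement is only an approximate stand-in for the clause named in its comment.

```python
import xml.etree.ElementTree as ET

# Inlined copy of the book.xml variation used in this section.
BOOK_XML = """<book xmlns:bk="http://www.mcgraw-hill.com" bk:ISBN="99999-99999">
  <title>Oracle Database 10g XML and SQL</title>
  <author>
    <name bk:type="person"><first>Mark</first><last>Scardina</last></name>
    <name bk:type="person"><first>Ben</first><last>Chang</last></name>
    <name bk:type="person"><first>Jinyu</first><last>Wang</last></name>
  </author>
  <publisher>Oracle Press</publisher>
  <publishyear>2003</publishyear>
  <price type="US">10.00</price>
</book>"""


def flwor_result(xml_text):
    """Hypothetical helper that imitates the for/let/where/order by/return steps."""
    book = ET.fromstring(xml_text)        # for: the single <book> in this document
    authors = book.findall("author")      # let: the <author> children of the book
    if len(authors) == 0:                 # where: skip books without authors
        return None
    names = [n for a in authors for n in a.findall("name")]
    names.sort(key=lambda n: n.findtext("last"))   # order by: last name
    # return: construct the result elements
    result = ET.Element("result")
    ET.SubElement(result, "title").text = book.findtext("title")
    ET.SubElement(result, "author").text = ", ".join(
        f"{n.findtext('first')} {n.findtext('last')}" for n in names
    )
    return ET.tostring(result, encoding="unicode")


print(flwor_result(BOOK_XML))
```

The sketch shows only the order in which the clauses take effect: binding, filtering, sorting, then constructing the output.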
Path Expression
The path expression is based on the abbreviated syntax of XPath 1.0 and is extended in 2.0 with dereference operators and range predicates. Predicates are expressions enclosed in square brackets that are often used to filter a sequence of values. A path expression is evaluated by performing a node test on each of one or more steps delineated by / or //. There are two types of node tests. The Kind Test checks whether the XML item matches an ElementTest, AttributeTest, PITest, CommentTest, TextTest, or AnyKindTest. The Name Test adds the further qualification of matching the QName as well. For example, the following are several node tests:
Element(child::book) – matches all child elements of book
Element(child::/book/author/name, person) – matches all names of child element, author, of type person
Attribute() – matches any single attribute node regardless of name
Attribute(/book/@bk:ISBN) – matches all book elements containing ISBN attributes in the XML namespace associated with the bk prefix
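As a rough illustration of how steps separated by / and // select nodes, the following sketch evaluates a few paths against a shortened copy of book.xml. It uses Python's standard-library ElementTree, which supports only a small XPath subset and none of the XQuery kind tests, so it is an assumption-laden stand-in for a real XQuery processor.

```python
import xml.etree.ElementTree as ET

BK = "http://www.mcgraw-hill.com"  # namespace URI bound to the bk prefix in book.xml

# Shortened copy of book.xml; only the parts the paths touch are kept.
BOOK_XML = """<book xmlns:bk="http://www.mcgraw-hill.com" bk:ISBN="99999-99999">
  <title>Oracle Database 10g XML and SQL</title>
  <author>
    <name bk:type="person"><first>Mark</first><last>Scardina</last></name>
    <name bk:type="person"><first>Ben</first><last>Chang</last></name>
    <name bk:type="person"><first>Jinyu</first><last>Wang</last></name>
  </author>
</book>"""

book = ET.fromstring(BOOK_XML)  # the root <book> element

# /book/author/name: one child step per "/" (relative to the root element here)
names = book.findall("author/name")

# //last: the descendant shortcut finds <last> elements at any depth
last_names = [e.text for e in book.findall(".//last")]

# /book/@bk:ISBN: a namespaced attribute is matched by its URI, not by its prefix
isbn = book.get(f"{{{BK}}}ISBN")

print(len(names), last_names, isbn)
# prints: 3 ['Scardina', 'Chang', 'Wang'] 99999-99999
```

Note that the attribute lookup depends on the namespace URI rather than on the literal bk prefix, which is why the prefix can differ between a query and a document.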
Predicate Expression
The predicate expression can be used to identify certain nodes (e.g., the expression starting with title used in book[title="Oracle Database 10g XML and SQL"]), to help determine values (e.g., the expression price in book[price > 10] or book[price –10]), or to determine the ordinal position (such as book[5]).
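The three uses of a predicate can be sketched as follows, again with Python's ElementTree as a stand-in. The catalog document and its second and third titles are hypothetical data invented for this illustration. ElementTree handles the equality and positional predicates directly, but it has no numeric comparison, so the price filter is written as an ordinary Python condition.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; only the first title comes from book.xml.
CATALOG = """<catalog>
  <book><title>Oracle Database 10g XML and SQL</title><price>10.00</price></book>
  <book><title>Hypothetical Second Title</title><price>25.50</price></book>
  <book><title>Hypothetical Third Title</title><price>8.75</price></book>
</catalog>"""

catalog = ET.fromstring(CATALOG)

# book[title="..."]: identify nodes by the value of a child element
by_title = catalog.findall("book[title='Oracle Database 10g XML and SQL']")

# book[price > 10]: filter on a value (done in Python, not in the path)
over_ten = [b for b in catalog.findall("book") if float(b.findtext("price")) > 10]

# book[2]: select by ordinal position (positions start at 1)
second = catalog.find("book[2]")

print(len(by_title))
print([b.findtext("title") for b in over_ten])
print(second.findtext("title"))
```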
Query Prolog
The query prolog, along with the query body, comprises an XQuery query. It contains constructs such as namespace declarations, function and variable definitions, and module imports such as XML Schema imports. The query body thus references the constructs defined or declared in the query prolog, and utilizes the aforementioned expressions to determine the result of an XQuery query. Some of the other constructs mentioned in the XQuery specification that are contained in the query prolog are version declaration, validation declaration, default namespace declaration, xmlspace declaration, and default collation.
With a namespace declaration in the query prolog, a prefix can be used in qualified names to differentiate the names of elements, attributes, and so on. For example,
declare namespace bk = "http://www.oracle.com/book";
binds the prefix bk so that an element <mybook> can be uniquely identified by the qualified name <bk:mybook>. A default namespace declaration can also be made to apply to all unqualified element names, such as:
declare default element namespace "http://www.oracle.com/book";
without which these unqualified element names would be considered to be in no namespace.
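The effect of binding a prefix to a namespace URI can be sketched outside XQuery as well. In the following Python example, the library document and the prefix b used inside it are hypothetical. The query binds bk to the same URI, which shows that the URI, not the prefix spelling, identifies the name, and that an unprefixed name with no default namespace is in no namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one <mybook> in the book namespace, one in no namespace.
DOC = """<library xmlns:b="http://www.oracle.com/book">
  <b:mybook>Namespaced</b:mybook>
  <mybook>No namespace</mybook>
</library>"""

root = ET.fromstring(DOC)

# Comparable to: declare namespace bk = "http://www.oracle.com/book";
NS = {"bk": "http://www.oracle.com/book"}

# bk:mybook matches the element even though the document spells the prefix "b"
qualified = [e.text for e in root.findall("bk:mybook", NS)]

# An unprefixed name matches only the element that is in no namespace
unqualified = [e.text for e in root.findall("mybook")]

print(qualified, unqualified)
# prints: ['Namespaced'] ['No namespace']
```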
In addition to namespace declarations, the query prolog can also contain function definitions that can be called within the query body. The function definition consists of the function name, with an optional prefix, preceded by the declare function keywords and followed by a parameter list and the expressions that make up the function body. For example, in
declare namespace bk = "http://www.oracle.com/book";
declare function bk:getbookname($node) {
  let $name := $node/bookname
  return $name
};
the function name is bk:getbookname, which contains the namespace prefix to prevent it from colliding with other getbookname functions. It takes a node as a parameter, and in its body it creates a return variable, $name, that is used to return the <bookname> nodes that are children of the passed-in node.
Finally, the import keyword can be used in the query prolog to include other bodies of definitions, such as XML schemas, which can then be referenced in the query body. For example,
import schema namespace mybook = "http://www.mcgrawhill.com/book"
  at "http://xmlns.mcgraw-hill.com/mybook.xsd";
declare namespace bk = "http://www.oracle.com/book";
declare function bk:getbookname($node) {
  let $name := $node/bookname
  return $name
};
Introducing XQueryX
XQueryX is simply an XML representation of an XQuery query. The XQueryX specification maps the grammar productions outlined in the XQuery 1.0 and XPath 2.0 Formal Semantics specification into XML. Thus, using XML parsers you can process, query, generate, modify, and reuse the XQuery expressions. It enables XML processes such as XSLT stylesheets to generate productions. At the time of this writing, this specification is undergoing significant rewriting and thus may not exit the W3C process as it currently stands. We mention it here for informational purposes because the Oracle XQuery prototype discussed in the following section supports the original syntax.
To give you a specific example of how XQueryX looks, the following FLWOR expression is transformed from
for $b in document("book.xml")//book
where $b/publisher = "Oracle Press" and $b/year = "2003"
return
  $b/title
to the following XML representation:
<q:query xmlns:q="http://www.w3.org/2001/06/xqueryx">
  <q:flwr>
    <q:forAssignment variable="$b">
      <q:step axis="SLASHSLASH">
        <q:function name="document">
          <q:constant datatype="CHARSTRING">book.xml</q:constant>
        </q:function>
        <q:identifier>book</q:identifier>
      </q:step>
    </q:forAssignment>
    <q:where>
      <q:function name="AND">
        <q:function name="EQUALS">
          <q:step axis="CHILD">
            <q:variable>$b</q:variable>
            <q:identifier>publisher</q:identifier>
          </q:step>
          <q:constant datatype="CHARSTRING">Oracle Press</q:constant>
        </q:function>
        <q:function name="EQUALS">
          <q:step axis="CHILD">
            <q:variable>$b</q:variable>
            <q:identifier>year</q:identifier>
          </q:step>
          <q:constant datatype="CHARSTRING">2003</q:constant>
        </q:function>
      </q:function>
    </q:where>
    <q:return>
      <q:step axis="CHILD">
        <q:variable>$b</q:variable>
        <q:identifier>title</q:identifier>
      </q:step>
    </q:return>
  </q:flwr>
</q:query>
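Because the XQueryX form is ordinary XML, any XML parser can walk it. The following sketch parses the representation above with Python's ElementTree and rebuilds an approximate text query from it. The element and attribute names are taken from the example. The to_text function and the AXES and INFIX mapping tables are assumptions made for this illustration and cover only the constructs that appear here, not the full XQueryX vocabulary.

```python
import xml.etree.ElementTree as ET

Q = "{http://www.w3.org/2001/06/xqueryx}"  # namespace of the example's q prefix

XQUERYX = """<q:query xmlns:q="http://www.w3.org/2001/06/xqueryx">
 <q:flwr>
  <q:forAssignment variable="$b">
   <q:step axis="SLASHSLASH">
    <q:function name="document">
     <q:constant datatype="CHARSTRING">book.xml</q:constant>
    </q:function>
    <q:identifier>book</q:identifier>
   </q:step>
  </q:forAssignment>
  <q:where>
   <q:function name="AND">
    <q:function name="EQUALS">
     <q:step axis="CHILD">
      <q:variable>$b</q:variable><q:identifier>publisher</q:identifier>
     </q:step>
     <q:constant datatype="CHARSTRING">Oracle Press</q:constant>
    </q:function>
    <q:function name="EQUALS">
     <q:step axis="CHILD">
      <q:variable>$b</q:variable><q:identifier>year</q:identifier>
     </q:step>
     <q:constant datatype="CHARSTRING">2003</q:constant>
    </q:function>
   </q:function>
  </q:where>
  <q:return>
   <q:step axis="CHILD">
    <q:variable>$b</q:variable><q:identifier>title</q:identifier>
   </q:step>
  </q:return>
 </q:flwr>
</q:query>"""

# Assumed mappings from the example's attribute values back to query text.
AXES = {"CHILD": "/", "SLASHSLASH": "//"}
INFIX = {"AND": " and ", "EQUALS": " = "}


def to_text(node):
    """Recursively serialize one XQueryX element back to query text."""
    tag = node.tag[len(Q):]          # strip the namespace part of the tag
    kids = list(node)
    if tag == "constant":
        return f'"{node.text}"'
    if tag in ("variable", "identifier"):
        return node.text
    if tag == "step":
        return AXES[node.get("axis")].join(to_text(k) for k in kids)
    if tag == "function":
        args = [to_text(k) for k in kids]
        name = node.get("name")
        if name in INFIX:            # operators are written between operands
            return INFIX[name].join(args)
        return f"{name}({', '.join(args)})"
    if tag == "forAssignment":
        return f"for {node.get('variable')} in {to_text(kids[0])}"
    if tag in ("where", "return"):
        return f"{tag} {to_text(kids[0])}"
    if tag in ("flwr", "query"):
        return " ".join(to_text(k) for k in kids)
    raise ValueError(f"unsupported XQueryX element: {tag}")


query_text = to_text(ET.fromstring(XQUERYX))
print(query_text)
```

The same traversal could equally generate or modify queries, which is the kind of reuse the XML representation is meant to enable.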