Extensible Markup Language (XML)
XML (Extensible Markup Language) is really quite like a database itself, in that it stores the data along with the basic data definition and with an XML schema defines the complete data structure and business rules. XML is much more portable than a database and is quickly becoming the standard for data exchange among websites, applications, and other implementations that require exchange of data. The language itself is derived fro141, which in turn was designed to provide an application-independent, character-set-independent method of transferring data, especially transaction-oriented data, across systems. XML has gained acceptance as a means to communicate data across platforms, especially over the Internet. This exam objective requires you to know how to retrieve data from SQL Server to exchange with other systems. In particular, you will be required to know how to import and manipulate data using OPENXML.In general, XML is a document format that can be used to transfer data between systems. Within the SQL Server environment, XML documents can be created with the FOR XML clause of the SELECT statement. This clause provides several options for formatting XML output.
![]() | Now here is a new feature that is drilled to death on the exam. Expect several questions because it is a new feature and a strong technology. The implementation of XML is a little rough in this release of SQL Server and there is hope for improvement; but just the fact that the capability is present is worth noting. With an industry push on this standard interprogram communication technique, it is worth spending some time to master SQL Server's implementation of XML. |
<row xmlns="x-schema:#Schema1" ID="9" First="Danny" Last="Thomas"/> The name of the element, or the element type, is row. The xmlns property specifies the name of the schema, which is the format that the element will take. The element then has other attributes, such as ID, First, and Last, which all have values. The element ends with a forward slash and an angle bracket, indicating the end of the element.An element can also contain other elements as shown here:
<SalesPerson ID="1" First="Danny" Last="Thomas">
<Sales Qty="4"/>
<Sales Qty="3"/>
</SalesPerson> In this case, the SalesPerson element contains two Sales elements. Notice that on the first line there isn't a slash before the ending bracket; the matching slash for this element is on the last line. This is how objects are nested in XML.
Simplifying XML Use
To output data in XML format, the SELECT statement includes an operator called FOR XML. This specifies to SQL Server that instead of returning a rowset, it should return an XML document. There are three different options for generating the XML: RAW, AUTO, and EXPLICIT.
![]() | Where is the schema? To produce XML output that also contains the schema information for the XML, you must tack the XMLDATA qualifier to the end of the FOR XML clause. |
SELECT O.OrderID, O.CustomerID, OD.ProductID, OD.UnitPrice, OD.Quantity
From Orders AS O
JOIN [Order Details] AS OD ON O.OrderID = OD.OrderID
WHERE O.OrderID < 10251
FOR XML AUTO
<O OrderID="10248" CustomerID="VINET">
<OD ProductID="11" UnitPrice="14.0000" Quantity="12"/>
<OD ProductID="42" UnitPrice="9.8000" Quantity="10"/>
<OD ProductID="72" UnitPrice="34.8000" Quantity="5"/></O>
<O OrderID="10249" CustomerID="TOMSP">
<OD ProductID="14" UnitPrice="18.6000" Quantity="9"/>
<OD ProductID="51" UnitPrice="42.4000" Quantity="40"/></O>
<O OrderID="10250" CustomerID="HANAR">
<OD ProductID="41" UnitPrice="7.7000" Quantity="10"/>
<OD ProductID="51" UnitPrice="42.4000" Quantity="35"/>
<OD ProductID="65" UnitPrice="15.8000" Quantity="15"/></O> Note that the aliases for each table become the row identifiers within the XML output. Unfortunately, Query Analyzer by default shows only the first 256 characters of a string that's returned. For most XML queries that will not be adequate, and a setting of the maximum 8192 is recommended. To set the option within the query analyzer, select Tools, Options, Results from the menu. It is for this reason that we have tried to limit the data output size in most of the examples throughout this section by including a WHERE clause to filter the number of rows.To produce a listing that also supplies the schema elements of the XML (now that is really hard to read), you tack the XMLDATA onto the end of the FOR XML clause:
SELECT * FROM Products WHERE ProductID = 1 FOR XML AUTO, XMLDATA
<Schema name="Schema3" xmlns="urn:schemas-microsoft-com:xml-data" xmlns:
dt="urn:schemas-microsoft-com:datatypes"><ElementType name="Products"
content="empty" model="closed"><AttributeType name="ProductID" dt:type=
"i4"/><AttributeType name="ProductName" dt:type="string"/><AttributeType
name="SupplierID" dt:type="i4"/><AttributeType name="CategoryID" dt:type
="i4"/><AttributeType name="QuantityPerUnit" dt:type="string"/>
<AttributeType name="UnitPrice" dt:type="fixed.14.4"/><AttributeType
name="UnitsInStock" dt:type="i2"/><AttributeType name="UnitsOnOrder" dt:
type="i2"/><AttributeType name="ReorderLevel" dt:type="i2"/><Attribute
Type name="Discontinued" dt:type="boolean"/><attribute type="ProductID"/
><attribute type="ProductName"/><attribute type="SupplierID"/><attribute
type="CategoryID"/><attribute type="QuantityPerUnit"/><attribute type=
"UnitPrice"/><attribute type="UnitsInStock"/><attribute type="UnitsOn
Order"/><attribute type="ReorderLevel"/><attribute type="Discontinued"/>
</ElementType></Schema><Products xmlns="x-schema:#Schema3" ProductID="1"
ProductName="Chai" SupplierID="1" CategoryID="1" QuantityPerUnit="10
boxes x 20 bags" UnitPrice="18.0000" UnitsInStock="39" UnitsOnOrder="0"
ReorderLevel="10" Discontinued="0"/> When you run the query, the actual XML comes out all on one line as a stream of data. XML output does not use linefeeds or make things readable in any fashion. The easiest way to write queries for XML, then, is to write them with the FOR XML clause left off, make sure that they are returning the data you want, and then add the FOR XML back onto the end of the query. That eliminates the need for a lot of extra formatting. The use of the RAW mode of XML output is best suited in situations in which minimum formatting is desired. In RAW mode each row is returned as an element with the identifier row. Here's an example of the same query as you just saw, returned in RAW mode:
<row OrderID="10248" CustomerID="VINET"
ProductID="11" UnitPrice="14.0000" Quantity="12"/>
<row OrderID="10248" CustomerID="VINET"
ProductID="42" UnitPrice="9.8000" Quantity="10"/>
<row OrderID="10248" CustomerID="VINET"
ProductID="72" UnitPrice="34.8000" Quantity="5"/>
<row OrderID="10249" CustomerID="TOMSP"
ProductID="14" UnitPrice="18.6000" Quantity="9"/>
<row OrderID="10249" CustomerID="TOMSP"
ProductID="51" UnitPrice="42.4000" Quantity="40"/>
<row OrderID="10250" CustomerID="HANAR"
ProductID="41" UnitPrice="7.7000" Quantity="10"/>
<row OrderID="10250" CustomerID="HANAR"
ProductID="51" UnitPrice="42.4000" Quantity="35"/>
<row OrderID="10250" CustomerID="HANAR"
ProductID="65" UnitPrice="15.8000" Quantity="15"/> If you have binary data stored in text or image data types, SQL Server ignores the data when generating XML. If you want the data to be included, you use the BINARY BASE64 option:
SELECT * FROM Employees
FOR XML RAW, BINARY BASE64
Another option available for extracting data in an XML format is EXPLICIT. The EXPLICIT option enables you to specify the format of the XML that will be created. To define the format, you have to alias the first column of output to the name Tag, name the second column Parent, and then alias each consecutive column to relate it to a specific Tag. The column names after Parent are named this way:
[ ElementName !TagNumber !AttributeName !Directive ]
So the example
SELECT 1 AS TAG, NULL AS Parent,
[Order Details].Quantity AS [Order!1!QtyPurchased],
Products.ProductID AS [Order!1!ProdID],
Products.ProductName AS [Order!1!Product],
Products.UnitsInStock AS [Order!1!OnHand]
FROM Products INNER JOIN [Order Details]
ON Products.ProductID = [Order Details].ProductID
WHERE OrderID = 10248 FOR XML EXPLICIT
returns this data:
<Order QtyPurchased="12" PersonID="11"
Product="Queso Cabrales" OnHand="22"/>
<Order QtyPurchased="10" PersonID="42"
Product="Singaporean Hokkien Fried Mee" OnHand="26"/>
<Order QtyPurchased="5" PersonID="72"
Product="Mozzarella di Giovanni" OnHand="14"/> These types of queries are quite arduous and "clunky," to say the least, but they do produce useful results.
![]() | You are not going to become an XML expert overnight, and the material presented in this book is light, to say the least. Essentially, all we have done is shown how to draw data out of SQL Server in an XML format. We have not tried to explain XML or produce an XML reference text, but knowing the material presented in this section and the one to follow should get you through the XML portion of the exam. |
Moving Data Using XML
Use the Transact-SQL OPENXML function to insert data represented as an XML document. OPENXML is used to allow access to XML data as if it were like any other table or rowset. You must first have an XML document in memory to work with. OPENXML can be used in SELECT, as well as in SELECT INTO statements where a source table or view can be specified. Here is a breakdown of the process:
- Store into a variable or otherwise attain XML data that is loaded into memory. Use the sp_xml_preparedocument system stored procedure to read and parse the XML data and create a document handle. Access the XML data using the OPENXML statement to define the format of the XML document and return the rowset. Use the sp_xml_removedocument system stored procedure to destroy the document handle created for XML access.
sp_xml_preparedocument
sp_xml_removedocument
The prepare document function will read XML text that can be provided as input directly or via a character variable. The procedure then parses the text using the XML parser. This is a dynamic link library (Msxml2.dll) used to understand the XML information being presented by the internal document. Next the data is provided as a parsed document in a state ready for consumption. This parsed document is a tree representation of the XML data. The following simple example demonstrates the use of the procedure with a direct XML text feed:
DECLARE @hdoc INT
EXEC sp_xml_preparedocument @hdoc OUTPUT,
'<SalesPerson ID="9" First="Danny" Last="Thomas">
<Sales Qty="5"/><Sales Qty="10"/></SalesPerson>'
EXEC sp_xml_removedocument @hdoc
The first thing you'll notice is the variable declaration. The @hdoc is a variable that holds a temporary value for use later, in the sp_xml_removedocument as well as in the OPENXML. The next thing to note is that an sp_xml_preparedocument and sp_xml_removedocument are paired up. This is critical to ensure appropriate use of resources. Without the removal of the document handle, the only way to recover the memory is to restart SQL Server.The sp_xml_preparedocument stored procedure has one optional argument: the xpath_namespace argument, which is used to specify an alternative namespace for the document. By default, the system uses the default namespace <root xmlns:mp="urn:schemas-microsoft-com:xml-metaprop">. If you specify an alternative namespace string, it replaces the "urn:schemas-microsoft-com: xml-metaprop" with whatever you specify. The OPENXML function has three parameters and an optional WITH clause. It takes the document handle, which is returned by the sp_xml_preparedocument procedure, the rowpattern, which specifies which rows to return, and a single-byte value that can specify flags. In most cases you will want to use a rowset filter to achieve more useful output:
SELECT * FROM openxml(@hdoc, '/SalesPerson', 1)
Using this filter will return only the attributes that pertain to the SalesPerson. Now, if you could just put the rows where they need to be, you'd have it made:
SELECT * FROM openxml(@hdoc, '/SalesPerson', 1)
WITH (FirstName varchar(30), LastName varchar(30))
That WITH clause does the trick, and you finally get decent output:
FirstName LastName
------------- -------------
Danny Thomas
So now you have the data you want, extracted from an XML rowset. Using the WITH clause is basically the same syntax as laying out the columns in a table: the column name, some space, the data type, a comma, and then the next column name.OPENXML is extremely cumbersome. You can't read XML from a file easily; you have to spend time fighting with arcane bit-field flags, and you get to completely reformat your dataand you do all that just to get a few rows. You can't declare a variable of type TEXT, so you can hold only about 8KB of XML in SQL Server at a time.