Professional Excel Development [Electronic resources] : The Definitive Guide to Developing Applications Using Microsoft® Excel and VBA® نسخه متنی

XML

Unless you've been living in a vacuum for the past few years, you will have heard the acronym XML thrown around with increasing regularity. If you're primarily an Excel developer, you're probably also wondering what all the fuss is about. XML is a format used for the textual expression of data. In that respect, it's no different from the fixed-width, comma-separated or tab-delimited text formats we've been using for years. There are, however, a number of key factors that differentiate XML from all the other text formats that have come before it and make it much more appealing to developers:

XML is a structured format, which means that we can define exactly how the data is to be arranged, organized and expressed within the file. When we are given a file, we can validate that it conforms to a specific structure, prior to importing the data. As we know the structure of the file in advance, we know what it contains and how to process each item. Prior to XML, the only structure in a text file was positionalwe knew the bit of text after the fourth comma should be a date of birthand we had no way to validate whether it was a date of birth, or even a date, or whether it was in day/month/year or month/day/year order.

XML is a described format, which means that within the text file, every item of data has a name that is both human- and machine-readable as well as being uniquely identifiable. We can open these files, read their contents and understand the data they contain, without having to refer back to another document to find out what the text after the fourth comma represents (and was that comma a separator, or part of the text of the second item?). Similarly, we can edit these documents with a fairly high level of confidence that we're making the correct changes.

XML can easily describe hierarchical data and the relationships between data. If we want to import and export a list of authors, with their names, addresses and the books they've written, deciding on a reasonable format for a CSV file is by no means straightforward. Using XML, we can define what an Author item is and that it has a name, address and multiple Book items. We can also define what a Book item is and that it has a title, a publisher and an ISBN. The hierarchy and relationships are a natural consequence of the definition.

XML can be validated, which means we can provide a second XML filean XML schema definition filethat describes exactly how the XML data file should be structured. Before processing an XML file, we can compare it with the schema to ensure it conforms to the structure we expect to receive.

XML is a discoverable format, which means programs (including Excel 2003) can parse an XML data file and infer the structure and relationships between the items. This means we can read an XML file, infer its structure and generate new XML data files that conform to the same structure, with a high degree of confidence the new XML data files will pass validation.

XML is a strongly typed format, which means the schema definition file specifies the data type of each element. When importing the data, the application can check the schema definition to identify the data type to import it as. We no longer run the risk of the product code 01-03 being imported as a date.

XML is a global format. There is only one way to express a number in an XML file (with U.S. number formats) and only one way to express a date. We no longer have to check whether a CSV file was created with U.S. or French settings and adjust our processing of it accordingly.

XML is a standard format. The way in which the content of an XML file is defined has been specified by the World Wide Web Consortium (W3C). This allows applications (including Excel 2003) to read, understand and validate the structure of an XML file and create files that conform to the specified structure. It also allows different applications to read, write, understand and validate the same XML files, enabling us to share data between applications in an extremely robust manner.

So is there anything we can do with XML we couldn't do using technologies we already know? No, not really. But then, there's nothing we can do with a spreadsheet we couldn't also do with a pen and paper (and maybe a basic calculator!). Since the earliest computers, we've been storing data and sharing it between applications. If we control both ends of the dialogue, it doesn't matter what's passed between them, so long as each end knows what to supply and what to expect and nothing goes wrong. If the format of a file is documented, any application could (in theory) be programmed to read and write the same data files. With XML files, an application can read (or infer) the structure definition and join in any conversation without extra programming. Using XML just makes some things a whole lot easier and more reliable.

An Example XML File

Listing 23-1 shows an example XML file for an author, including his name, e-mail address and some of the books he has been involved with.

Listing 23-1. An Example XML File


<?XML version="1.0" encoding="utf-8" ?>
<Author>
<Name>Stephen Bullen</Name>
<Email>stephen@oaltd.co.uk</Email>
<Book>
<Title>Professional Excel Development</Title>
<Publisher>Addison Wesley</Publisher>
<ISBN>0321262506</ISBN>
</Book>
<Book>
<Title>Excel 2002 VBA Programmer's Reference</Title>
<Publisher>Wrox Press</Publisher>
<ISBN>1861005709</ISBN>
</Book>
</Author>

If XML lives up to its hype, you should have been able to read and understand all the items of data in that file and understand the relationships between the elements. Just in case, we'll highlight the main items:

The first line identifies the contents of the file as XML. Every XML file starts with this line.

The file consists of both data and pairs of tags surrounding the data, which are together called an element. Our file consists of Author, Name, Email, Book, Title, Publisher and ISBN elements. A tag is identified by text enclosed within angle brackets, like <Tag>. All the tags come in pairs, with an opening tag like <Tag> and a closing tag like </Tag>; all the text between the opening and closing tags in some way "belongs" to the tag. However, if there is nothing contained within the opening and closing tags, they can be combined so that <Tag></Tag> can be shown as <Tag/>. This is often used when the data for an element is provided as an attribute of the element, using a syntax like <Publisher name="Addison Wesley"/>. There is little difference between using elements or attributes, though our preference is to use elements. Note that tags and attributes are case-sensitive, so <Author> will not match with </author>.

The second line identifies a root element, which in this file represents an Author. Every XML file must have one and only one root element; all other elements in the file belong to the root element.

The third and fourth lines identify the author's name and e-mail address; we know it's the author's name and e-mail address because they're both within the same <Author> element.

The fifth line is the start of a Book element, with the next three lines giving the book's details (because they're contained within the Book element). The ninth line closes the Book element, telling us we've finished with that book.

Lines 10 to 14 show a second Book element, with the book's details.

Line 15 closes the Author element, telling us we've finished with that author.

That example hopefully demonstrates the main attributes of an XML file. It is structured, described, hierarchical and relational, but how is it validated?

An Example XSD file

The structure of an XML file is specified using an XML schema definition file, which usually has the extension .xsd and contains sets of XML tags that have been defined by the W3C. The XSD file for the Author XML data is shown in Listing 23-2.

Listing 23-2. An Example XSD File


<?XML version="1.0" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Author">
<xs:complexType>
<xs:sequence>
<xs:element name="Name" type="xs:string"/>
<xs:element name="Email" type="xs:string"
minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="Book"
minOccurs="0" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="Title" type="xs:string"/>
<xs:element name="Publisher" type="xs:string"/>
<xs:element name="ISBN" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

This is slightly less readable XML! We explain how to create an XSD file later in the chapter, but it's helpful to understand how this file describes the structure of the XML data file shown in Listing 23-1:

Like all XML files, the first line identifies the contents as XML.

The second line identifies the namespace [ http://www.w3.org/2001/XMLSchema ] and gives it the alias, xs. This is the namespace defined by the W3C that contains all the XML tags used in XML schema definition files. When we need to use a tag from that namespace, we precede it with the xs: alias identifier so the XML processor can correctly identify it. This mechanism of using namespace aliases is often encountered in XML files that contain elements from multiple namespaces (such as Excel workbook files, which contain tags from both the Excel and Office namespaces).

The third line defines an Author element which must occur once and only once in the file (unless otherwise specified, the default occurrence of a tag is 'must occur once and only once'), so our XML data file can only be for one author.

The fourth line states that the Author element is a complexType, which means it contains other elements.

The fifth line states that all the items within the Author element must be listed in the sequence shown in the XSD file (that is, Name, then Email, then Book).

The sixth line defines an element within Author called Name, of type string and there must be one and only one of them. The use of the /> at the end of the element tag is a shorthand for creating a self-closing tag, so <Tag/> is equivalent to <Tag></Tag>.

The seventh and eighth lines define an element within Author called Email, of type string, which doesn't have to occur (minOccurs="0") or there can be any number of them (maxOccurs="unbounded").

Lines 915 define an element within Author called Book, of which there can be any number. If provided, each Book element must contain a single Title, Publisher and ISBN string element in that order.

Lines 1622 close out the tags.

Before we import any data files, we can check that they conform to these rules (assuming we have the XSD file to check them against) and reject any files that can't be validated.

Overview of Excel 2003's XML Features

NOTE

The XML features added to Excel 2003 are only available in the Professional version of Office and Standalone version of Excel; they have been disabled in the Standard and Student versions of Office. In practice, this means that if we want to utilize the new XML features, we and all our users must be running Office 2003 Professional.

Throughout this book, we've been stressing the importance of physically separating our data from our code, so we can easily update our code without affecting the data; our PETRAS timesheet add-in has undergone some major changes, but our timesheet template file has stayed (pretty much) the same throughout. We've been a little quiet, though, about what we should consider our "code" to be, and hence where to put the break between application and data. That is because the only real choice we've had is to put the break at the boundary between VBA and our Excel workbooks and templates. Whenever we've had a new set of data to store, we've stored it inside a copy of our template.

That leaves us a little concerned and hopeful that we don't have to change anything in the template. If we discovered a bug in the data validation settings, we would have to open and update every copy of every timesheet submitted using that template (or just ignore it for archived files!). We haven't really separated our data from our logic. Within each of our data files, we're storing lots of formatting, validation and ancillary information as well as the data entered into the timesheet.

What we would really like to do is to completely separate the raw data from the formatting and data validation, so we would only need one copy of the data-entry workbook on each machine which could import and export the raw data. That's exactly what Excel 2003's XML features enable us to do!

Using Excel 2003's new XML Source task pane, we can import an XML schema definition file into a workbook and link the elements defined in that file to cells (for the single elements) or Lists (for the multiple-occurring elements) in the workbook.

We can then import any XML data file that conforms to the schema into our workbook. Excel will parse the XML data file, check that it conforms to the schema, read the data from all the elements and populate the linked cells and lists. Figure 23-1 shows an Excel 2003 workbook containing the XSD from Listing 23-2 and having imported the XML data from Listing 23-1.

Figure 23-1. An Excel Workbook Linked to an XML Schema

[View full size image]

We can also type data into the linked cells and lists and export the data as an XML file. Excel will create an XML data file that conforms to the schema and contains the data from the linked cells and lists.

Excel's XML features can greatly help with the maintenance of our financial models as well. Until Excel 2003, if we wanted to use our model to analyze different data sets, we would have to use a separate copy of the model workbook for each set. If we subsequently found an error in our model, we would have to open and update all the copies. In Excel 2003, we can create a schema for our model's input variables and another schema for its results, include them both in the workbook and link them to the relevant cells or Lists. We can then use a single copy of the model workbook to import the input variables, calculate the model and export the results.

The new XML features are, of course, all exposed to VBA, so we can easily identify which cells are linked to which elements of which schemas (and vice versa), read and write the XML to/from strings as well as (or instead of) importing and exporting files and respond to events raised both before and after XML data is imported or exported.

A Simple Financial Model

To demonstrate how Excel 2003 uses XML, we'll create a simple financial model that calculates the net present value of a list of cash flows, giving us the number of flows, the total cash flow and the net present value. We'll also record the model's version number and the date and time the model was calculated. Figure 23-2 shows the spreadsheet for the model, which can also be found in the Model1.xls workbook on the CD in the \Concepts\Ch23Excel, XML and Web Services folder.

Figure 23-2. The Net Present Value Calculation Model

Note that the Flows data in B9:B13 is in an Excel 2003 List, so as data is typed into it, the references used in the functions in cells E9:E11 are automatically updated. This is obviously a very simple financial model to demonstrate the principles. In practice, there may be many sets of input data, many worksheets of calculations, pivot tables and so forth, and a large set of results.

Let's assume for now we want to analyze many sets of datain this case, different combinations of rates and cash flows, and we want to store each set of data somewhere, so we can come back to it at a later date. Let's also imagine this is a large and complex model, so we would prefer not to have multiple copies of it to keep in sync.

What we'd really like to do is tell Excel what bits of the file are the raw data and be able to import and export just that data in a form we could edit and maybe even create offline. With Excel 2003, we can do exactly that.

Creating an XML Schema Definition

The first step is to create an XML schema definition (XSD) file to define our raw data. If we already have an XML file containing some data we want to import, Excel can infer an XSD from it. Excel generally does quite a good job at inferring the structure, but we have more control over the details if we define it ourselves. For example, in the Authors XML file in Listing 23-1, the data file included a single e-mail address. Excel will infer the schema only allows one address, but the real schema allows multiples. Excel also always assumes data is optional, while we've made the author name mandatory.

All the input data is shown with a light shading in Figure 23-2, from which we can see the structure we would like to emulate:

There is a single block of control information, which must exist.

Within the control information, we have a name, e-mail address and comment. For this example, we'll make the name and e-mail required, but the comment optional. Each item can only occur once (if at all) and they're all strings.

We then have a single block of data information, which must exist.

The data information contains a single Rate figure and multiple Flows figures, all of which are Doubles. Although not required by the NPV function, we'll require a minimum of two cash flow amounts.

The XSD for this data is shown in Listing 23-3, which includes a root NPVModelData element to contain our data types.

Listing 23-3. The XSD File for the NPV Model Data


<?XML version="1.0" ?>
<XSD:schema xmlns:XSD="http://www.w3.org/2001/XMLSchema">
<XSD:element name="NPVModelData">
<XSD:complexType>
<XSD:sequence>
<XSD:element name="ControlInformation">
<XSD:complexType>
<XSD:sequence>
<XSD:element name="SubmittedBy" type="XSD:string" />
<XSD:element name="Email" type="XSD:string" />
<XSD:element name="Comment" type="XSD:string"
minOccurs="0" maxOccurs="1" />
</XSD:sequence>
</XSD:complexType>
</XSD:element>
<XSD:element name="InputData">
<XSD:complexType>
<XSD:sequence>
<XSD:element name="Rate" type="XSD:double" />
<XSD:element name="Flows" type="XSD:double"
minOccurs="2" maxOccurs="unbounded" />
</XSD:sequence>
</XSD:complexType>
</XSD:element>
</XSD:sequence>
</XSD:complexType>
</XSD:element>
</XSD:schema>

As this is XML, you should be able to read Listing 23-3 and see the direct correlation to the data in our worksheet and the previous statements about the structure we want to emulate. A few noteworthy points are as follows:

We always start an XSD file with the same first two lines.

Every element that is a container of other elements must be followed by the <XSD:complexType> tag and a tag to identify how the elements are contained. In this example (and in most cases), we use the <XSD:sequence> tag to say that the elements are contained in the sequence shown.

The Comment element includes the attributes minOccurs="0" maxOccurs="1", which is how we specify an optional item; it doesn't have to occur (minOccurs="0"), but if it does occur, there can only be one of them (maxOccurs="1").

The Flows element includes the attributes minOccurs="2" maxOccurs="unbounded", which is how we specify that there must be at least two cash flows, but there can be any number. Theoretically, we should put maxOccurs="65527", as that is the maximum number of flows that will fit on our model worksheet.

XML Maps

Now that we have an XSD file describing our data, we need to tell Excel to use it and to link each element in the XSD file to a worksheet cell or range. Importing the schema and linking it to cells is known as mapping and Excel refers to these as XML Maps (which is Excel's terminology, not an industry-wide one).

So let's map our XSD to our model. Open the Model1.xls file, click View > Task Pane and select the XML Source task pane from the drop-down in the task pane title bar, shown in Figure 23-3.

Figure 23-3. Selecting the XML Source Task Pane

[View full size image]

Click the XML Maps… button at the bottom of the XML Source task pane to bring up the XML Maps dialog, click the Add button on the dialog and browse to the XSD file. If the XSD is valid, Excel will import the schema and create an XML map using it, as shown in Figure 23-4. If there is an error in the XSD, Excel will show you where it thinks the error is. Note that if we selected an XML data file instead of the XSD, Excel would infer a schema from the XML data. It is definitely best practice, though, to create and use an XSD file.

Figure 23-4. The XML Map Dialog After Adding the NPVModelData Schema

When we click OK on the XML Map dialog, Excel examines the schema and displays it in the XML Source task pane, as shown in Figure 23-5.

Figure 23-5. The XML Source Task Pane, Showing the NPVModelData Schema

Note that Excel has identified the hierarchical structure of the schema, the elements that are required (shown with an asterisk in the icon) and the elements that are repeating (shown by the arrow at the bottom of the Flows icon).

The final step is to associate the elements in the schema with the data-entry cells in our model worksheet. We do this by selecting each element from the tree in the task pane, dragging it to the worksheet and dropping it on the cell that we want to link it to. In Figure 23-6, we're dragging the SubmittedBy element and dropping it on cell B4.

Figure 23-6. Drag and Drop the Elements from the Task Pane to the Worksheet

[View full size image]

Similarly, we'll map the rest of the schema to our worksheet by dropping the Email element to B5, the Comment element to B6, the Rate element to A10 and the Flows element to B10 (or anywhere inside the Flows list). As we do that, Excel annoyingly adjusts the column widths of each cell to fit the data it contains. We would much prefer the default behavior to not do that, but we can switch it off by right-clicking one of the mapped cells and choosing XML > XML Map Properties from the popup menu to display the XML Map Properties dialog shown in Figure 23-7, in which we've set the properties that we recommend using. We should be able to access this dialog from the XML Maps dialog we used to select a map, but for some reason, we can't!

Figure 23-7. The Recommended Settings for XML Map Properties

The first check box Validate data against schema for import and export defaults to off, but in our opinion is the most important setting in the whole of Excel's XML support. With it turned on, Excel will verify that the XML data files we import conform to the format defined in the schema and that the data we type into cells conforms to the schema before allowing us to export it. Turning off those checks seems to us to invalidate the whole point of using XML in the first placethat of reliable and robust data transfer.

That's it! We've defined the raw data our financial model uses, created an XSD file to formally specify it, added the schema to the model and linked the elements in the schema to the model's data entry cells. The completed workbook can be found in the Model2.xls workbook.

Exporting and Importing XML Data

The menu items to import and export our XML data can be found on the Data > XML menu, with toolbar buttons also located on the List toolbar. Using the Export XML menu results in the XML data file for our model shown in Listing 23-4.

Listing 23-4. The XML Data File Produced from Our Model


<?XML version="1.0" encoding="UTF-8" standalone="yes"?>
<NPVModelData>
<ControlInformation>
<SubmittedBy>Stephen Bullen</SubmittedBy>
<Email>stephen@oaltd.co.uk</Email>
<Comment>Fee Fi Fo Fum</Comment>
</ControlInformation>
<InputData>
<Rate>0.05</Rate>
<Flows>10</Flows>
<Flows>20</Flows>
<Flows>30</Flows>
<Flows>40</Flows>
</InputData>
</NPVModelData>

Hopefully, everything in the file makes sense by now, particularly the multiple <Flows> elements. If we delete the <Comment> element, add a few more <Flows> elements to the bottom, save it with a different name and use the Import XML menu to import it into our model, we get the worksheet shown in Figure 23-8. Remember that our XSD file specified the <Comment> tag as optional, so our file passes the schema validation even though the comment data is missing. The extra <Flows> elements have been included in the List, which has automatically extended to accommodate them, and the formulas in cells E9:E11 have also automatically been adjusted to suit!

Figure 23-8. Importing an XML Data File Adjusts the Ranges

We have achieved our goal of being able to totally separate our data from our model, importing and exporting the data as we choose, with the model automatically updating to use the new data as we import it.

The XML Object Model and Events

Now that we can import and export the raw data for the model, we'll probably want to import the data and then export the results, with the export file containing a copy of the input data, details about the model itself, such as the version number and when the calculation was done, and the model's results. Listing 23-5 shows the XSD file for the full set of our NPVModel data, which can be found on the CD in the NPVModel.XSD file. The definition for the NPVModelData schema from Listing 23-5 has been included inside the new root NPVModel tag and we've added elements for the model details and results. It looks complicated, but isn't reallyjust remember that when we want to nest one element inside another, we have to include a pair of <XSD:complexType> and <XSD:sequence> tags between them.

Listing 23-5. The Full XSD File for Our Model


<?XML version="1.0" ?>
<XSD:schema xmlns:XSD="http://www.w3.org/2001/XMLSchema">
<XSD:element name="NPVModel">
<XSD:complexType>
<XSD:sequence>
<XSD:element name="NPVModelData">
<XSD:complexType>
<XSD:sequence>
<XSD:element name="ControlInformation">
<XSD:complexType>
<XSD:sequence>
<XSD:element name="SubmittedBy"
type="XSD:string" />
<XSD:element name="Email" type="XSD:string" />
<XSD:element name="Comment" type="XSD:string"
minOccurs="0" maxOccurs="1" />
</XSD:sequence>
</XSD:complexType>
</XSD:element>
<XSD:element name="InputData">
<XSD:complexType>
<XSD:sequence>
<XSD:element name="Rate" type="XSD:double" />
<XSD:element name="Flows" type="XSD:double"
minOccurs="2" maxOccurs="unbounded" />
</XSD:sequence>
</XSD:complexType>
</XSD:element>
</XSD:sequence>
</XSD:complexType>
</XSD:element>
<XSD:element name="NPVModelDetails">
<XSD:complexType>
<XSD:sequence>
<XSD:element name="ModelVersion" type="XSD:string" />
<XSD:element name="CalcDate" type="XSD:dateTime" />
</XSD:sequence>
</XSD:complexType>
</XSD:element>
<XSD:element name="NPVModelResults">
<XSD:complexType>
<XSD:sequence>
<XSD:element name="FlowCount" type="XSD:double" />
<XSD:element name="FlowTotal" type="XSD:double" />
<XSD:element name="FlowNPV" type="XSD:double" />
</XSD:sequence>
</XSD:complexType>
</XSD:element>
</XSD:sequence>
</XSD:complexType>
</XSD:element>
</XSD:schema>

We can add this schema to our model as a second XML map and map the NPVModelDetails and NPVModelResults elements to the appropriate cells in column E. When we try to map the ControlInformation elements to the cells in column B, however, Excel displays an error message "The operation cannot be completed because the result would overlap an existing XML mapping" and prevents us from doing the mapping. This is because Excel limits us to a one-to-one relationship between cells and XML elements; any one cell can only map to one element from one XML map and vice versa. We want all our input data to map to both the NVPModelData map (so we can import it) and the NVPModel map (so we can include it in the export). The only way we can achieve our objective is to have a copy of the input data that we include in our NPVModel map, as shown in Figure 23-9. All the single items, such as the e-mail address and rate, can be linked using standard worksheet formulae, but the lists will have to be synchronized through VBA.

Figure 23-9. Mapping the NPVModel Elements to a Copy of the Input Data

Fortunately, Excel 2003 includes a rich object model and event model for working with XML maps. We can use the Workbook_BeforeXMLExport event to copy the Flow data from the input range (B9 and below) to the export range (G8 and below), using the mapping to identify the ranges in each case, as shown in Listing 23-6.

Listing 23-6. Copying the Input Flows List to the Export Copy


'Run before any XML is exported
Private Sub Workbook_BeforeXMLExport(ByVal Map As XMLMap, _
ByVal Url As String, Cancel As Boolean)
Dim rngSource As Range
Dim rngTarget As Range
'Are we exporting the full Model data?
If Map.RootElementName = "NPVModel" Then
'Find the data part of the target list
Set rngTarget = Sheet1.XMLDataQuery( _
"/NPVModel/NPVModelData/InputData/Flows")
'If there is any existing data in the target list,
'remove it.
If Not rngTarget Is Nothing Then rngTarget.Delete
'Find the data part of the source list
Set rngSource = Sheet1.XMLDataQuery( _
"/NPVModelData/InputData/Flows")
'Is there any source data to copy?
If Not rngSource Is Nothing Then
'Find the header part of the target list
Set rngTarget = Sheet1.XMLMapQuery( _
"/NPVModel/NPVModelData/InputData/Flows")
'Copy the data to the cell below the target list header
rngSource.Copy
rngTarget.Cells(1).Offset(1, 0).PasteSpecial xlValues
End If
End If
End Sub

Within the object model, the linking between ranges and XML schema elements is done using XPaths. The XPath is a concatenated string of all the element names in an element's hierarchy, so to get to the Flows element in the NPVModelData map, we start at the root NPVModelData, go down to the InputData element and then to the Flows element, so the XPath for the Flows element in that map is /NPVModelData/InputData/Flows. This is stored in the XPath property of the Range object, so we can directly find out which element a range is mapped to. To find the range mapped to a given element, we use the XMLMapQuery and XMLDataQuery methods, passing the XPath of the element. It's a curiosity of the object model that while XML maps are workbook-level items and an element can be mapped to any range in any sheet in the workbook, the XMLMapQuery and XMLDataQuery methods are worksheet-level methods. If we didn't know which sheet the range was on, we would have to scan through them all, repeating the XMLMapQuery for each.

Both XMLMapQuery and XMLDataQuery return the range that is mapped to a given XPath string. The only difference between them is when the mapped range is a List; the XMLMapQuery returns the full range of the List, including the header row, whereas the XMLDataQuery returns only the data in the List, or Nothing if the List is empty.

With just a few mouse clicks, we can now import some raw data for our financial model, recalculate it and export the results, giving an XML data file like the one shown in Listing 23-7.

Listing 23-7. The XML Data File from Our NPV Model


<?XML version="1.0" encoding="UTF-8" standalone="yes"?>
<NPVModel>
<NPVModelData>
<ControlInformation>
<SubmittedBy>Stephen Bullen</SubmittedBy>
<Email>stephen@oaltd.co.uk</Email>
<Comment>Fee Fi Fo Fum</Comment>
</ControlInformation>
<InputData>
<Rate>0.05</Rate>
<Flows>10</Flows>
<Flows>20</Flows>
<Flows>30</Flows>
<Flows>40</Flows>
</InputData>
</NPVModelData>
<NPVModelDetails>
<ModelVersion>1.0</ModelVersion>
<CalcDate>2004-07-01T13:44:04.430</CalcDate>
</NPVModelDetails>
<NPVModelResults>
<FlowCount>4</FlowCount>
<FlowTotal>100</FlowTotal>
<FlowNPV>86.49</FlowNPV>
</NPVModelResults>
</NPVModel>

It's not hard to envisage our financial model being used as a "black box" service, whereby individuals (or other applications) submit XML files containing the raw data for the model, we import it, calculate and export the results and send them back.

Notice the very specific format used for the date and time in the CalcDate element, which is how XML avoids the issues of identifying different date formats. It doesn't, however, account for different time zones!

By adding the ability to export results directly from our model, we've also created a vulnerability. Users could import data into that map as well, which would overwrite our formulas! We can prevent this using the Workbook_BeforeXMLImport event, as shown in Listing 23-8.

Listing 23-8. Prevent Importing of the Results XML


'Run before any XML is imported
Private Sub Workbook_BeforeXMLImport(ByVal Map As XMLMap, _
ByVal Url As String, ByVal IsRefresh As Boolean, _
Cancel As Boolean)
'Are we importing to the full Model data?
If Map.RootElementName = "NPVModel" Then
'Yes, so disallow it
MsgBox "The XML file you selected contains the " & _
"results for this model, and can not be imported."
'Cancel the import
Cancel = True
End If
End Sub

XML Support in Earlier Versions

Excel 2003 has made the handling of arbitrary XML files extremely easy, but we don't have to upgrade to Excel 2003 to use XML. As mentioned at the start of the chapter, XML is just another text file format, so in theory we can read and write XML files using standard VBA text handling and file I/O code. When Excel 2003 imports an XML data file, it uses the MSXML library to do the validation and parsing of the file, and there's nothing stopping us referencing the same library from VBA. Of course, we also have to write our own routines to import the data from the MSXML structure to the worksheet and export the data from the sheet to an XML file. Multiple-version compatibility is one of the key design goals for our PETRAS timesheet application, so we show the VBA technique in the Practical Example section at the end of this chapter.

The VBA technique is also required in Excel 2003 if the structure of the XML data is too complex to be handled by Excel's fairly simplistic mapping abilities. For example, our PETRAS timesheet workbook includes a table of clients and projects, with the client names across the top and the projects listed below each client (to feed the data validation drop-downs). It is not possible to map an XML schema to that layout, so the import of that section of the XML data file has to be done with VBA in all versions of Excel.

Using Namespaces

All of the examples shown so far in this chapter have ignored the use of namespaces. This means the XML files we use and produce are only identified by the root elements of NPVModel and NPVModelData. There is nothing in the file to identify them as the data for our NPV model. This means that, in theory, someone else could create an XML file that uses a very similar structure to ours and we could import it without knowing it was not intended for our application. To avoid this, we can include a namespace identifier both in the XSD and XML files, which is used to uniquely identify all the tags in the file, and hence the data they contain. When the file is processed, the namespace is prepended to all the tags, allowing the parser to distinguish between, say, the Name element in this file denoting the author's name and the Name element in a workbook file denoting an Excel Defined Name. The text of the namespace can be any string, but should be globally unique. It is general practice to use a URL, which has the advantage that the viewer of the file could browse to the URL in the hope of finding a description of the namespace.

We tell Excel the namespace to use by including it within the <XSD:schema> tag at the top of our XSD file, as shown in Listing 23-9 and included on the CD in the file NPVModelData - NS.xsd.

Listing 23-9. Providing Excel with a Namespace


<?XML version="1.0" ?>
<XSD:schema xmlns:XSD="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.oaltd.co.uk/ProExcelDev/NPVModelData"
xmlns:md="http://www.oaltd.co.uk/ProExcelDev/NPVModelData"
elementFormDefault="qualified" >
<XSD:element name="NPVModelData">
...

When that schema is added to a workbook, Excel will remember the namespace, create an alias for it, such as ns0, ns1, ns2 and so on, and add that alias to the front of all the elements in the file, as shown in Figure 23-10.

Figure 23-10. All the XML Elements are Prefixed with the Namespace Alias

When the XML is exported, Excel includes the namespace in the file and qualifies all the elements with the namespace alias, as shown in Listing 23-10.

Listing 23-10. Providing Excel with a Namespace


<?XML version="1.0" encoding="UTF-8" standalone="yes"?>
<ns1:NPVModelData
xmlns:ns1="http://www.oaltd.co.uk/ProExcelDev/NPVModelData">
<ns1:ControlInformation>
<ns1:SubmittedBy>Stephen Bullen</ns1:SubmittedBy>
<ns1:Email>stephen@oaltd.co.uk</ns1:Email>
<ns1:Comment>Fee Fi Fo Fum</ns1:Comment>
</ns1:ControlInformation>
<ns1:InputData>
<ns1:Rate>0.05</ns1:Rate>
<ns1:Flows>10</ns1:Flows>
<ns1:Flows>20</ns1:Flows>
<ns1:Flows>30</ns1:Flows>
<ns1:Flows>40</ns1:Flows>
</ns1:InputData>
</ns1:NPVModelData>

It is definitely a good practice to use namespaces in our XML files, to avoid any chance of Excel importing erroneous data into our applications. The only reason we haven't used them so far in this chapter is to avoid overcomplicating our explanation of Excel's XML features.