XMLUnless you've been living in a vacuum for the past few years, you will have heard the acronym XML thrown around with increasing regularity. If you're primarily an Excel developer, you're probably also wondering what all the fuss is about. XML is a format used for the textual expression of data. In that respect, it's no different from the fixed-width, comma-separated or tab-delimited text formats we've been using for years. There are, however, a number of key factors that differentiate XML from all the other text formats that have come before it and make it much more appealing to developers:XML is a structured format, which means that we can define exactly how the data is to be arranged, organized and expressed within the file. When we are given a file, we can validate that it conforms to a specific structure, prior to importing the data. As we know the structure of the file in advance, we know what it contains and how to process each item. Prior to XML, the only structure in a text file was positionalwe knew the bit of text after the fourth comma should be a date of birthand we had no way to validate whether it was a date of birth, or even a date, or whether it was in day/month/year or month/day/year order.XML is a described format, which means that within the text file, every item of data has a name that is both human- and machine-readable as well as being uniquely identifiable. We can open these files, read their contents and understand the data they contain, without having to refer back to another document to find out what the text after the fourth comma represents (and was that comma a separator, or part of the text of the second item?). Similarly, we can edit these documents with a fairly high level of confidence that we're making the correct changes.XML can easily describe hierarchical data and the relationships between data. If we want to import and export a list of authors, with their names, addresses and the books they've written, deciding on a reasonable format for a CSV file is by no means straightforward. Using XML, we can define what an Author item is and that it has a name, address and multiple Book items. We can also define what a Book item is and that it has a title, a publisher and an ISBN. The hierarchy and relationships are a natural consequence of the definition.XML can be validated, which means we can provide a second XML filean XML schema definition filethat describes exactly how the XML data file should be structured. Before processing an XML file, we can compare it with the schema to ensure it conforms to the structure we expect to receive.XML is a discoverable format, which means programs (including Excel 2003) can parse an XML data file and infer the structure and relationships between the items. This means we can read an XML file, infer its structure and generate new XML data files that conform to the same structure, with a high degree of confidence the new XML data files will pass validation.XML is a strongly typed format, which means the schema definition file specifies the data type of each element. When importing the data, the application can check the schema definition to identify the data type to import it as. We no longer run the risk of the product code 01-03 being imported as a date.XML is a global format. There is only one way to express a number in an XML file (with U.S. number formats) and only one way to express a date. We no longer have to check whether a CSV file was created with U.S. or French settings and adjust our processing of it accordingly.XML is a standard format. The way in which the content of an XML file is defined has been specified by the World Wide Web Consortium (W3C). This allows applications (including Excel 2003) to read, understand and validate the structure of an XML file and create files that conform to the specified structure. It also allows different applications to read, write, understand and validate the same XML files, enabling us to share data between applications in an extremely robust manner. So is there anything we can do with XML we couldn't do using technologies we already know? No, not really. But then, there's nothing we can do with a spreadsheet we couldn't also do with a pen and paper (and maybe a basic calculator!). Since the earliest computers, we've been storing data and sharing it between applications. If we control both ends of the dialogue, it doesn't matter what's passed between them, so long as each end knows what to supply and what to expect and nothing goes wrong. If the format of a file is documented, any application could (in theory) be programmed to read and write the same data files. With XML files, an application can read (or infer) the structure definition and join in any conversation without extra programming. Using XML just makes some things a whole lot easier and more reliable. An Example XML FileListing 23-1 shows an example XML file for an author, including his name, e-mail address and some of the books he has been involved with. Listing 23-1. An Example XML FileIf XML lives up to its hype, you should have been able to read and understand all the items of data in that file and understand the relationships between the elements. Just in case, we'll highlight the main items:The first line identifies the contents of the file as XML. Every XML file starts with this line.The file consists of both data and pairs of tags surrounding the data, which are together called an element. Our file consists of Author, Name, Email, Book, Title, Publisher and ISBN elements. A tag is identified by text enclosed within angle brackets, like <Tag>. All the tags come in pairs, with an opening tag like <Tag> and a closing tag like </Tag>; all the text between the opening and closing tags in some way "belongs" to the tag. However, if there is nothing contained within the opening and closing tags, they can be combined so that <Tag></Tag> can be shown as <Tag/>. This is often used when the data for an element is provided as an attribute of the element, using a syntax like <Publisher name="Addison Wesley"/>. There is little difference between using elements or attributes, though our preference is to use elements. Note that tags and attributes are case-sensitive, so <Author> will not match with </author>.The second line identifies a root element, which in this file represents an Author. Every XML file must have one and only one root element; all other elements in the file belong to the root element.The third and fourth lines identify the author's name and e-mail address; we know it's the author's name and e-mail address because they're both within the same <Author> element.The fifth line is the start of a Book element, with the next three lines giving the book's details (because they're contained within the Book element). The ninth line closes the Book element, telling us we've finished with that book.Lines 10 to 14 show a second Book element, with the book's details.Line 15 closes the Author element, telling us we've finished with that author. That example hopefully demonstrates the main attributes of an XML file. It is structured, described, hierarchical and relational, but how is it validated? An Example XSD fileThe structure of an XML file is specified using an XML schema definition file, which usually has the extension .xsd and contains sets of XML tags that have been defined by the W3C. The XSD file for the Author XML data is shown in Listing 23-2. Listing 23-2. An Example XSD FileThis is slightly less readable XML! We explain how to create an XSD file later in the chapter, but it's helpful to understand how this file describes the structure of the XML data file shown in Listing 23-1:Like all XML files, the first line identifies the contents as XML.The second line identifies the namespace [ http://www.w3.org/2001/XMLSchema ] and gives it the alias, xs. This is the namespace defined by the W3C that contains all the XML tags used in XML schema definition files. When we need to use a tag from that namespace, we precede it with the xs: alias identifier so the XML processor can correctly identify it. This mechanism of using namespace aliases is often encountered in XML files that contain elements from multiple namespaces (such as Excel workbook files, which contain tags from both the Excel and Office namespaces).The third line defines an Author element which must occur once and only once in the file (unless otherwise specified, the default occurrence of a tag is 'must occur once and only once'), so our XML data file can only be for one author.The fourth line states that the Author element is a complexType, which means it contains other elements.The fifth line states that all the items within the Author element must be listed in the sequence shown in the XSD file (that is, Name, then Email, then Book).The sixth line defines an element within Author called Name, of type string and there must be one and only one of them. The use of the /> at the end of the element tag is a shorthand for creating a self-closing tag, so <Tag/> is equivalent to <Tag></Tag>.The seventh and eighth lines define an element within Author called Email, of type string, which doesn't have to occur (minOccurs="0") or there can be any number of them (maxOccurs="unbounded").Lines 915 define an element within Author called Book, of which there can be any number. If provided, each Book element must contain a single Title, Publisher and ISBN string element in that order.Lines 1622 close out the tags. Before we import any data files, we can check that they conform to these rules (assuming we have the XSD file to check them against) and reject any files that can't be validated. Overview of Excel 2003's XML FeaturesNOTEThe XML features added to Excel 2003 are only available in the Professional version of Office and Standalone version of Excel; they have been disabled in the Standard and Student versions of Office. In practice, this means that if we want to utilize the new XML features, we and all our users must be running Office 2003 Professional.Throughout this book, we've been stressing the importance of physically separating our data from our code, so we can easily update our code without affecting the data; our PETRAS timesheet add-in has undergone some major changes, but our timesheet template file has stayed (pretty much) the same throughout. We've been a little quiet, though, about what we should consider our "code" to be, and hence where to put the break between application and data. That is because the only real choice we've had is to put the break at the boundary between VBA and our Excel workbooks and templates. Whenever we've had a new set of data to store, we've stored it inside a copy of our template.That leaves us a little concerned and hopeful that we don't have to change anything in the template. If we discovered a bug in the data validation settings, we would have to open and update every copy of every timesheet submitted using that template (or just ignore it for archived files!). We haven't really separated our data from our logic. Within each of our data files, we're storing lots of formatting, validation and ancillary information as well as the data entered into the timesheet.What we would really like to do is to completely separate the raw data from the formatting and data validation, so we would only need one copy of the data-entry workbook on each machine which could import and export the raw data. That's exactly what Excel 2003's XML features enable us to do!Using Excel 2003's new XML Source task pane, we can import an XML schema definition file into a workbook and link the elements defined in that file to cells (for the single elements) or Lists (for the multiple-occurring elements) in the workbook.We can then import any XML data file that conforms to the schema into our workbook. Excel will parse the XML data file, check that it conforms to the schema, read the data from all the elements and populate the linked cells and lists. Figure 23-1 shows an Excel 2003 workbook containing the XSD from Listing 23-2 and having imported the XML data from Listing 23-1. Figure 23-1. An Excel Workbook Linked to an XML Schema[View full size image] ![]() A Simple Financial ModelTo demonstrate how Excel 2003 uses XML, we'll create a simple financial model that calculates the net present value of a list of cash flows, giving us the number of flows, the total cash flow and the net present value. We'll also record the model's version number and the date and time the model was calculated. Figure 23-2 shows the spreadsheet for the model, which can also be found in the Model1.xls workbook on the CD in the \Concepts\Ch23Excel, XML and Web Services folder. Figure 23-2. The Net Present Value Calculation Model![]() Creating an XML Schema DefinitionThe first step is to create an XML schema definition (XSD) file to define our raw data. If we already have an XML file containing some data we want to import, Excel can infer an XSD from it. Excel generally does quite a good job at inferring the structure, but we have more control over the details if we define it ourselves. For example, in the Authors XML file in Listing 23-1, the data file included a single e-mail address. Excel will infer the schema only allows one address, but the real schema allows multiples. Excel also always assumes data is optional, while we've made the author name mandatory.All the input data is shown with a light shading in Figure 23-2, from which we can see the structure we would like to emulate:There is a single block of control information, which must exist.Within the control information, we have a name, e-mail address and comment. For this example, we'll make the name and e-mail required, but the comment optional. Each item can only occur once (if at all) and they're all strings.We then have a single block of data information, which must exist.The data information contains a single Rate figure and multiple Flows figures, all of which are Doubles. Although not required by the NPV function, we'll require a minimum of two cash flow amounts. The XSD for this data is shown in Listing 23-3, which includes a root NPVModelData element to contain our data types. Listing 23-3. The XSD File for the NPV Model DataAs this is XML, you should be able to read Listing 23-3 and see the direct correlation to the data in our worksheet and the previous statements about the structure we want to emulate. A few noteworthy points are as follows:We always start an XSD file with the same first two lines.Every element that is a container of other elements must be followed by the <XSD:complexType> tag and a tag to identify how the elements are contained. In this example (and in most cases), we use the <XSD:sequence> tag to say that the elements are contained in the sequence shown.The Comment element includes the attributes minOccurs="0" maxOccurs="1", which is how we specify an optional item; it doesn't have to occur (minOccurs="0"), but if it does occur, there can only be one of them (maxOccurs="1").The Flows element includes the attributes minOccurs="2" maxOccurs="unbounded", which is how we specify that there must be at least two cash flows, but there can be any number. Theoretically, we should put maxOccurs="65527", as that is the maximum number of flows that will fit on our model worksheet. XML MapsNow that we have an XSD file describing our data, we need to tell Excel to use it and to link each element in the XSD file to a worksheet cell or range. Importing the schema and linking it to cells is known as mapping and Excel refers to these as XML Maps (which is Excel's terminology, not an industry-wide one).So let's map our XSD to our model. Open the Model1.xls file, click View > Task Pane and select the XML Source task pane from the drop-down in the task pane title bar, shown in Figure 23-3. Figure 23-3. Selecting the XML Source Task Pane[View full size image] ![]() Figure 23-4. The XML Map Dialog After Adding the NPVModelData Schema![]() Figure 23-5. The XML Source Task Pane, Showing the NPVModelData Schema![]() Figure 23-6. Drag and Drop the Elements from the Task Pane to the Worksheet[View full size image] ![]() Figure 23-7. The Recommended Settings for XML Map Properties![]() Exporting and Importing XML DataThe menu items to import and export our XML data can be found on the Data > XML menu, with toolbar buttons also located on the List toolbar. Using the Export XML menu results in the XML data file for our model shown in Listing 23-4. Listing 23-4. The XML Data File Produced from Our ModelHopefully, everything in the file makes sense by now, particularly the multiple <Flows> elements. If we delete the <Comment> element, add a few more <Flows> elements to the bottom, save it with a different name and use the Import XML menu to import it into our model, we get the worksheet shown in Figure 23-8. Remember that our XSD file specified the <Comment> tag as optional, so our file passes the schema validation even though the comment data is missing. The extra <Flows> elements have been included in the List, which has automatically extended to accommodate them, and the formulas in cells E9:E11 have also automatically been adjusted to suit! Figure 23-8. Importing an XML Data File Adjusts the Ranges![]() The XML Object Model and EventsNow that we can import and export the raw data for the model, we'll probably want to import the data and then export the results, with the export file containing a copy of the input data, details about the model itself, such as the version number and when the calculation was done, and the model's results. Listing 23-5 shows the XSD file for the full set of our NPVModel data, which can be found on the CD in the NPVModel.XSD file. The definition for the NPVModelData schema from Listing 23-5 has been included inside the new root NPVModel tag and we've added elements for the model details and results. It looks complicated, but isn't reallyjust remember that when we want to nest one element inside another, we have to include a pair of <XSD:complexType> and <XSD:sequence> tags between them. Listing 23-5. The Full XSD File for Our ModelWe can add this schema to our model as a second XML map and map the NPVModelDetails and NPVModelResults elements to the appropriate cells in column E. When we try to map the ControlInformation elements to the cells in column B, however, Excel displays an error message "The operation cannot be completed because the result would overlap an existing XML mapping" and prevents us from doing the mapping. This is because Excel limits us to a one-to-one relationship between cells and XML elements; any one cell can only map to one element from one XML map and vice versa. We want all our input data to map to both the NVPModelData map (so we can import it) and the NVPModel map (so we can include it in the export). The only way we can achieve our objective is to have a copy of the input data that we include in our NPVModel map, as shown in Figure 23-9. All the single items, such as the e-mail address and rate, can be linked using standard worksheet formulae, but the lists will have to be synchronized through VBA. Figure 23-9. Mapping the NPVModel Elements to a Copy of the Input Data![]() Listing 23-6. Copying the Input Flows List to the Export CopyWithin the object model, the linking between ranges and XML schema elements is done using XPaths. The XPath is a concatenated string of all the element names in an element's hierarchy, so to get to the Flows element in the NPVModelData map, we start at the root NPVModelData, go down to the InputData element and then to the Flows element, so the XPath for the Flows element in that map is /NPVModelData/InputData/Flows. This is stored in the XPath property of the Range object, so we can directly find out which element a range is mapped to. To find the range mapped to a given element, we use the XMLMapQuery and XMLDataQuery methods, passing the XPath of the element. It's a curiosity of the object model that while XML maps are workbook-level items and an element can be mapped to any range in any sheet in the workbook, the XMLMapQuery and XMLDataQuery methods are worksheet-level methods. If we didn't know which sheet the range was on, we would have to scan through them all, repeating the XMLMapQuery for each.Both XMLMapQuery and XMLDataQuery return the range that is mapped to a given XPath string. The only difference between them is when the mapped range is a List; the XMLMapQuery returns the full range of the List, including the header row, whereas the XMLDataQuery returns only the data in the List, or Nothing if the List is empty.With just a few mouse clicks, we can now import some raw data for our financial model, recalculate it and export the results, giving an XML data file like the one shown in Listing 23-7. Listing 23-7. The XML Data File from Our NPV ModelIt's not hard to envisage our financial model being used as a "black box" service, whereby individuals (or other applications) submit XML files containing the raw data for the model, we import it, calculate and export the results and send them back.Notice the very specific format used for the date and time in the CalcDate element, which is how XML avoids the issues of identifying different date formats. It doesn't, however, account for different time zones!By adding the ability to export results directly from our model, we've also created a vulnerability. Users could import data into that map as well, which would overwrite our formulas! We can prevent this using the Workbook_BeforeXMLImport event, as shown in Listing 23-8. Listing 23-8. Prevent Importing of the Results XML
XML Support in Earlier VersionsExcel 2003 has made the handling of arbitrary XML files extremely easy, but we don't have to upgrade to Excel 2003 to use XML. As mentioned at the start of the chapter, XML is just another text file format, so in theory we can read and write XML files using standard VBA text handling and file I/O code. When Excel 2003 imports an XML data file, it uses the MSXML library to do the validation and parsing of the file, and there's nothing stopping us referencing the same library from VBA. Of course, we also have to write our own routines to import the data from the MSXML structure to the worksheet and export the data from the sheet to an XML file. Multiple-version compatibility is one of the key design goals for our PETRAS timesheet application, so we show the VBA technique in the Practical Example section at the end of this chapter.The VBA technique is also required in Excel 2003 if the structure of the XML data is too complex to be handled by Excel's fairly simplistic mapping abilities. For example, our PETRAS timesheet workbook includes a table of clients and projects, with the client names across the top and the projects listed below each client (to feed the data validation drop-downs). It is not possible to map an XML schema to that layout, so the import of that section of the XML data file has to be done with VBA in all versions of Excel. Using NamespacesAll of the examples shown so far in this chapter have ignored the use of namespaces. This means the XML files we use and produce are only identified by the root elements of NPVModel and NPVModelData. There is nothing in the file to identify them as the data for our NPV model. This means that, in theory, someone else could create an XML file that uses a very similar structure to ours and we could import it without knowing it was not intended for our application. To avoid this, we can include a namespace identifier both in the XSD and XML files, which is used to uniquely identify all the tags in the file, and hence the data they contain. When the file is processed, the namespace is prepended to all the tags, allowing the parser to distinguish between, say, the Name element in this file denoting the author's name and the Name element in a workbook file denoting an Excel Defined Name. The text of the namespace can be any string, but should be globally unique. It is general practice to use a URL, which has the advantage that the viewer of the file could browse to the URL in the hope of finding a description of the namespace.We tell Excel the namespace to use by including it within the <XSD:schema> tag at the top of our XSD file, as shown in Listing 23-9 and included on the CD in the file NPVModelData - NS.xsd. Listing 23-9. Providing Excel with a NamespaceWhen that schema is added to a workbook, Excel will remember the namespace, create an alias for it, such as ns0, ns1, ns2 and so on, and add that alias to the front of all the elements in the file, as shown in Figure 23-10. Figure 23-10. All the XML Elements are Prefixed with the Namespace Alias![]() Listing 23-10. Providing Excel with a NamespaceIt is definitely a good practice to use namespaces in our XML files, to avoid any chance of Excel importing erroneous data into our applications. The only reason we haven't used them so far in this chapter is to avoid overcomplicating our explanation of Excel's XML features. ![]() |