XML in Excel
If you have used Office XP, you’ll be aware that you can open and save Excel spreadsheets in an XML vocabulary designed specifically for Excel, now called SpreadsheetML. This format opened up access to other programs and made it easier to create Excel spreadsheets.Excel 2003 allows you to work with other XML vocabularies. You can use XML schemas of your own design with a workbook and map the schema elements to Excel objects. The mappings can then be used to import data files conforming to the schema and to write XML data using save or export functions.Excel now uses three separate namespaces for spreadsheet elements. We’re mainly interested in the x2 namespace, which covers mapping in Excel 2003.ss | XML Spreadsheet 2002 namespace urn:schemas-microsoft-com: office:spreadsheet |
x | XML Spreadsheet 2000 namespace urn:schemas-microsoft-com:office:excel |
x2 | XML Spreadsheet 2003 namespace urn:schemas-microsoft-com:office:excel2 |
Excel 2003 does not support the following XML schema constructs:
xsd:any
xsd:anyAttribute
abstract and substitutionGroup attributes on elements and types
Recursive elements are supported, but only to one level.
XML Maps
XML maps are objects created by Excel each time you add an external XML schema to a workbook. A map describes the relationship between schema objects and spreadsheet locations.Mapping allows a spreadsheet to exist independently of the underlying data and separates the data in a workbook from its presentation. The data remains ready to be consumed by any number of other systems that understand XML.A workbook can support multiple maps, all of which have distinct names. One reason you might want to use two maps is when the data to import belongs to one schema and the calculated results (for export) need to be in a different vocabulary. Another is that you might wish to import separate series of data at different times on to separate sheets in the same workbook, using a single schema.The XML Maps dialog box allows you to define new mappings by adding, removing, and renaming schemas. See Figure 17-1.You create this mapping in a similar way to InfoPath. The XML Source task pane displays the data source, and you can drag schema objects onto the spreadsheet. See Figure 17-2.
Figure 17-1: The XML Maps dialog box.
Figure 17-2: The XML Source task pane.
Lists
Lists are a new feature in Excel that bring more structure to a range in a worksheet. Lists have distinct columns, column headings, and an insert row for adding more data. XML lists are those where a column is related to an element in an XML map. These lists are created automatically when you drag a repeating element onto a worksheet. The List toolbar, shown in Figure 17-3, includes buttons for import, export, charting, and map properties commands.
Figure 17-3: The List and XML toolbar.
Mapped Cells
As well as mapping repeating elements to a column, you can map a nonrepeating element to an individual XML mapped cell. This is the preferred way to handle single occurring items in an XML data source. You might want to do this to place an author, title, and date at the top of a report, for example.In Figure 17-4, the values at C4 and C5 are mapped cells, while the range B13-F19 contains XML lists. Cell B20 shows an asterisk, indicating the insert row. Note that the total row (C21-F21) is contained within the lists range. It automatically moves down to accommodate added rows. Optional properties for totals cells include total (D21) and count (F21). Each cell in the header row (B13-F13) contains a drop-down list with sort options.
Figure 17-4: XML list and mapped cell.
Denormalized Data
The XML features in Excel 2003 are, naturally enough, optimized for tabular data. The team at Microsoft aimed to make it possible for users to import data from as wide a range of sources as possible, even if they could not always export it again.The looser the XML structure, and the more inconsistent the data, the less likely it is a candidate for viewing or processing in a spreadsheet. You could end up with very sparse tables in extreme cases.If you map a repeating element onto a column, and these elements are associated with single instances of other data, Excel will by default “denormalize” the data set by repeating the single-instance data for each row in the XML list. Excel can import the XML, and “fill down” the date and desk across all titles, but it can’t round-trip the information by exporting it through the same schema.If the data looks like this:
<paymentsReport>
<bureau>New York</bureau>
<period>2003-11-05</period>
<stories>
<story>
<identifier>ny1234</identifier>
<date>2003-10-10</date>
<payment>800.00</payment>
<status>Filed</status>
<category>Feature</category>
</story>
<story>
<identifier>ny1235</identifier>
<date>2003-10-10</date>
<payment>420.00</payment>
<status>Embargoed</status>
<category>Review</category>
</story>
<story>
<identifier>ny1236</identifier>
<date>2003-10-10</date>
<payment>300.00</payment>
<status>Draft</status>
<category>Bulletin</category>
</story>
</stories>
</paymentsReport>
Excel will render a denormalized listing like the one in Figure 17-5 if the file is imported with default settings.
Figure 17-5: A denormalized listing.
The same issue arises if there is a hierarchy of lists within lists in the mapping—for example, multiple title elements each containing multiple subjects.So if you want to be able to import and export identical data structures, you have two options. You either need to do some denormalizing before you import the data or (better) avoid the default behavior by creating mapped cells for all the single-instance data, as we illustrated in Figure 17-4.
XML Map Schema
The structure for the mapping of XML elements is defined in the XML Spreadsheet Schema (XMLSS). In this section we present an outline of this structure and focus on the Field element mapping. For a more detailed explanation you should read the documentation in both the Excel XML Toolkit and Microsoft Office 2003 XML Schema References. You can obtain the schema references at the following URL:[www.microsoft.com/downloads/details.aspx?FamilyId=FE118952-3547-420A-A412-] 00A2662442D9&displaylang=en
Each singly mapped cell or table is represented by an Entry element. One or more Entry elements are grouped together to form a Map. One or more Map elements may appear within the MapInfo element, which in turn occurs within the top-level Workbook element of the XMLSS file after all other child elements within the Workbook. The Schema element contains the related schema. The content model is in Figure 17-6.
Figure 17-6: The MapInfo content model.
Within the Entry element, much of the content is concerned with positioning or the view of data on the spreadsheet. The Field element is the one holding the mapping to your imported schema, and we’ll concentrate on that. There must be at least one Field element containing the mapping for a single value, or a repeating element.Aggregate specifies the aggregate type for a column, such as Average, Sum, or Count, if there is a totals row for the field. The optional AutoFilterColumn element persists sort and filter information related to a column.Cell defines the style, formula, or default value for the mapped cell. In formulas, Excel uses the R1C1 rather than the A1 notation for cell references.DataValidation specifies the validation that should be performed whenever data is entered in the cell by a user.Range contains a relative reference to the first value (table) or only absolute reference (single cell) for a field. Again, RC notation is used.
XPath is an expression relative to Entry/XPath used to retrieve data from nodes in the underlying XML document. If an XPath is specified, the XSDType of the value pointed to should also be present.See Figure 17-7 for the content model.
Object Model
You can use the Excel VBA object model to add schemas to a workbook, map a schema to a sheet, import and export data, and find data and ranges with XPath expressions. There isn’t space here to go into the XML OM in any detail, but since we’re focused here on importing data, we’ll review the import methods of the Workbook class.You can use the OpenXML method when loading data from a file into a workbook. This is the programmatic equivalent of using Open from the File menu to open an XML file in Excel. The following table lists the different options for opening XML files in Excel and their purpose.
Figure 17-7: The Entry content model.
Option | Purpose |
---|---|
xlXmlLoadImportToList | Load the data from the source file into an XML list. |
xlXmlLoadMapXml | Create an XML map based on the data in the source file. |
xlXmlLoadOpenXml | Load the data from the source file as a new workbook. |
xlXmlLoadPromptUser | Prompt the user with a dialog box for choosing one of the other three options. |
Importing XML is different in that, rather than opening an XML file as a separate workbook, Excel places the data in an existing workbook, either creating a worksheet or creating an XML list in an existing one.The XMLImport method specifies a URL as the source for the XML This method accepts four parameters, detailed in the following table.
Parameter | Description |
---|---|
URL | A string that refers to the URL target that hosts the XML data (required). |
XMLMap | An instance of the XMLMap class. In other words, if the workbook already has an XML map in place, you can use that map. If you want Excel to create a map automatically when the import occurs, then pass Nothing as the parameter (required). |
Overwrite | A boolean value that indicates whether or not the newly imported data should overwrite existing data (optional). |
Destination | A reference to the cell range where the imported is placed in a worksheet (optional). |
The last method, XMLImportXML, is similar to XMLImport, except that instead of importing data based on a URL, the method accepts the actual data string as its first parameter. All other parameters are the same.