A Simple Example Using Both the XML DOM and the XML Reader/Writer

Below we compare the use of the higher-level XML DOM with the lower-level forward-only XML reader/writer when working with some simple XML. The XML produced and consumed in both cases is identical. It is important to note that the data used in this example is trivially small, and either approach will yield acceptable performance. Real performance differences arise as the size of the data increases. To think about this conceptually, picture the UserInfo data node below repeated a hundred or a thousand times. If desired, build a data file to represent this and test your algorithms on it to contrast the approaches.

Example: XML File Contents
<AllMyData>
  <UserInfo>
    <UserID>14</UserID>
    <Name>
      <FirstName>Ivo</FirstName>
      <LastName>Salmre</LastName>
    </Name>
  </UserInfo>
</AllMyData>
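One way to build the larger test file suggested above is to generate it. The sketch below is an assumption, not part of the original listings: a hypothetical TestDataGenerator class that uses XmlTextWriter to repeat the UserInfo node count times with made-up names. Pass it a StreamWriter to produce a file on disk.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper (not from the original text): writes an AllMyData
// document containing 'count' UserInfo nodes, for performance testing.
public static class TestDataGenerator
{
    public static void WriteUsers(TextWriter output, int count)
    {
        XmlTextWriter writer = new XmlTextWriter(output);
        writer.Formatting = Formatting.Indented;   // human-readable output
        writer.WriteStartElement("AllMyData");
        for (int i = 0; i < count; i++)
        {
            writer.WriteStartElement("UserInfo");
            writer.WriteElementString("UserID", i.ToString());
            writer.WriteStartElement("Name");
            writer.WriteElementString("FirstName", "First" + i);  // made-up data
            writer.WriteElementString("LastName", "Last" + i);
            writer.WriteEndElement(); // </Name>
            writer.WriteEndElement(); // </UserInfo>
        }
        writer.WriteEndElement(); // </AllMyData>
        writer.Flush();   // push buffered text to 'output' without closing it
    }
}
```

For example, `using (StreamWriter sw = new StreamWriter("BigTestFile.XML")) TestDataGenerator.WriteUsers(sw, 1000);` writes a thousand users; the file name is illustrative.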
It is worth noting that I have chosen not to use UserInfo as the top-level node, but rather to put one more node above it. This is good design practice because XML allows only one top-level "root node." Making UserInfo the root node would have limited the flexibility to store other top-level information in the file without redoing the design. Using a generic root-level node leaves us free to add other nodes below it as our needs expand. For example, in addition to UserInfo, I may want to add nodes for ServerInfo or ApplicationInfo that store important information that is not user specific. Additionally, I may have more than one user and may want a UserInfo node for each. The data structure above supports including multiple UserInfo sections one after another; this would not have been possible if UserInfo were the root node of the document.

XML DOM

The XML DOM (Document Object Model) works with XML data held in memory as a tree of objects. Each XML element is represented by an in-memory object. The XML DOM approach can be thought of as "highly stateful" in that all the data necessary to re-create the XML document is loaded in as state when an XML document is read in. XML trees can be created in memory and then serialized to files or over network streams. Similarly, any XML content in the file system or received over any data stream can be used to populate an in-memory XML DOM tree.

Having an in-memory tree of objects is a very convenient way of working with data that is of moderate size and needs only incremental updates. A 20KB XML file can be loaded into memory fairly quickly, worked on as a tree of objects, and saved back to the file system.
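To illustrate why the generic root node pays off, here is a sketch that loads a document with a ServerInfo node and several UserInfo nodes into an XmlDocument and collects each user's LastName. The class and method names and the ServerInfo contents are illustrative assumptions; the code uses generics, so it assumes .NET Framework 2.0 or later rather than the older Compact Framework.

```csharp
using System.Collections.Generic;
using System.Xml;

// Illustrative sketch: a generic root node lets several UserInfo nodes sit
// beside other top-level nodes (ServerInfo here is a hypothetical example).
public static class MultiUserDomExample
{
    // Loads the whole document into a DOM tree, then returns the LastName
    // of every UserInfo child of the root, in document order.
    public static List<string> GetLastNames(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);   // the entire document is parsed into memory here

        List<string> lastNames = new List<string>();
        foreach (XmlNode node in doc.DocumentElement.ChildNodes)
        {
            // Skip ServerInfo, comments, and anything else that is not UserInfo
            if (node.NodeType != XmlNodeType.Element || node.Name != "UserInfo")
                continue;

            XmlElement name = node["Name"];   // first <Name> child, or null
            XmlElement lastName = (name == null) ? null : name["LastName"];
            if (lastName != null)
                lastNames.Add(lastName.InnerText);
        }
        return lastNames;
    }
}
```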
As long as your data is relatively small, the XML DOM is a great way to create XML, work with it in memory, and output XML to a file or network stream.

The utility of the DOM approach is bounded both by how much memory you have available to hold the parser-generated object tree and by how much processing power is available to parse the whole tree of XML data. The downside of the XML DOM approach is that it is monolithic; the whole XML file or stream is parsed and placed into memory before you get to access any of it. If your application only needs to work with a small amount of the data present in a large file, you incur a great deal of overhead to access that data.

Reasons to use an XML DOM approach:

- The XML DOM is a simple and powerful programming model. Having an in-memory tree representing an XML document makes it easy to work with data in a random-access way.
- The XML DOM is great for small and moderate amounts of data. As long as a file is reasonably small, its contents will not take up too much memory.
- The XML DOM is the best choice if you need to work with and potentially modify all of the XML data while it is in memory. It is a powerful tool if your application needs to work with the XML data in a random-access way and the data needs to be re-persisted to a file or stream.

Reasons to avoid using an XML DOM approach:

- The XML DOM programming model forces all XML data to be parsed and loaded into an in-memory tree before it can be accessed. Building a large in-memory tree of the whole document is very wasteful if your application only needs access to a small amount of the XML data inside.
- Using the XML DOM will result in increasingly poor performance as the size of the XML increases. For large files, many objects are created in limited device memory, which can cause severe memory pressure. Additionally, all created objects will eventually need to be garbage collected, which imposes a downstream cleanup cost on your application.
- The XML DOM is a poor choice if you only need to use the data in a read-only way. The DOM's overhead buys you the ability to easily write the XML data back out. If you are only reading the data, or plan to write it out in a different format, you are paying a performance penalty without getting much gain.

Listing 10.1 below contains sample code for reading and writing the XML data shown above using the XML DOM.

Listing 10.1. Using the XML DOM to Save and Load Data from a File
using System;

//-----------------------------------------------
//Shows saving and loading data using the
//XML Document Object Model
//-----------------------------------------------
public class SaveAndLoadXML_UseDOM
{
    //XML Tags we will use in our document
    const string XML_ROOT_TAG = "AllMyData";
    const string XML_USERINFO_TAG = "UserInfo";
    const string XML_USERID_TAG = "UserID";
    const string XML_NAMEINFO_TAG = "Name";
    const string XML_FIRSTNAME_TAG = "FirstName";
    const string XML_LASTNAME_TAG = "LastName";

    //----------------------------------------------------------
    //Loads the state of the user
    //
    // [in] fileName: The name of the file we are loading from
    // [out] userId: UserID we have loaded
    // [out] firstName: User's FirstName we have loaded
    // [out] lastName: User's LastName we have loaded
    //----------------------------------------------------------
    public static void XML_LoadUserInfo(string fileName,
        out int userId, out string firstName, out string lastName)
    {
        //Start out with empty values
        userId = 0;
        firstName = "";
        lastName = "";

        //Assume we have not loaded the user data
        bool gotUserInfoData = false;

        System.Xml.XmlDocument xmlDocument = new System.Xml.XmlDocument();
        xmlDocument.Load(fileName);

        //Grab the root node (DocumentElement skips any XML
        //declaration or comments that precede it)
        System.Xml.XmlElement rootElement;
        rootElement = xmlDocument.DocumentElement;

        //Make sure the root node matches our expected text
        //Otherwise, this could just be some random other XML file
        if (rootElement.Name != XML_ROOT_TAG)
        {
            throw new Exception("Root node not of expected type!");
        }

        //-----------------------------------------
        //A simple machine that iterates through all the nodes
        //(XmlNode rather than XmlElement, so that comment nodes
        //do not cause an InvalidCastException)
        //-----------------------------------------
        foreach (System.Xml.XmlNode childOf_RootNode in rootElement.ChildNodes)
        {
            //If it's a UserInfo node, we want to look inside it
            if (childOf_RootNode.Name == XML_USERINFO_TAG)
            {
                gotUserInfoData = true; //We found the user data

                //--------------------------------------
                //Load each of the subitems
                //--------------------------------------
                foreach (System.Xml.XmlNode child_UserDataNode in childOf_RootNode.ChildNodes)
                {
                    //UserID
                    if (child_UserDataNode.Name == XML_USERID_TAG)
                    {
                        userId = System.Convert.ToInt32(child_UserDataNode.InnerText);
                    }
                    //UserName
                    else if (child_UserDataNode.Name == XML_NAMEINFO_TAG)
                    {
                        foreach (System.Xml.XmlNode child_Name in child_UserDataNode.ChildNodes)
                        {
                            //FirstName
                            if (child_Name.Name == XML_FIRSTNAME_TAG)
                            {
                                firstName = child_Name.InnerText;
                            }
                            //LastName
                            else if (child_Name.Name == XML_LASTNAME_TAG)
                            {
                                lastName = child_Name.InnerText;
                            }
                        } //End of UserName parsing loop
                    } //"End if" for "is UserName?"
                } //End of UserInfo parsing loop
            } //"End if" for "is UserInfo?"
        } //End of root node parsing loop

        if (gotUserInfoData == false)
        {
            throw new Exception("User data not found in XML!");
        }
    }

    //-----------------------------------------------------------
    //Saves the state of the user
    //
    // [in] fileName: The name of the file we are saving to
    // [in] userId: UserID we want to save
    // [in] firstName: User's FirstName we want to save
    // [in] lastName: User's LastName we want to save
    //-----------------------------------------------------------
    public static void XML_SaveUserInfo(string fileName, int userId,
        string firstName, string lastName)
    {
        System.Xml.XmlDocument xmlDocument = new System.Xml.XmlDocument();

        //---------------------------------------------------------
        //Add the top-level document element
        //---------------------------------------------------------
        System.Xml.XmlElement rootNodeForDocument;
        rootNodeForDocument = xmlDocument.CreateElement(XML_ROOT_TAG);
        xmlDocument.AppendChild(rootNodeForDocument);

        //---------------------------------------------------------
        //Add the data for the user info
        //---------------------------------------------------------
        System.Xml.XmlElement topNodeForUserData;
        topNodeForUserData = xmlDocument.CreateElement(XML_USERINFO_TAG);
        rootNodeForDocument.AppendChild(topNodeForUserData);

        //---------------------------------------------------------
        //Add the UserID value to our document
        //---------------------------------------------------------
        //Create a sub-node for the UserID
        System.Xml.XmlElement subNodeForUserID;
        subNodeForUserID = xmlDocument.CreateElement(XML_USERID_TAG);
        subNodeForUserID.InnerText = System.Convert.ToString(userId);

        //Attach the UserID sub-node to the top-level node
        topNodeForUserData.AppendChild(subNodeForUserID);

        //---------------------------------------------------------
        //Add all the NameInfo values to our document
        //---------------------------------------------------------
        //Create a sub-node for the name info
        System.Xml.XmlElement subNodeForNameInfo;
        subNodeForNameInfo = xmlDocument.CreateElement(XML_NAMEINFO_TAG);

        //FirstName
        System.Xml.XmlElement subNodeFirstName;
        subNodeFirstName = xmlDocument.CreateElement(XML_FIRSTNAME_TAG);
        subNodeFirstName.InnerText = firstName;

        //LastName
        System.Xml.XmlElement subNodeLastName;
        subNodeLastName = xmlDocument.CreateElement(XML_LASTNAME_TAG);
        subNodeLastName.InnerText = lastName;

        //Attach the first and last name sub-nodes to the NameInfo
        //parent node
        subNodeForNameInfo.AppendChild(subNodeFirstName);
        subNodeForNameInfo.AppendChild(subNodeLastName);

        //Attach the NameInfo sub-node (with its children too) to
        //the top-level node
        topNodeForUserData.AppendChild(subNodeForNameInfo);

        //---------------------------------------------------------
        //Save the document
        //---------------------------------------------------------
        try
        {
            xmlDocument.Save(fileName);
        }
        catch (System.Exception ex)
        {
            System.Windows.Forms.MessageBox.Show(
                "Error occurred saving XML document - " + ex.Message);
        }
    } //End of function
} //End of class
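Listing 10.1 builds the document from scratch each time it saves. The DOM's particular strength, listed among the reasons to use it, is random-access modification of data that is already in memory. The sketch below assumes a hypothetical DomUpdateExample helper (not part of the original code) that finds a user by UserID, changes that user's LastName in place, and re-serializes the whole tree.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: random-access update of an in-memory DOM tree.
public static class DomUpdateExample
{
    // Sets LastName for the UserInfo whose UserID matches, and returns
    // the re-serialized document. Other users are left untouched.
    public static string RenameUser(string xml, int userId, string newLastName)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        foreach (XmlNode userInfo in doc.GetElementsByTagName("UserInfo"))
        {
            XmlElement id = userInfo["UserID"];
            if (id != null && Convert.ToInt32(id.InnerText) == userId)
            {
                XmlElement name = userInfo["Name"];
                XmlElement lastName = (name == null) ? null : name["LastName"];
                if (lastName != null)
                    lastName.InnerText = newLastName;   // edit the node in place
            }
        }

        StringWriter output = new StringWriter();
        doc.Save(output);   // write the whole (modified) tree back out
        return output.ToString();
    }
}
```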
Listing 10.2. Calling the XML Save and Load Code
private void button1_Click(object sender, System.EventArgs e)
{
    const string FILENAME = "TestFileName.XML";

    //Save using the XML DOM
    SaveAndLoadXML_UseDOM.XML_SaveUserInfo(FILENAME, 14, "Ivo", "Salmre");

    //Save using the forward-only XMLWriter
    //SaveAndLoadXML_UseReaderWriter.XML_SaveUserInfo(FILENAME,
    //    18, "Ivo", "Salmre");

    int userID;
    string firstName;
    string lastName;

    //Load using the XML DOM
    SaveAndLoadXML_UseDOM.XML_LoadUserInfo(FILENAME,
        out userID, out firstName, out lastName);

    //Load using the forward-only XML Reader
    //SaveAndLoadXML_UseReaderWriter.XML_LoadUserInfo(FILENAME,
    //    out userID, out firstName, out lastName);

    System.Windows.Forms.MessageBox.Show("Done! " + userID.ToString() + ", "
        + lastName + ", " + firstName);
}
XML Forward-Only Reader/Writer

In contrast to the highly stateful and random-access-capable XML DOM approach are the forward-only XMLReader and XMLWriter. These are minimally stateful in that they maintain only the state necessary to read and write XML data and do not try to build or work with an in-memory tree of XML data. They are referred to as forward-only models because they provide a programmatic cursor that points to the current location in the XML file and work with data at that point; the cursor can be moved forward but not backward. The XMLReader has a lot of advanced features, but it basically allows applications to cursor through the nodes of an XML document. When reading in XML, the XMLReader reads in only one node and its associated attributes at any given time; think of this as akin to reading in one line of text at a time from a normal text file. When the application code is done looking at that node and its attributes, it tells the XMLReader to move on to the next node, at which point the XMLReader discards any state it is holding pertaining to the contents of the current node. Forward-only access is a necessary trait for achieving the highest performance and lowest overhead.

It is worth pointing out that the XML DOM is built on top of the forward-only XMLReader and XMLWriter classes. The XML DOM uses the forward-only XMLReader to parse the XML and builds an in-memory tree of the data it reads in. When writing out XML, the DOM iterates over its in-memory XML tree and pushes all of the nodes out through an XMLWriter to output them to a stream or file. It follows that anything possible with the XML DOM is also possible with the XML reader and writer. The XML DOM does as efficient a job as possible for a stateful, general-purpose, random-access XML parser.

The gains in using the XMLReader and XMLWriter instead of the XML DOM stem from optimizing for the fact that either your application does not need general-purpose parsing or you can make do with less in-memory state because you do not need to write out the full XML tree that you read in. If you do not need all the rich functionality of the XML DOM, the XMLReader and XMLWriter can give you better performance by working at a lower level of abstraction.
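To make the cursor model concrete, here is a minimal sketch (the class and method names are illustrative, not from the original text) that walks a document with XmlTextReader and records what the cursor is positioned on at each step: the node type, its name or text, and its depth. Nothing survives from one step to the next except what the code itself chooses to store.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Illustrative sketch: move the forward-only cursor through a document and
// record one entry per node, formatted as "NodeType:NameOrText@Depth".
public static class CursorWalkExample
{
    public static List<string> Walk(string xml)
    {
        List<string> steps = new List<string>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        reader.WhitespaceHandling = WhitespaceHandling.None;
        while (reader.Read())   // advance the cursor one node; no going back
        {
            // Text nodes have no name, so record their value instead
            string label = (reader.NodeType == XmlNodeType.Text)
                ? reader.Value
                : reader.Name;
            steps.Add(reader.NodeType + ":" + label + "@" + reader.Depth);
        }
        reader.Close();
        return steps;
    }
}
```

For `<AllMyData><UserInfo><UserID>14</UserID></UserInfo></AllMyData>` this yields seven steps, from `Element:AllMyData@0` through `Text:14@3` to `EndElement:AllMyData@0`.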
How Do the XMLReader and XMLWriter Differ from SAX?

The .NET Framework and .NET Compact Framework implement a cursor-based approach where the end developer's algorithms tell the XMLReader how to move forward and parse the next elements of XML data, but this is not the only forward-only approach to working with XML. Another popular approach to forward-only processing of XML data is the SAX (Simple API for XML) model. Whereas the XMLReader uses a cursor-based approach where the programmer chooses how and when to move the cursor forward, the SAX model is an event-based model where a parsing engine runs through the XML document (also in a forward-only way) and generates events that end developers' code can handle to examine the XML as it is being parsed. The XML reader model is "pull based," where application code pulls the next piece of XML it wants to work with. The SAX model is "push based," where pieces of XML are pushed to events that application code handles. Both serve the same purpose, namely to facilitate high-speed, low-overhead parsing of XML. Choosing between SAX and XML readers/writers is a matter of preference and of availability on the platform you are working on. The recommendations in this section apply to both forward-only models.
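The pull/push distinction can be seen by inverting control over the same reader. The sketch below is not SAX and not a .NET API: PushParser is a hypothetical wrapper that owns the XmlTextReader loop and pushes element names and text values to callbacks, so application code only handles events. It uses lambdas and Action<T> delegates, so it assumes a current .NET runtime.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical push-style wrapper (not SAX, not a .NET API): the parser owns
// the loop and "pushes" each element start and each text value to callbacks.
public static class PushParser
{
    public static void Parse(TextReader input,
                             Action<string> onStartElement,
                             Action<string> onText)
    {
        XmlTextReader reader = new XmlTextReader(input);
        reader.WhitespaceHandling = WhitespaceHandling.None;
        while (reader.Read())   // the pull loop is hidden inside the parser
        {
            if (reader.NodeType == XmlNodeType.Element)
                onStartElement(reader.Name);
            else if (reader.NodeType == XmlNodeType.Text)
                onText(reader.Value);
        }
        reader.Close();   // also closes 'input'
    }
}
```

With the pull model, the caller writes the `while (reader.Read())` loop itself and decides when to stop; here the caller can only react to what is pushed at it.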
Reasons to Use a Forward-Only XML Approach

- Forward-only models like the XMLReader offer the fastest reliable way to read XML, even from huge files. The state maintained by the framework while parsing XML is the minimum required. This state does not grow with the length of XML that has been parsed, so there is potentially no limit to the size of the XML document you can search through for data you want to extract. The only long-term state generated consists of the objects your application decides to create based on its parsing needs.
- Forward-only models such as the XMLWriter offer a fast and simple way of writing well-formed XML. The code for writing out XML using the XMLWriter is fast and easy to understand. Even for complex XML documents, your code to navigate your own internal data structures is likely to be more complex than the code that writes out the XML. Using the XMLWriter is much easier than writing your own custom code to output XML tags, and there are few, if any, reasons to prefer the latter.
- Forward-only models are great for extracting specific data or writing short streams of XML data. If you are looking to extract specific data from an XML document and know where that data sits in the data hierarchy of the file, using the XMLReader and your own state machine to navigate to the data is relatively straightforward. Similarly, if you know ahead of time what format the XML you need to write out must have, working with the XMLWriter is straightforward.
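As an example of extracting specific data without building a tree, here is a sketch that scans an AllMyData stream for one user and stops reading as soon as that user's LastName is seen. It assumes, as in the sample document, that UserID precedes Name within each UserInfo; FindUserExample and FindUser are hypothetical names. The reader's state does not grow with the number of UserInfo nodes that precede the match.

```csharp
using System;
using System.IO;
using System.Xml;

// Illustrative sketch: scan a (possibly very large) AllMyData stream for one
// user and stop as soon as that user's LastName has been read. Assumes, as in
// the sample document, that UserID comes before Name inside each UserInfo.
public static class FindUserExample
{
    // Returns "LastName, FirstName", or null if no UserInfo has wantedId.
    public static string FindUser(TextReader input, int wantedId)
    {
        XmlTextReader reader = new XmlTextReader(input);
        reader.WhitespaceHandling = WhitespaceHandling.None;
        string currentElement = null;   // most recently opened element
        bool inWantedUser = false;      // true once the matching UserID is seen
        string firstName = null;
        try
        {
            while (reader.Read())       // forward-only: one node at a time
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    currentElement = reader.Name;
                    if (currentElement == "UserInfo")
                    {
                        inWantedUser = false;   // a new user starts here
                        firstName = null;
                    }
                }
                else if (reader.NodeType == XmlNodeType.Text)
                {
                    if (currentElement == "UserID")
                        inWantedUser = Convert.ToInt32(reader.Value) == wantedId;
                    else if (inWantedUser && currentElement == "FirstName")
                        firstName = reader.Value;
                    else if (inWantedUser && currentElement == "LastName")
                        return reader.Value + ", " + firstName;  // stop early
                }
            }
            return null;   // reached the end without finding the user
        }
        finally
        {
            reader.Close();   // also closes the underlying TextReader
        }
    }
}
```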
Reasons to Avoid a Forward-Only XML Approach

- Forward-only models do not support random access to document elements. You get one shot to do something with the data as you are reading it in. If your algorithm needs to dynamically cross-reference between different parts of the XML document or make related changes to different parts, you will have to write some fairly complex and stateful code to do this. Because the XML DOM maintains an in-memory tree, it is easy to walk this tree to search and make changes.
- Forward-only models require significant work to reconstruct an entire tree structure. If you want to write out the same XML you read in, you will be duplicating a good portion of the XML DOM's functionality to do this. XML readers are great for pulling out specific pieces of data. XML writers are great for letting your application quickly output specific pieces of XML. If you need to read in an XML document and make significant changes to portions of it before writing it back out, the DOM is your friend.
- Forward-only models offer a more complex programming model for navigating and searching complex documents. Writing generic parsing code that works with arbitrary XML hierarchies can be complex. You will need to maintain state that tells you where in the document's hierarchy you are in order to find the information you are looking for. For example, if you are looking for the <Name> tag inside a specific <Customer> tag, and your XML document has <Name> tags that belong to <Customer>, <Employee>, and <Vendor> objects that may exist at different depths of the XML tree, you will need to write code that keeps track of where in the document you are presently looking so that you can distinguish between these cases and be sure you are accessing the correct information. If your document follows a single well-defined schema, this may not be too bad. On the other hand, if your document may conform to one of several schemas, the problem becomes algorithmically complex. In cases where the XML document may be of significant complexity, consider doing the processing on a server, where both more processing power and more powerful APIs for searching XML documents are available (for example, XPath for document queries).

Below is sample code for reading and writing the XML data shown above using the forward-only XML reader/writer. Of particular interest is the state machine used with the XMLReader to track the current location in the document; note that even for this simple XML document this code is not trivial. By comparison, the code for outputting XML documents using the XMLWriter is very simple.

Listing 10.3. Using the Forward-Only XML Reader/Writers to Save and Load XML Data from a File
using System;

public class SaveAndLoadXML_UseReaderWriter
{
    //XML Tags we will use in our document
    const string XML_ROOT_TAG = "AllMyData";
    const string XML_USERINFO_TAG = "UserInfo";
    const string XML_USERID_TAG = "UserID";
    const string XML_NAMEINFO_TAG = "Name";
    const string XML_FIRSTNAME_TAG = "FirstName";
    const string XML_LASTNAME_TAG = "LastName";

    //The set of states we are tracking as we read in data
    private enum ReadLocation
    {
        inAllMyData,
        inUserInfo,
        inUserID,
        inName,
        inFirstName,
        inLastName,
    }

    //-----------------------------------------------------------
    //Saves the state of the user
    //
    // [in] fileName: The name of the file we are saving to
    // [in] userId: UserID we want to save
    // [in] firstName: User's FirstName we want to save
    // [in] lastName: User's LastName we want to save
    //-----------------------------------------------------------
    public static void XML_SaveUserInfo(string fileName, int userId,
        string firstName, string lastName)
    {
        System.Xml.XmlTextWriter xmlTextWriter;
        //Write UTF-8, the encoding an XML reader assumes when the
        //file has no XML declaration
        xmlTextWriter = new System.Xml.XmlTextWriter(fileName,
            System.Text.Encoding.UTF8);

        //Write out the contents of the document in the same order
        //as the XML DOM version: UserID first, then Name
        xmlTextWriter.WriteStartElement(XML_ROOT_TAG);
        //<Root>
        xmlTextWriter.WriteStartElement(XML_USERINFO_TAG);
        //<Root><UserInfo>
        xmlTextWriter.WriteStartElement(XML_USERID_TAG);
        //<Root><UserInfo><UserID>
        xmlTextWriter.WriteString(userId.ToString()); //Value being written
        xmlTextWriter.WriteEndElement(); //Close UserID
        //<Root><UserInfo>
        xmlTextWriter.WriteStartElement(XML_NAMEINFO_TAG);
        //<Root><UserInfo><Name>
        xmlTextWriter.WriteStartElement(XML_FIRSTNAME_TAG);
        //<Root><UserInfo><Name><FirstName>
        xmlTextWriter.WriteString(firstName); //Value being written
        xmlTextWriter.WriteEndElement(); //Close FirstName
        //<Root><UserInfo><Name>
        xmlTextWriter.WriteStartElement(XML_LASTNAME_TAG);
        //<Root><UserInfo><Name><LastName>
        xmlTextWriter.WriteString(lastName); //Value being written
        xmlTextWriter.WriteEndElement(); //Close LastName
        //<Root><UserInfo><Name>
        xmlTextWriter.WriteEndElement(); //Close Name
        //<Root><UserInfo>
        xmlTextWriter.WriteEndElement(); //Close UserInfo
        //<Root>
        xmlTextWriter.WriteEndElement(); //Close Document

        //Flush and close the file
        xmlTextWriter.Close();
    }

    //----------------------------------------------------------
    //Loads the state of the user
    //
    // [in] fileName: The name of the file we are loading from
    // [out] userId: UserID we have loaded
    // [out] firstName: User's FirstName we have loaded
    // [out] lastName: User's LastName we have loaded
    //----------------------------------------------------------
    public static void XML_LoadUserInfo(string fileName,
        out int userId, out string firstName, out string lastName)
    {
        ReadLocation currentReadLocation;

        //Start out with empty values
        userId = 0;
        firstName = "";
        lastName = "";

        System.Xml.XmlTextReader xmlReader =
            new System.Xml.XmlTextReader(fileName);
        xmlReader.WhitespaceHandling = System.Xml.WhitespaceHandling.None;

        try
        {
            bool readSuccess;
            readSuccess = xmlReader.Read();
            if (readSuccess == false)
            {
                throw new System.Exception("No XML data to read!");
            }

            //Make sure we recognize the root tag.
            if (xmlReader.Name != XML_ROOT_TAG)
            {
                throw new System.Exception(
                    "Root tag different from expected!");
            }

            //Note where we are in the document
            currentReadLocation = ReadLocation.inAllMyData;

            //-----------------------------------------------
            //Loop through our document and read what we need
            //-----------------------------------------------
            while (readSuccess)
            {
                switch (xmlReader.NodeType)
                {
                    //Called when we enter a new Element
                    case System.Xml.XmlNodeType.Element:
                    {
                        string nodeName = xmlReader.Name;
                        LoadHelper_NewElementEncountered(nodeName,
                            ref currentReadLocation);
                        break;
                    }

                    //--------------------------------------------------
                    //Here's where we can actually extract some text and
                    //get the data we are trying to load
                    //--------------------------------------------------
                    case System.Xml.XmlNodeType.Text:
                    {
                        switch (currentReadLocation)
                        {
                            case ReadLocation.inFirstName:
                            {
                                firstName = xmlReader.Value;
                                break;
                            }
                            case ReadLocation.inLastName:
                            {
                                lastName = xmlReader.Value;
                                break;
                            }
                            case ReadLocation.inUserID:
                            {
                                userId = System.Convert.ToInt32(xmlReader.Value);
                                break;
                            }
                        }
                        break;
                    }

                    //---------------------------------------------------
                    //Gets called when we have encountered the end of
                    //an element
                    //
                    //We may want to switch our state based on what node
                    //we are exiting to indicate that we are going back
                    //to that node's parent
                    //---------------------------------------------------
                    case System.Xml.XmlNodeType.EndElement:
                    {
                        bool continueParsing;
                        continueParsing = LoadHelper_EndElementEncountered(
                            ref currentReadLocation);
                        if (continueParsing == false)
                        {
                            //We have all the data we want; the finally
                            //block below closes the file
                            return;
                        }
                        break;
                    }

                    default:
                    {
                        //There is no harm in having other XML node types,
                        //but our sample XML should not contain any, so
                        //note the occurrence
                        System.Windows.Forms.MessageBox.Show(
                            "Unexpected XML type encountered: " + xmlReader.Name);
                        break;
                    }
                } //End of switch on the type of XML node the parser is on

                //Go to the next node
                readSuccess = xmlReader.Read();
            }

            //If we made it to this point without exiting the UserInfo
            //XML tag, something went wrong with the XML data we were
            //reading.
            throw new Exception("Could not find UserInfo in XML!");
        }
        finally
        {
            //Close the file, we're done with it!
            xmlReader.Close();
        }
    }

    //-------------------------------------------------------
    //Helper logic that decides what state we should enter
    //when we encounter an end tag.
    //-------------------------------------------------------
    private static bool LoadHelper_EndElementEncountered(
        ref ReadLocation currentReadLocation)
    {
        switch (currentReadLocation)
        {
            //If we are leaving the Name node, we are going back
            //up to the UserInfo node
            case ReadLocation.inName:
            {
                currentReadLocation = ReadLocation.inUserInfo;
                break;
            }
            //If we are leaving the FirstName node, we are going
            //back up to the Name node
            case ReadLocation.inFirstName:
            {
                currentReadLocation = ReadLocation.inName;
                break;
            }
            //If we are leaving the LastName node, we are going back
            //up to the Name node
            case ReadLocation.inLastName:
            {
                currentReadLocation = ReadLocation.inName;
                break;
            }
            //If we are leaving the UserID node, we are going back
            //up to the UserInfo node
            case ReadLocation.inUserID:
            {
                currentReadLocation = ReadLocation.inUserInfo;
                break;
            }
            //If we are leaving the UserInfo node, we have just
            //finished reading in the UserID, FirstName,
            //and LastName.
            //
            //We can exit the loop, as we have all the information
            //we want!
            case ReadLocation.inUserInfo:
            {
                return false; //We should stop parsing
            }
        }
        return true; //Continue parsing
    }

    //-------------------------------------------------------
    //Helper logic that decides what state we should enter
    //when we encounter a new element.
    //-------------------------------------------------------
    private static void LoadHelper_NewElementEncountered(
        string nodeName, ref ReadLocation currentReadLocation)
    {
        //---------------------------------------------------------
        //We have entered a new element!
        //
        //What state we can enter is dependent on what state we are
        //presently in
        //---------------------------------------------------------
        switch (currentReadLocation)
        {
            //If we're in the "AllMyData" node, here are the nodes
            //we can enter
            case (ReadLocation.inAllMyData):
            {
                if (nodeName == XML_USERINFO_TAG)
                {
                    currentReadLocation = ReadLocation.inUserInfo;
                }
                break;
            }
            //If we're in the UserInfo node, here are the nodes
            //we can enter
            case (ReadLocation.inUserInfo):
            {
                if (nodeName == XML_USERID_TAG)
                {
                    currentReadLocation = ReadLocation.inUserID;
                }
                else if (nodeName == XML_NAMEINFO_TAG)
                {
                    currentReadLocation = ReadLocation.inName;
                }
                break;
            }
            //If we're in the Name node, here are the nodes
            //we can enter
            case (ReadLocation.inName):
            {
                if (nodeName == XML_FIRSTNAME_TAG)
                {
                    currentReadLocation = ReadLocation.inFirstName;
                }
                else if (nodeName == XML_LASTNAME_TAG)
                {
                    currentReadLocation = ReadLocation.inLastName;
                }
                break;
            }
        }
    } //End function
} //End Class
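Listing 10.3 tracks its position with an enum-based state machine written for one specific schema. A more generic technique, sketched here as an alternative rather than the approach the listing uses, is to keep the list of currently open element names and compare the full path. This separates a <Name> inside <Customer> from one inside <Employee> or <Vendor>, whatever their depth. PathTrackingExample, ValuesAtPath, and the slash-separated path format are illustrative assumptions.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Illustrative alternative to a hand-written state machine: keep the names of
// all open elements (outermost first) and match text against a full path
// such as "Data/Customer/Name".
public static class PathTrackingExample
{
    public static List<string> ValuesAtPath(TextReader input, string wantedPath)
    {
        List<string> results = new List<string>();
        List<string> path = new List<string>();   // used as a stack
        XmlTextReader reader = new XmlTextReader(input);
        reader.WhitespaceHandling = WhitespaceHandling.None;
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    path.Add(reader.Name);
                    if (reader.IsEmptyElement)        // <Tag/> has no EndElement
                        path.RemoveAt(path.Count - 1);
                    break;
                case XmlNodeType.EndElement:
                    path.RemoveAt(path.Count - 1);    // back up to the parent
                    break;
                case XmlNodeType.Text:
                    if (string.Join("/", path.ToArray()) == wantedPath)
                        results.Add(reader.Value);
                    break;
            }
        }
        reader.Close();
        return results;
    }
}
```

The trade-off is a string comparison per text node in exchange for not writing a new state machine for each schema.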