XML versus the Database
Nope, this isn't tonight's featured bout at the World Wrestling Federation. It's a discussion stemming from one very basic question: What truly is the difference between storing information as XML and storing it in a database?
In the broad sense, there isn't much. After all, both are technologies designed to impose structure on raw data, thereby making it more useful to the applications that need it. XML accomplishes this with elements and attributes, a database with rows and columns. Both also come with some fairly powerful tools that facilitate the manipulation of the data contained within them: databases with Structured Query Language (SQL) or some variant thereof, XML with the XML Path Language (XPath) and Extensible Stylesheet Language Transformations (XSLT).
In a narrower, more technical sense, however, differences between the two technologies do exist. A database is designed for highly structured data; XML, on the other hand, favors data that is less structured and more hierarchical in nature. This is a fairly fundamental difference; although others do exist (and are discussed at greater length in the following sections), the consequences of ignoring this one can prove to be fairly serious.
In the web application arena, developers tend to prefer a traditional RDBMS over XML; too often, this choice is influenced less by technical reasons than by a better-the-devil-you-know psychosis. As you might imagine, this is not a good thing; in a production environment in which performance, portability, and ease of use are paramount, a developer needs to make implementation decisions based on sound technical reasons and a fundamental understanding of the issues and the technologies involved.
To that end, the following sections list the factors to be kept in mind when deciding between XML and a database.
Nature of the Data
If you have highly structured or volatile data (for example, stock symbols and their corresponding prices, or an address book with specific fields for contact information), a database is a far better choice than XML. This is because databases are designed to handle structured data (that's the whole point of those rigid rows and columns), and can offer a (much!) higher level of performance when manipulating this type of data.
When it comes to unstructured or irregularly structured information (recipes, poetry, lengthy dissertations, and so on), though, a traditional database tends to run into difficulties, because this type of data tends to resist easy classification into rows and columns. In such cases, XML is a much better choice; its self-describing approach offers document authors greater flexibility, allowing them to easily describe relationships between data fragments and thereby create data models that are efficient without necessarily conforming to the two-dimensional table paradigm.
That said, XML is not a good choice for volatile and frequently updated information simply because there exists no easy way to manipulate XML-encoded documents. It's far easier to put this type of information into a database and manipulate it with SQL than it is to store it in a text file and modify it with file read/write functions.
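To make the contrast concrete, here is a minimal Python sketch of the same price update done both ways. The quotes table, the symbol ACME, and the XML layout are invented for illustration; they are assumptions, not part of any real schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Database route: a single declarative statement updates the matching row.
# (Table and column names are hypothetical.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (symbol TEXT PRIMARY KEY, price REAL)")
conn.execute("INSERT INTO quotes VALUES ('ACME', 10.0)")
conn.execute("UPDATE quotes SET price = ? WHERE symbol = ?", (12.5, "ACME"))
db_price = conn.execute(
    "SELECT price FROM quotes WHERE symbol = 'ACME'"
).fetchone()[0]

# XML route: parse the whole document, locate the node, change it,
# then serialize the whole document again.
doc = ET.fromstring(
    "<quotes><quote symbol='ACME'><price>10.0</price></quote></quotes>"
)
for quote in doc.findall("quote"):
    if quote.get("symbol") == "ACME":
        quote.find("price").text = "12.5"
xml_text = ET.tostring(doc, encoding="unicode")
```

With a file on disk, the XML route would also need to rewrite the entire file after every change, which is where frequent updates start to hurt.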
Data Retrieval Requirements
When deciding on whether to use XML or a database for your data, it's also important to take note of your data-retrieval requirements. If you're planning to work only with data organized in a clearly defined manner, you're fairly safe using a traditional database. An example of this is the retrieval of a name and telephone number from a table containing contact information. Because the required dataset is clearly defined and already maps into specific fields of a table, retrieving it via SQL is a snap.
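As a rough illustration of that contact lookup, the following Python sketch uses the standard sqlite3 module with an in-memory database. The contacts table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical contacts table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        ("Jane Doe", "555-0100", "jane@example.com"),
        ("John Roe", "555-0101", "john@example.com"),
    ],
)

# The required dataset maps directly onto two columns of one table,
# so a single SELECT retrieves it.
rows = conn.execute(
    "SELECT name, phone FROM contacts WHERE name = ?", ("Jane Doe",)
).fetchall()
```

Here rows holds one (name, phone) tuple for the matching contact.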
If, however, the data you need to retrieve isn't so clearly defined, with the required dataset blurring or extending across the clearly defined boundaries of a table, the XML family of technologies is a far better choice. An example of this might be the retrieval of every alternate stanza in a poem: hard to do with SQL, yet fairly easy with XPath.
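In full XPath 1.0, the alternate-stanza selection could be written as /poem/stanza[position() mod 2 = 1]. Python's standard xml.etree.ElementTree supports only a subset of XPath and cannot evaluate that predicate, so the sketch below uses a simple path to collect the stanzas and a list slice to emulate the position test. The poem markup is made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical poem markup, invented for this example.
POEM = """<poem>
  <stanza>one</stanza>
  <stanza>two</stanza>
  <stanza>three</stanza>
  <stanza>four</stanza>
  <stanza>five</stanza>
</poem>"""

root = ET.fromstring(POEM)

# Full XPath 1.0 equivalent: /poem/stanza[position() mod 2 = 1]
# ElementTree cannot evaluate "mod", so select every stanza with a simple
# path and keep the 1st, 3rd, 5th, ... with a slice.
stanzas = root.findall("stanza")
alternate = [stanza.text for stanza in stanzas[::2]]
```

A library with complete XPath support could evaluate the expression in the comment directly; the slice is only a stand-in for it.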
XPath, together with XSLT and XPointer, makes it extremely easy to create customized node collections from an XML document tree, and restructure this node collection into the format you require. Further, if your data contains complex recursive or nested structures, a traditional database will have difficulty organizing it in a format suitable for easy retrieval; XML, on the other hand, is well-suited to this task.
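The point about nested and recursive structures can be sketched briefly in Python. The section elements and their title attributes below are assumptions chosen for illustration; the recursion simply follows the nesting already present in the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose sections nest to arbitrary depth.
DOC = """<section title="Book">
  <section title="Chapter 1">
    <section title="1.1"/>
  </section>
  <section title="Chapter 2"/>
</section>"""


def outline(node, depth=0):
    """Return an indented outline, one line per section, in document order."""
    lines = ["  " * depth + node.get("title")]
    for child in node.findall("section"):
        lines.extend(outline(child, depth + 1))
    return lines


lines = outline(ET.fromstring(DOC))
```

Storing the same hierarchy in a relational table typically means adding a parent reference to every row and reassembling the tree with self-joins or recursive queries.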
Data organized in a traditional database, which typically has built-in support for indexing and stored procedures, can usually be accessed faster than the corresponding XML-encoded version. Because XML documents are, at their core, ordinary text files, accessing individual data fragments generally involves parsing the document sequentially from the beginning; this can degrade performance substantially, especially if there is a large amount of data involved.
A database also tends to handle simultaneous access better than an XML document, implementing a more secure and robust locking arrangement than is possible with a "regular" flat file.
There's a flip side to the performance argument, though. Because XML markup is stored as text, XML data is immediately portable across platforms; databases, on the other hand, may store data in proprietary formats that cannot be easily ported to other systems and platforms. If portability is a concern, or if you anticipate data manipulation by non-technical users, you can't get any more portable or simple than ASCII text; in these situations, it's worthwhile to consider whether XML might be a better solution than a database.
Standardization and Integration with Other Applications
Most databases support SQL, which provides a simple and efficient method of accessing records in a database. An upcoming W3C technology named XML Query is supposed to do for XML documents what SQL does for databases; at the moment, though, it's still under development. Consequently, most developers still need to write their own tools to access XML-encoded data, using either a SAX or DOM parser; this can add to development time and (because there isn't a common standard) possible integration difficulties with applications from other vendors.
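For a sense of what that hand-written access code looks like, here is a small Python sketch that extracts the same names twice, once with a DOM parser and once with a SAX parser, both from the standard library. The contacts markup and the NameHandler class are hypothetical.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical contact data, invented for this example.
XML = (
    "<contacts>"
    "<contact><name>Jane Doe</name><phone>555-0100</phone></contact>"
    "<contact><name>John Roe</name><phone>555-0101</phone></contact>"
    "</contacts>"
)

# DOM: build the whole tree in memory, then query it.
dom = minidom.parseString(XML)
dom_names = [node.firstChild.data for node in dom.getElementsByTagName("name")]


# SAX: react to parser events as the document streams past.
class NameHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, tag, attrs):
        if tag == "name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the tag closes.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, tag):
        if tag == "name":
            self.names.append("".join(self._buffer))
            self._in_name = False


handler = NameHandler()
xml.sax.parseString(XML.encode("utf-8"), handler)
sax_names = handler.names
```

Both approaches yield the same list of names. DOM is simpler to query but holds the entire document in memory, while SAX uses little memory but leaves state tracking to the developer.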
Encoding your data into XML can also cause problems if it needs to be shared with applications or systems that don't understand the language. In such situations, too, a traditional database system (that exposes a standard API to developers) would be preferable.
It should be noted that there's no "one-size-fits-all" answer to the question of deciding which technology is most appropriate; the factors involved are fairly complex, and they vary from situation to situation. As technology evolves, the question may even become moot: many commercial databases now support XML as a data type, and a new database hybrid, the native XML database, offers the best of both worlds, combining the simplicity of XML with the feature set of traditional database systems.
Knowledge Is Power
Interested in learning more about XML, databases, and XML databases? Drop by the companion web site for this book for links to articles on the topic. Go to