An XML document must follow the specific standards laid down by W3C in order to be acceptable – in particular, it must be well-formed. It must:
Have a single root element that encloses all other content except the XML declaration, the document type declaration, processing instructions, and comments.
Have matching closing tags for all the opening tags (or use the shorthand syntax of ending the element with a forward slash character).
Be properly nested so that elements are fully enclosed. You can't open an element as a child of another element and then close the parent element before closing the child element.
Contain only valid characters. All non-valid content must be escaped or replaced by the correct entity equivalents, such as &amp;amp; for an ampersand character.
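The rules above can be checked with a few lines of code. The following is a minimal sketch rather than part of the example page: the module and function names are hypothetical, and it relies only on the fact that the LoadXml method of an XmlDocument throws an XmlException when the text is not well-formed:

```vb
Imports System
Imports System.Xml

Module WellFormedSketch

  'returns True if the string can be parsed as well-formed XML
  Function IsWellFormed(strXml As String) As Boolean
    Dim objDoc As New XmlDocument()
    Try
      objDoc.LoadXml(strXml)
      Return True
    Catch objError As XmlException
      'the parser rejects anything that is not well-formed
      Return False
    End Try
  End Function

  Sub Main()
    'well-formed: single root, nested correctly, ampersand escaped
    Console.WriteLine(IsWellFormed("<Books><Title>A &amp; B</Title></Books>"))
    'not well-formed: unescaped ampersand
    Console.WriteLine(IsWellFormed("<Books><Title>A & B</Title></Books>"))
    'not well-formed: parent closed before its child
    Console.WriteLine(IsWellFormed("<Books><Title>A</Books></Title>"))
    'not well-formed: two root elements
    Console.WriteLine(IsWellFormed("<Books /><Books />"))
  End Sub

End Module
```

Note that this checks well-formedness only. It says nothing about whether the document is valid against a schema or DTD.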
An XML document can be well-formed and still not be valid. The validity of a document is defined using a schema or Document Type Definition (DTD). This lays out the structure of the elements, attributes and other content, the ordering of the elements, and the permissible value ranges for the elements and attributes. The XML storage objects parse the XML to ensure that it's well-formed when they load it (it can't be loaded otherwise), but don't automatically validate the XML. You have to look after that yourself.
Documents are validated against a given XML schema or DTD using the XmlValidatingReader object. Although it is itself an XmlReader, it does not read the document from its source directly; instead it is attached to another reader and validates the content that reader supplies. Figure 11-16 shows the way it works. The document is read using an XmlTextReader (or an XmlNodeReader, if you only want to validate part of a document). This object automatically raises an error if the document is not well-formed.
Figure 11-16:
When you attach an XmlValidatingReader to the XmlTextReader, it automatically checks for the presence of a schema or DTD within the document, and validates the content of the document against that schema or DTD. Errors found during validation are raised through the ValidationEventHandler event, and the handler for this event receives a ValidationEventArgs object that contains a description of the error. You can access this object's properties when the event occurs to determine the validation errors that are present (you'll see how this is done shortly in the example page), or you can attach no handler at all, in which case the first validation error is raised as an exception.
Creating an XmlValidatingReader for use with a document that contains an inline schema or DTD (or which specifies an external schema or DTD) is easy. You just need to create the XmlTextReader, specifying the XML document to load, and then use this as the basis for creating the XmlValidatingReader. Afterwards, you can set the ValidationType property to specify the type of schema you're using:
'create the new XmlTextReader object and load the XML document
objXTReader = New XmlTextReader(strXMLPath)
'create an XmlValidatingReader for this XmlTextReader
Dim objValidator As New XmlValidatingReader(objXTReader)
'set the validation type to use an XML Schema
objValidator.ValidationType = ValidationType.Schema
The acceptable values for the ValidationType property are shown in the following table:
| Value | Description |
|---|---|
| Auto | The default. Validation is automatically performed against whichever type of schema or DTD is encountered. |
| DTD | Validate against a DTD. |
| Schema | Validate against a W3C-compliant XML Schema (XSD), including an inline schema. Schemas are specified using the schemaLocation attribute or the Schemas property. |
| XDR | Validate against a schema that uses Microsoft's XML Data Reduced (XDR) syntax, including an inline schema. XDR schemas use the "x-schema" namespace prefix or the Schemas property. |
| None | No validation is performed. This creates an XML 1.0-compliant non-validating parser: default attributes are reported and general entities can be resolved by calling the ResolveEntity method, but any DOCTYPE is not used for validation purposes. Can be used to "switch off" validation when not required. |
Figure 11-16 also shows how to use a separate (not inline or linked) schema or DTD to validate the document. And, as schemas can inherit from each other, there could be several schemas that you'd want to apply to the XML document (though there can only be one DTD). To cope with this, the XmlValidatingReader exposes a reference to an XmlSchemaCollection through the Schemas property. This collection contains all the desired schemas.
If you are familiar with using the MSXML parser in ASP 3.0 or other environments, you may expect to be able to validate a document when you load it simply by setting some property. For example, with MSXML, the ValidateOnParse property can be set to True to validate a document that contains an inline schema or DTD, or a reference to an external schema or DTD.
However, things are different when using the .NET System.Xml classes. Loading a combined schema or DTD and the XML data content (that is, an inline schema) or an XML document that references an external schema or DTD into any of the XML storage objects such as XmlDocument, XmlDataDocument, and XPathDocument does not automatically validate that document. And there is no property that you can set to make it do this.
Instead, you can load the document via an XmlTextReader object to which you have attached an XmlValidatingReader. The Load method of the XmlDocument and XmlDataDocument objects can accept an XmlValidatingReader as the single parameter instead of a file path and name. Meanwhile, the constructor for the XPathDocument object can accept an XmlValidatingReader as the single parameter.
So all you have to do is set up the XmlValidatingReader and XmlTextReader combination, and pass this to the Load method or the constructor function (depending on which document object you're creating). The document will then be validated as it is loaded:
'create XmlTextReader, load XML document and create Validator
objXTReader = New XmlTextReader(strXMLPath)
Dim objValidator As New XmlValidatingReader(objXTReader)
objValidator.ValidationType = ValidationType.Schema
'use the validator/reader combination to create XPathDocument object
Dim objXPathDoc As New XPathDocument(objValidator)
'use the validator/reader combination to create XmlDocument object
Dim objXmlDoc As New XmlDocument()
objXmlDoc.Load(objValidator)
The XmlValidatingReader can also be used to validate XML held in a String. So, you can validate XML that's already loaded into an object or application by simply extracting it as a String object (using the GetXml method with a DataSet object, or the OuterXml property to get a document fragment, for example) and applying the XmlValidatingReader to this.
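As an illustration of validating a String, the following sketch wraps the text in a StringReader, reads it with an XmlTextReader, and attaches an XmlValidatingReader in the same way as for a file. It is an assumption-laden outline rather than code from the example page: the module, the OnValidationError handler, and the strXml and strSchemaPath parameters are all hypothetical names, and the schema is assumed to have no target namespace:

```vb
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Module ValidateStringSketch

  'running count of validation errors for the current call
  Private intErrors As Integer = 0

  'hypothetical handler: counts and prints each validation error
  Sub OnValidationError(objSender As Object, objArgs As ValidationEventArgs)
    intErrors += 1
    Console.WriteLine(objArgs.Message)
  End Sub

  'validates XML held in a String against an XSD schema on disk
  Function CountValidationErrors(strXml As String, _
                                 strSchemaPath As String) As Integer
    intErrors = 0
    'read the string through a StringReader instead of a file path
    Dim objReader As New XmlTextReader(New StringReader(strXml))
    Dim objValidator As New XmlValidatingReader(objReader)
    objValidator.ValidationType = ValidationType.Schema
    'assumes the schema has no target namespace
    objValidator.Schemas.Add("", strSchemaPath)
    AddHandler objValidator.ValidationEventHandler, _
               AddressOf OnValidationError
    Try
      'reading to the end of the content triggers validation
      While objValidator.Read()
      End While
    Finally
      objValidator.Close()
    End Try
    Return intErrors
  End Function

End Module
```

The string passed as strXml could come from the GetXml method of a DataSet or the OuterXml property of a node. Text that is not well-formed still causes an XmlException, which this sketch leaves to the caller to handle.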
Like the XML document objects, a DataSet does not automatically validate XML that's provided for the ReadXml method against any schema that is already in place within the DataSet or which is inline with the XML (that is, in the same document as the XML data content). In a DataSet, the schema is used solely to provide information about the intended structure of the data. It's not used for actual validation at all.
When you load the schema, the DataSet uses it as a specification for the table names, column names, data types, and so on. Then, when you load the XML data content, it arranges the data in the appropriate tables and columns as new data rows. An encountered value or element that doesn't match the schema is ignored, and that particular column in the current data row is left empty.
This makes sense, because the DataSet is designed to work with structured relational data, and so any superfluous content in the source file cannot be part of the correct data model. So, you should think of schemas in a DataSet as being a way to specify the data structure (rather than inferring the structure from the data, as happens if no schema is present). Don't think of this as a way of validating the data.
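To make the distinction concrete, here is a rough sketch of loading a schema and then data into a DataSet. It is not taken from the example page: the module name and the booklist-invalid.xml file name are hypothetical, both files are assumed to sit in the current folder, and the schema is assumed to describe a structure that a DataSet can represent as tables and columns:

```vb
Imports System
Imports System.Data

Module DataSetSchemaSketch

  Sub Main()
    Dim objDataSet As New DataSet()

    'load the structure first: the schema defines tables, columns, and types
    objDataSet.ReadXmlSchema("booklist-schema.xsd")

    'then load the data; "booklist-invalid.xml" is a hypothetical file name
    objDataSet.ReadXml("booklist-invalid.xml")

    'list what was actually loaded into each table
    Dim objTable As DataTable
    For Each objTable In objDataSet.Tables
      Console.WriteLine(objTable.TableName & ": " & _
        objTable.Rows.Count.ToString() & " row(s), " & _
        objTable.Columns.Count.ToString() & " column(s)")
    Next
  End Sub

End Module
```

If the data file contains an element that the schema does not define, the expectation from the description above is that the load completes and the column list still matches the schema, with no report that anything was skipped. That is why a separate XmlValidatingReader pass is needed when validity matters.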
The Validating XML documents with an XmlValidatingReader object (validating-xml.aspx) example page shown in Figure 11-17 demonstrates how you can validate an XML document. When first opened, it displays a drop-down list of the source documents you can use, and it validates the selected document. As you can see from the screenshot, it reports no validation errors in a valid document.
Figure 11-17:
Note |
You must run the page in a browser on the web server itself to be able to open the XML document and schema using the physical paths in the hyperlinks in the page. |
However, if you select the well-formed but invalid document, it reports a series of validation errors, as shown in Figure 11-18:
Figure 11-18:
In this case, the XML document contains an extra <MiddleInitial> child element within one of the <Books> elements, which is not permitted in the schema that's being used to validate it.
The following code shows the offending element. You can view the document and the schema using the hyperlinks provided in the page:
<Books>
<ISBN>0764544020</ISBN>
<Title>Beginning Access 2002 VBA</Title>
<PublicationDate>2000-04-01T00:00:00.0000000+01:00</PublicationDate>
<FirstName>Mark</FirstName>
<MiddleInitial>J</MiddleInitial>
<LastName>Horner</LastName>
</Books>
The code that follows performs the validation. We start by creating the paths to the schema and XML document. In this example, the document name comes from the selXMLFile drop-down list defined earlier in the page; the filename itself is the value attribute of the selected item.
We then declare a variable to hold the number of validation errors found. This is followed by code to create an XmlTextReader object, specifying the XML document as the source. Also provided is a hyperlink to this document:
'create physical path to sample files (in same folder as ASPX page)
Dim strCurrentPath As String = Request.PhysicalPath
Dim strXMLPath As String = Left(strCurrentPath, _
InStrRev(strCurrentPath, "\")) & selXMLFile.SelectedItem.Value
Dim strSchemaPath As String = Left(strCurrentPath, _
InStrRev(strCurrentPath, "\")) & "booklist-schema.xsd"
'variable to count number of validation errors found
Dim intValidErrors As Integer = 0
'create the new XmlTextReader object and load the XML document
objXTReader = New XmlTextReader(strXMLPath)
outXMLDoc.innerHTML = "Loaded file: <a href=""" & strXMLPath _
& """>" & strXMLPath & "</a><br />"
The next step is to create the XmlValidatingReader object with the XmlTextReader as the source, and specify the validation type to suit the schema (you could have, of course, used Auto to automatically validate against any type of schema or DTD).
The schema is in a separate document and there is no link or reference to it in the XML document. So it's necessary to specify which schema to use. You can create a new XmlSchemaCollection, and add the schema to it using the Add method of the XmlSchemaCollection. You then add this collection to the collection exposed by the Schemas property of the XmlValidatingReader, and display a link to the schema:
'create an XmlValidatingReader for this XmlTextReader
Dim objValidator As New XmlValidatingReader(objXTReader)
'set the validation type to use an XSD schema
objValidator.ValidationType = ValidationType.Schema
'create a new XmlSchemaCollection
Dim objSchemaCol As New XmlSchemaCollection()
'add the booklist-schema.xsd schema to it
objSchemaCol.Add("", strSchemaPath)
'add the schema collection to the XmlValidatingReader's Schemas collection
objValidator.Schemas.Add(objSchemaCol)
outXMLDoc.innerHTML &= "Validating against: <a href=""" _
& strSchemaPath & """>" & strSchemaPath & "</a>"
Note |
In version 1.1, Microsoft has suggested an updated approach to loading stylesheets that are not fully trusted. See the Loading Stylesheets and Schemas with an XmlResolver section at the end of this chapter for details. |
The XmlValidatingReader will raise an event whenever it encounters a validation error in the document, as the XmlTextReader reads it from the disk file. If you don't handle this event specifically, the validation error is raised as an exception instead. In our case, this would be caught by the Try...Catch construct included in the example page.
However, it's often better to handle the validation events separately from other (usually fatal) errors such as the XML file not actually existing on disk. To specify your own event handler for the ValidationEventHandler event in Visual Basic, use the AddHandler statement, and pass to it the event you want to handle and a pointer to the handler routine (named ValidationError in this example):
'add the event handler for any validation errors found
AddHandler objValidator.ValidationEventHandler, AddressOf ValidationError
In C#, you can add the validation event handler using the following syntax:
objValidator.ValidationEventHandler += new
ValidationEventHandler(ValidationError);
You are now ready to read the XML document from the disk file. In this case, you're only reading through to check for validation errors. In an application, you would have code here to perform whatever tasks you need against the XML, or alternatively use the XmlValidatingReader as the source for the Load method of an XmlDocument or XmlDataDocument object, or in the constructor for an XPathDocument object.
Once validation is complete, you can display a count of the number of errors found and close the reader object to release the disk file. If the document is not well-formed or cannot be loaded for any other reason (such as it doesn't exist), a parser error occurs. In this case, you can include a statement in the Catch section that displays the error. That's all you need to do to validate the document:
Try
'iterate through the document reading and validating each element
While objValidator.Read()
'use or display the XML content here as required
End While
'display count of errors found
outXMLDoc.innerHTML &= "Validation complete " & intValidErrors _
& " error(s) found"
Catch objError As Exception
'will occur if there is a read error or the document cannot be parsed
outXMLDoc.InnerHTML &= "Read/Parser error: " & objError.Message
Finally
'must remember to always close the XmlTextReader after use
objXTReader.Close()
End Try
The XmlValidatingReader raises the ValidationEventHandler event whenever a validation error is discovered in the XML document, and it's been specified that the ValidationError event handler will be called when this event is raised. This event handler receives the usual reference to the object that raised the event, plus a ValidationEventArgs object containing information about the event.
In the event handler, we first increment the error counter, and then check what kind of error it is by using the Severity property of the ValidationEventArgs object. A displayed message describes the error, the line number, and character position if available (although these are generally included in the error message anyway):
Public Sub ValidationError(objSender As Object, _
objArgs As ValidationEventArgs)
'event handler called when a validation error is found
intValidErrors += 1 'increment count of errors
'check the severity of the error (XmlSeverityType.Error = 0, Warning = 1)
Dim strSeverity As String = "Unknown"
If objArgs.Severity = XmlSeverityType.Error Then strSeverity = "Error"
If objArgs.Severity = XmlSeverityType.Warning Then strSeverity = "Warning"
'display a message
outXMLDoc.InnerHTML &= "Validation error: " & objArgs.Message _
& "<br /> Severity level: " & strSeverity & "<br />"
If objXTReader.LineNumber > 0 Then
outXMLDoc.InnerHTML &= "Line: " & objXTReader.LineNumber _
& ", character: " & objXTReader.LinePosition
End If
End Sub
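The handler above reads the line number and position from the page-level XmlTextReader. As an alternative sketch, the same details can usually be taken from the event arguments themselves, because the Exception property of the ValidationEventArgs object is an XmlSchemaException that exposes LineNumber and LinePosition. The module and function names here are hypothetical, and the function only builds a text description rather than writing to the page:

```vb
Imports System
Imports System.Xml.Schema

Module ValidationHandlerSketch

  'builds a one-line description of a validation event
  Function DescribeValidationEvent(objArgs As ValidationEventArgs) As String
    'Severity is an XmlSeverityType value, so ToString gives its name
    Dim strText As String = objArgs.Severity.ToString() _
                            & ": " & objArgs.Message
    'line information is only meaningful when the reader supplied it
    If Not objArgs.Exception Is Nothing AndAlso _
       objArgs.Exception.LineNumber > 0 Then
      strText &= " (line " & objArgs.Exception.LineNumber.ToString() _
              & ", character " _
              & objArgs.Exception.LinePosition.ToString() & ")"
    End If
    Return strText
  End Function

End Module
```

A handler written this way does not need access to the reader variable, which can be convenient when the handler lives in a separate class.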
The previous screenshot displayed validation error messages caused by a well-formed but invalid document. We've also provided an XML document that is not well-formed, so that you can see the parser error that is raised and trapped by the Try...Catch construct. This also prevents the remainder of the document from being read, as shown in Figure 11-19:
Figure 11-19:
In this case, there is an illegal closing tag for one of the <Books> elements. One of the options provided even tries to load a non-existent XML document, so you can see that the page traps this error successfully as well.
<Books>
<ISBN>1861003382</ISBN>
<Title>Beginning Active Server Pages 3.0</Title>
<PublicationDate>1999-12-01T00:00:00</PublicationDate>
<FirstName>David</FirstName>
<LastName>Sussman</LastName>
</BBoooks>