Valid XML Documents
A valid XML document is one that conforms to a specified DTD or XML Schema. This means the elements, attributes, structural relationships, and sequences in the document match those declared in the DTD or schema. For example, the following XML document is valid with respect to the DTD that follows it:
<bookcatalog>
  <book>
    <title>History of Interviews</title>
    <author>
      <firstname>Juan</firstname>
      <lastname>Smith</lastname>
    </author>
    <ISBN>99999-99999</ISBN>
    <publisher>Oracle Press</publisher>
    <publishyear>2003</publishyear>
    <price type="US">10.00</price>
  </book>
</bookcatalog>
The following is the DTD to which the XML document conforms:
<!-- DTD bookcatalog may have a number of book entries -->
<!DOCTYPE bookcatalog [
<!ELEMENT bookcatalog (book)*>
<!-- Each book element has a title, 1 or more authors, etc. -->
<!ELEMENT book (title, author+, ISBN, publisher, publishyear, price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (firstname, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT publishyear (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ATTLIST price type (US|CAN|UK|EURO) #REQUIRED>
]>
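Validity is a stronger requirement than well-formedness. The following is a minimal sketch, assuming Python 3 and its standard `xml.etree.ElementTree` module, which is built on the non-validating expat parser. It parses the sample document and a copy with the required ISBN element removed. The parser accepts both, because it checks only that tags are properly nested and closed, not that they follow the DTD. The names `VALID_DOC`, `INVALID_DOC`, and `is_well_formed` are illustrative.

```python
import xml.etree.ElementTree as ET

# The sample document from the text.
VALID_DOC = """<bookcatalog>
  <book>
    <title>History of Interviews</title>
    <author>
      <firstname>Juan</firstname>
      <lastname>Smith</lastname>
    </author>
    <ISBN>99999-99999</ISBN>
    <publisher>Oracle Press</publisher>
    <publishyear>2003</publishyear>
    <price type="US">10.00</price>
  </book>
</bookcatalog>"""

# Remove the ISBN element, which the DTD's book declaration requires exactly once.
INVALID_DOC = VALID_DOC.replace("<ISBN>99999-99999</ISBN>", "")

def is_well_formed(text: str) -> bool:
    """Return True if ElementTree (a non-validating parser) accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(VALID_DOC))    # True
print(is_well_formed(INVALID_DOC))  # True, even though the document is not valid
book = ET.fromstring(INVALID_DOC).find("book")
print([child.tag for child in book])
# ['title', 'author', 'publisher', 'publishyear', 'price']
print(is_well_formed("<book><title>Unclosed</book>"))  # False: mismatched tags
```

Catching the missing ISBN requires a parser that reads and enforces the DTD.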
The DOCTYPE declaration names the root element, in this case <bookcatalog>. An element consists of a start tag (for example, <title>), the content between the tags (History of Interviews), and the corresponding end tag (</title>). An XML document may contain only one root element. The root element encloses the body of the document: every other element is nested between its start tag and end tag, so the root is the ancestor of all of them. For an XML document to be valid with respect to this DTD, its root element must be bookcatalog, the name given in the DOCTYPE declaration.
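The two root-element rules behave differently in practice. This is a minimal sketch assuming Python 3's standard `xml.dom.minidom`, which is backed by the non-validating expat parser. The DOCTYPE/root match is a validity rule (the XML 1.0 "Root Element Type" constraint), so expat does not enforce it. The hypothetical helper `root_matches_doctype` checks it explicitly. A second top-level element, by contrast, is a well-formedness error that even a non-validating parser rejects. The DTD is attached here as an internal subset in the document prolog.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# The DTD from the text as an internal subset. An empty <bookcatalog> is
# allowed because its content model is (book)*.
DOC = """<?xml version="1.0"?>
<!DOCTYPE bookcatalog [
<!ELEMENT bookcatalog (book)*>
<!ELEMENT book (title, author+, ISBN, publisher, publishyear, price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (firstname, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT publishyear (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ATTLIST price type (US|CAN|UK|EURO) #REQUIRED>
]>
<bookcatalog></bookcatalog>"""

def minidom_accepts(text: str) -> bool:
    """True if expat can parse the text; this checks well-formedness only."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False

def root_matches_doctype(text: str) -> bool:
    """Validity rule: the DOCTYPE name must equal the root element's tag."""
    doc = minidom.parseString(text)
    return doc.doctype is not None and doc.doctype.name == doc.documentElement.tagName

print(root_matches_doctype(DOC))  # True
print(root_matches_doctype(DOC.replace("DOCTYPE bookcatalog", "DOCTYPE catalog")))  # False
# Two top-level elements: rejected even without validation.
print(minidom_accepts("<bookcatalog></bookcatalog><bookcatalog></bookcatalog>"))  # False
```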
Following the DOCTYPE line are the element declarations. Each one stipulates the content model of an element, that is, which child elements must be nested within it and in what order. Here, bookcatalog contains book elements, and book contains title, author, ISBN, publisher, publishyear, and price. Note that every child element of book is explicitly listed in its element declaration, and that author has a + suffix. This is an example of the Extended Backus-Naur Form (EBNF) notation that can be used to describe a content model. The allowed suffixes are:
A ? suffix means 0 or 1 occurrence.
A * suffix means 0 or more occurrences.
A + suffix means 1 or more occurrences.
No suffix means exactly 1 occurrence.
Note also the use of #PCDATA (parsed character data) to declare that an element contains only text and no child elements. The ATTLIST declaration makes price’s type attribute required (#REQUIRED) and restricts its value to US, CAN, UK, or EURO. The difference between CDATA and PCDATA is that the contents of a CDATA section are not parsed for markup. Characters such as < and & are passed to the application literally instead of being interpreted as tags or entity references, so CDATA can be viewed as “non-parsed character data.”
Thus a validating XML parser reads the document according to the rules in this DTD and determines whether the document conforms to it (is valid). That is, it checks that all the required elements and attributes are present and that the structural relationships and sequences are as declared.
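Each sequence content model is essentially a regular expression over child element names, so the suffixes map directly onto regex quantifiers. The following Python sketch is a simplified, hypothetical checker for this catalog only, not a substitute for a validating parser. It hard-codes the DTD's rules and supports only simple sequence models (no | choices, nested groups, or mixed content). It also checks that #PCDATA elements have no children and that price's type attribute is one of the declared values. `CONTENT_MODELS`, `model_to_regex`, `validate`, and `check` are illustrative names.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, hard-coded copy of the sample DTD's rules. Only simple sequence
# content models are supported (no "|" choices, nested groups, or mixed content).
CONTENT_MODELS = {
    "bookcatalog": "(book)*",
    "book": "(title, author+, ISBN, publisher, publishyear, price)",
    "author": "(firstname, lastname)",
}
PCDATA_ONLY = {"title", "firstname", "lastname", "ISBN", "publisher", "publishyear", "price"}
PRICE_TYPES = {"US", "CAN", "UK", "EURO"}

def model_to_regex(model: str) -> re.Pattern:
    """Translate e.g. '(a, b+, c?)*' into a regex over space-terminated child names."""
    group = re.fullmatch(r"\((.*)\)([?*+]?)", model.replace(" ", ""))
    if group is None:
        raise ValueError(f"unsupported content model: {model}")
    parts = []
    for item in group.group(1).split(","):
        m = re.fullmatch(r"([\w.-]+)([?*+]?)", item)
        if m is None:
            raise ValueError(f"unsupported item {item!r} in {model}")
        name, suffix = m.groups()
        # ?, *, + become the same regex quantifiers; no suffix means exactly once.
        parts.append(f"(?:{re.escape(name)} ){suffix}")
    return re.compile(f"(?:{''.join(parts)}){group.group(2)}")

def validate(elem: ET.Element, errors: list) -> list:
    """Recursively check elem and its descendants, appending messages to errors."""
    children = list(elem)
    if elem.tag in CONTENT_MODELS:
        model = CONTENT_MODELS[elem.tag]
        sequence = "".join(child.tag + " " for child in children)
        if model_to_regex(model).fullmatch(sequence) is None:
            errors.append(f"<{elem.tag}> children {sequence.split()} do not match {model}")
        # Element-only content: whitespace between children is allowed, other text is not.
        if any((t or "").strip() for t in [elem.text] + [c.tail for c in children]):
            errors.append(f"<{elem.tag}> contains text outside its child elements")
    elif elem.tag in PCDATA_ONLY:
        if children:
            errors.append(f"<{elem.tag}> is #PCDATA but contains child elements")
    else:
        errors.append(f"<{elem.tag}> is not declared")
    # Mirrors: <!ATTLIST price type (US|CAN|UK|EURO) #REQUIRED>
    if elem.tag == "price" and elem.get("type") not in PRICE_TYPES:
        errors.append(f"<price> type={elem.get('type')!r} is missing or not allowed")
    for child in children:
        validate(child, errors)
    return errors

def check(text: str, root_name: str = "bookcatalog") -> list:
    """Parse text and return a list of validity errors (empty if none were found)."""
    root = ET.fromstring(text)
    errors = [] if root.tag == root_name else [f"root is <{root.tag}>, expected <{root_name}>"]
    return validate(root, errors)

GOOD = ("<bookcatalog><book><title>History of Interviews</title>"
        "<author><firstname>Juan</firstname><lastname>Smith</lastname></author>"
        "<ISBN>99999-99999</ISBN><publisher>Oracle Press</publisher>"
        '<publishyear>2003</publishyear><price type="US">10.00</price></book></bookcatalog>')
BAD = GOOD.replace("<ISBN>99999-99999</ISBN>", "").replace('type="US"', 'type="JPY"')

print(check(GOOD))  # []
for message in check(BAD):
    print(message)
```

The regex approach works here because every content model in this DTD is a flat sequence. A general DTD validator must also handle choices, nested groups, mixed content, entities, and ID/IDREF rules, which is why production code normally relies on a validating parser instead.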