Best Practices
The following are some best practices to follow when employing DTDs and XML schemas.
Designing Your Schema
Defining a schema from scratch can be a daunting task. The following guidelines will make your job easier, reduce errors, and ease maintenance.
Name the XML Schema document file to reflect the root element.
Ensure the root element is defined as the first top-level element and based on the rootType.
Ensure XML Schemas are versioned using the optional version attribute in the <schema/> element.
Use xsd:token instead of xsd:string so that values are whitespace-normalized. The lexical space of token is the set of strings that do not contain carriage return (#xD), line feed (#xA), or tab (#x9) characters, that have no leading or trailing spaces (#x20), and that have no internal sequences of two or more spaces. The base type of token is normalizedString.
XML element and attribute names should be entirely upper case and use the underscore to separate word boundaries. This makes the database objects easier to create and use when registering the XML schema to Oracle XML DB.
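To make the xsd:token guideline above concrete, the following is a minimal Java sketch of the whitespace collapsing that the token type implies. The class name TokenWhitespace and the method collapse are hypothetical names chosen for illustration; a schema-aware parser performs this normalization itself, so this only approximates the rule to show its effect.

```java
public class TokenWhitespace {
    // Approximates the whitespace rule behind xsd:token (assumption: this
    // hand-written helper stands in for what a schema processor does).
    public static String collapse(String raw) {
        // Step 1: tab, line feed, and carriage return each become a space.
        String replaced = raw.replaceAll("[\\t\\n\\r]", " ");
        // Step 2: runs of two or more spaces shrink to a single space.
        String squeezed = replaced.replaceAll(" {2,}", " ");
        // Step 3: drop leading and trailing #x20 characters only.
        int start = 0;
        int end = squeezed.length();
        while (start < end && squeezed.charAt(start) == ' ') {
            start++;
        }
        while (end > start && squeezed.charAt(end - 1) == ' ') {
            end--;
        }
        return squeezed.substring(start, end);
    }

    public static void main(String[] args) {
        String raw = "  123\tMain \n St.  ";
        System.out.println("[" + collapse(raw) + "]"); // prints [123 Main St.]
    }
}
```

A value declared as xsd:string would keep the tab, line feed, and extra spaces exactly as written.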
Elements vs. Attributes
A classic question that every XML schema and DTD designer faces at some point is this: Should I model the data as elements or as attributes? It is a crucial question, because the answer can be the difference between a successful, future-proof design and one that fails. To illustrate this, consider the following two fragments describing an instance of an address. The first is attribute-based:
<Address Street="123 Main St." City="San Francisco" State="CA" Zip="94127"/>
Notice how it reads very nicely and appears to be a compact representation because there are no end tags. Now consider the following element-based instance:
<Address>
<Street>123 Main St.</Street>
<City>San Francisco</City>
<State>CA</State>
<Zip>94127</Zip>
</Address>
While this version does not read left to right as conveniently, its DOM representation is not significantly larger than the first one because both elements and attributes are nodes. More importantly, if at a later time you want to add structure to any one of the elements, you can with the second one but not the first, because attributes cannot have structure or be more than a simple type. An example would be where you want to support extended ZIP codes:
<Zip>
<Code>94127</Code>
<Ext>8522</Ext>
</Zip>
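The difference can also be seen from the consuming side. The following is a minimal sketch using the standard JAXP DOM API; the class name ElementVsAttribute and its helper methods are hypothetical. It reads the ZIP code from an attribute, where only a flat string is available, and from an element, where the Code and Ext children can be addressed separately.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ElementVsAttribute {
    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Attribute form: the value is a single string with no inner structure.
    public static String zipFromAttribute(String xml) {
        Element address = parse(xml).getDocumentElement();
        return address.getAttribute("Zip");
    }

    // Element form: Zip holds child elements that can be read individually.
    public static String zipFromElement(String xml) {
        Element zip = (Element) parse(xml).getElementsByTagName("Zip").item(0);
        String code = zip.getElementsByTagName("Code").item(0).getTextContent();
        String ext = zip.getElementsByTagName("Ext").item(0).getTextContent();
        return code + "-" + ext;
    }

    public static void main(String[] args) {
        System.out.println(zipFromAttribute("<Address Zip=\"94127\"/>"));
        System.out.println(zipFromElement(
                "<Address><Zip><Code>94127</Code><Ext>8522</Ext></Zip></Address>"));
    }
}
```

With the attribute form, supporting the extension would mean either inventing a string convention inside the value or adding a second attribute; with the element form, the new children fit under the existing Zip element.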
Designing Element and Attribute Names
XML has the characteristic of being human-readable with no effective limit to the number of characters used in element and attribute names. The temptation is therefore to create very explicit names, the result being an instance document that is many times larger than the data it is conveying, as shown in this XML fragment:
<ArrivalInfo>
<TripCityTimeInfo>
<CodeDescription>
<Code>CVG</Code>
<Description>CVG - Cincinnati,OH, United States - Northern Kentucky Intl</Description>
<AdditionalData>Northern Kentucky Intl</AdditionalData>
</CodeDescription>
<Date>
<Month>5</Month>
<Day>21</Day>
<Year>2002</Year>
</Date>
<Time Format="Military">0759</Time>
<FlightSearchByTimeType Type="Arriving"/>
</TripCityTimeInfo>
</ArrivalInfo>
This XML fragment contains 471 bytes yet carries only 114 bytes of data. The extra markup adds significantly to processing costs in both resources and time. Remember that XML is ultimately a machine-processed document; therefore, names should be descriptive without being needlessly long.
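One way to estimate this kind of overhead for your own documents is to compare the total size with the size of the values alone. The following is a rough sketch using the standard JAXP DOM API. The class name MarkupOverhead and the counting rule (attribute values plus trimmed text nodes, measured in UTF-8 bytes) are assumptions made for illustration, and they are not necessarily how the figures above were obtained.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class MarkupOverhead {
    // Recursively sums the UTF-8 size of attribute values and trimmed text nodes.
    static int countData(Node node) {
        int total = 0;
        if (node.getNodeType() == Node.TEXT_NODE) {
            total += node.getNodeValue().trim().getBytes(StandardCharsets.UTF_8).length;
        }
        NamedNodeMap attrs = node.getAttributes(); // null for non-element nodes
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                total += attrs.item(i).getNodeValue().getBytes(StandardCharsets.UTF_8).length;
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            total += countData(child);
        }
        return total;
    }

    // Bytes of actual data (values only) in the document.
    public static int dataBytes(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return countData(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Bytes of the whole serialized document, markup included.
    public static int totalBytes(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String xml = "<Time Format=\"Military\">0759</Time>";
        // prints "12 of 35 bytes are data"
        System.out.println(dataBytes(xml) + " of " + totalBytes(xml) + " bytes are data");
    }
}
```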
Loading External DTDs from a JAR File
A convenient way to handle multiple DTDs is to put them in a JAR file, so that when the XML parser needs one of the DTDs, it can access it from the JAR. The Oracle XML parser supports a base URL (setBaseURL()), but that only points to a location where all the DTDs are exposed. The solution involves the following steps:
Load the DTD as an InputStream using the following code:
InputStream is = YourClass.class.getResourceAsStream("/foo/bar/your.dtd");
Because the resource name starts with a slash, it is resolved from the root of the CLASSPATH, so foo/bar/your.dtd is found inside your JAR.
Parse the DTD with the following code:
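DOMParser d = new DOMParser();
// Parse the DTD from the stream; the second argument names the root element.
d.parseDTD(is, "rootelementname");
// Attach the parsed DTD to the parser so it is used for the next document.
d.setDoctype(d.getDoctype());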
Parse your document with the following code:
d.parse("yourdoc");
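If you are not using the Oracle parser, a similar effect can be approximated with the standard SAX EntityResolver interface, which is a different technique from the parseDTD() steps above. The following is a minimal sketch under that assumption: the class name ClasspathDtdResolver, the fileName helper, and the "/foo/bar/" prefix are hypothetical. It maps the file name of each requested DTD to a CLASSPATH resource and returns null when no such resource exists, which leaves the parser to its default behavior.

```java
import java.io.InputStream;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class ClasspathDtdResolver implements EntityResolver {
    private final Class<?> anchor; // a class loaded from the JAR that holds the DTDs
    private final String prefix;   // resource directory, for example "/foo/bar/"

    public ClasspathDtdResolver(Class<?> anchor, String prefix) {
        this.anchor = anchor;
        this.prefix = prefix;
    }

    // Extracts the last path segment of a system identifier.
    public static String fileName(String systemId) {
        int slash = systemId.lastIndexOf('/');
        return slash < 0 ? systemId : systemId.substring(slash + 1);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null) {
            return null;
        }
        InputStream in = anchor.getResourceAsStream(prefix + fileName(systemId));
        if (in == null) {
            return null; // not packaged in the JAR: let the parser resolve it normally
        }
        InputSource source = new InputSource(in);
        source.setSystemId(systemId); // keep the original identifier for error messages
        return source;
    }
}
```

To use it with a JAXP DocumentBuilder, call builder.setEntityResolver(new ClasspathDtdResolver(YourClass.class, "/foo/bar/")) before parsing a document whose DOCTYPE declaration references the DTD.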