Perl Cd Bookshelf [Electronic resources] (text version)


Mark V. Scardina, Ben Chang, and Jinyu Wang

Hypertext Markup Language (HTML), created in 1990, is based. While SGML is still a widely used standard in the document world, and HTML is still the basis of millions of pages on the World Wide Web, XML is rapidly gaining acceptance because of its advantages in data exchange, storage, and description over the existing markup languages. Since the W3C published the XML 1.0 specification in February 1998, XML has been widely seen as the data interchange language of choice for e-commerce.



What Is an XML Document?



While this book is not meant to be a full XML tutorial, as with any standard, numerous concepts and technical terms need to be explained. Because XML was developed to convey data, a relevant example is a data record of a book listing from a standard database. A complex SQL query could return data in the following format:


History of Interviews, Juan, Smith, 99999-99999, Oracle Press, 2003.


If XML is used as the output form, however, this record now has additional context for each piece of data, as evidenced in the following:


<book>
  <title>History of Interviews</title>
  <author>
    <firstname>Juan</firstname>
    <lastname>Smith</lastname>
  </author>
  <ISBN>99999-99999</ISBN>
  <publisher>Oracle Press</publisher>
  <publishyear>2003</publishyear>
  <price type="US">10.00</price>
</book>


Certain items of note in this example are explored in detail later. Notice that the file has symmetry, and each piece of data has its context enclosing it in the form <context></context>. The angle brackets and text inside are called tags, and each set of tags and its enclosed data is called an element. This relationship can be thought of as similar to a column in a database table in which the text of the tag is the column heading and the text between the tags is the data from a row in that column. In the preceding example, title could be the name of the column and History of Interviews could be the data in a row.


Notice, too, that several tags contain tags instead of data. This is a significant feature of XML, which permits nesting of data to define relationships better. Returning to the database metaphor, the <author> tag could be modeled as a table whose columns were <firstname> and <lastname>. In XML terminology, these column tags are referred to as children of the parent <author> tag.
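The element and parent-child terminology can be seen by loading the book record into a parser. The following is a minimal sketch that assumes Python and its standard xml.etree.ElementTree module, neither of which is part of the original discussion; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The book record from the example above.
BOOK_XML = """<book>
  <title>History of Interviews</title>
  <author>
    <firstname>Juan</firstname>
    <lastname>Smith</lastname>
  </author>
  <ISBN>99999-99999</ISBN>
  <publisher>Oracle Press</publisher>
  <publishyear>2003</publishyear>
  <price type="US">10.00</price>
</book>"""

book = ET.fromstring(BOOK_XML)

# An element pairs a tag name (the "column heading") with its data.
title = book.find("title")
print(title.tag, "=", title.text)

# <author> holds no data of its own; it is the parent of two child elements.
author = book.find("author")
child_tags = [child.tag for child in author]
print("children of author:", child_tags)
```

Iterating over an element yields its direct children only, which mirrors the table-and-columns metaphor used above.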


Now look at the <price> tag and you see that it includes text of the form name="value". These name-value pairs are called attributes, and one or more of them can be included in the start tag of any element. Attributes, however, are not legal in end tags (for example, </tag name="foo">). Notice that attribute values must be framed by quotes (single or double, as long as the opening and closing quotes match), as the XML specification requires. HTML is much more permissive in this area.
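As a small illustration of the attribute rules, the sketch below again assumes Python's standard xml.etree.ElementTree module (not something the text itself uses). It reads the type attribute of the price element and shows that a parser rejects an unquoted value and an attribute placed in an end tag.

```python
import xml.etree.ElementTree as ET

# Attributes of a start tag are exposed as name-value pairs.
price = ET.fromstring('<price type="US">10.00</price>')
print(price.attrib)

# Single quotes are equally valid as long as they match.
single_quoted = ET.fromstring("<price type='US'>10.00</price>")

def parses(text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Neither an unquoted value nor an attribute in an end tag is accepted.
unquoted_ok = parses("<price type=US>10.00</price>")
end_tag_attr_ok = parses('<price>10.00</price type="US">')
print(unquoted_ok, end_tag_attr_ok)
```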


One final terminology note: the entire XML example is enclosed by <book></book>. These tags are defined as the root of the document, and only one may exist in any particular document. XML documents that follow these rules of having only one root and properly closing all open tags are considered well formed.
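The well-formedness rules can be checked mechanically. The following sketch assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed is hypothetical and introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# One root, every tag closed in the right order.
good = is_well_formed("<book><title>History of Interviews</title></book>")

# Two root elements in the same document.
two_roots = is_well_formed("<book></book><book></book>")

# An open tag that is never closed.
unclosed = is_well_formed("<book><title>History of Interviews</title>")

# Tags closed in the wrong order.
misnested = is_well_formed("<book><title>History of Interviews</book></title>")

print(good, two_roots, unclosed, misnested)
```

A check like this says nothing about whether the right elements are present; it confirms only that the document obeys the syntax rules described above.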


XML’s basic concepts and terminology are straightforward and are formalized in an open Internet standard. As the W3C XML 1.0 specification states, “XML documents are made up of storage units called entities, which contain either parsed data or unparsed data. Parsed data [or PCDATA] is made up of characters, some of which form character data, and some of which form markup. Markup encodes a description of the document’s storage layout and logical structure.” XML documents have both physical and logical structure. The physical structure of the XML document simply refers to the XML file and the other files that it may import, whereas the logical structure of an XML document refers to the prolog and the body of the document.


The XML of the book example represents the body of an XML document, but it is missing important information that helps identify its nature. This information is in the prolog, discussed in the following section.



The Prolog



The prolog consists of the XML declaration (that is, the version number), a possible language encoding hint, other attributes (name-value pairs), and an optional grammar or data model specified by either an XML Schema Definition (XSD) or a Document Type Definition (DTD) referred to by a URL. The prolog may also contain the actual XSD or DTD. An example with a reference to an external DTD would look like the following:


<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE book SYSTEM "book.dtd">


Note that a line of the form <? … ?> uses the syntax of an XML processing instruction (PI). In this example, xml is the name of the PI. In addition, the character set encoding declared in the example is UTF-8, a variable-width encoding of Unicode. While XML processors can usually detect the encoding from the first few bytes of the file, this declaration serves as a hint to indicate the expected encoding. Finally, the standalone attribute indicates whether the document depends on declarations held in external files that the processor would need to import.
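To see what a parser makes of the prolog, the sketch below assumes Python's standard xml.dom.minidom module, which is not a tool the text itself relies on. It reads back the declaration values and the DOCTYPE reference. The one-element body is added only so the document is complete, and minidom is a non-validating parser, so book.dtd is recorded but never opened.

```python
from xml.dom import minidom

# The prolog from the example above, followed by a minimal body.
DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE book SYSTEM "book.dtd">
<book><title>History of Interviews</title></book>"""

doc = minidom.parseString(DOCUMENT)

# Values taken from the XML declaration.
print("version:   ", doc.version)
print("encoding:  ", doc.encoding)
print("standalone:", doc.standalone)

# The DOCTYPE names the root element and points at the external DTD.
print("doctype:   ", doc.doctype.name)
print("system id: ", doc.doctype.systemId)
```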


The second line of this prolog refers to a DOCTYPE. This is where the grammar or data model for this XML document is declared. Why is this important? Remember, an XML file has both physical and logical representations. In some applications, it may be sufficient to process the XML without knowing whether information is missing, but most of the time, an application wants to validate the XML document it receives to confirm everything is there. To do this, the application must know which elements are required, which ones can have children, which ones can have attributes, and so forth. In XML terms, the grammar or data model in this example is referred to as a DTD. This DTD can reside within the XML file itself or simply be referred to so that the processor can locate it, as in this example.


The preceding example might look as follows with an XML Schema declaration:


<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:bk="http://www.mypublishsite.com/books">


To begin with, note that the XML Schema declaration has a prefix xsd:, which is associated with the XML Schema namespace through the declaration xmlns:xsd="http://www.w3.org/2001/XMLSchema". This prefix is used on the names of the data types defined in the referenced XSD to differentiate them from others using the same name. The xsd:schema declaration denotes the beginning of this XML Schema incorporated in this XML document, along with one other declaration, xmlns:bk="http://www.mypublishsite.com/books", which defines the namespace of the prefix bk: so as to identify these types as defined by the author of this data model.


Note also that the schema declaration is within the <book> tag instead of in the prolog. This is a distinct difference between XSDs and DTDs. Thus, the XML schema declaration is an attribute of the root element of the document and is part of the body, which we discuss next.



The Body



The root element, which contains the remainder of the XML document, follows the prolog and is called the body of the XML document. This part is composed of elements, processing instructions, content, attributes, comments, entity references, and so forth. As previously mentioned, elements must have start tags and corresponding end tags nested in the correct order; otherwise, the XML document is not well-formed, and XML parsers may signal errors because of this. Elements can also have attributes, or name-value pairs, such as <author firstname="Juan" lastname="Smith">. Built-in attributes defined by the XML 1.0 specification also exist, such as xml:space="preserve" to indicate that the whitespace between the elements be considered as data and thus preserved.


Entities, which can be defined only in DTDs, are similar to macros: an entity is defined once, and references to it, such as &nameofentity;, can be used in place of its entire definition. For example, in a DTD, <!ENTITY Copyright "Copyright 2000 by Smith, Jones, and Doe – All rights reserved"> could be declared, and then &Copyright; could be used as a shortcut throughout the XML document. An XML parser must recognize entities defined in DTDs, even when the validity check is turned off and an additional XML Schema is specified. Built-in entities also exist as defined by the XML 1.0 specification, such as those for the ampersand (&amp;), the apostrophe (&apos;), the less-than sign (&lt;), and so forth. Comments are recognized when they are enclosed in the <!-- --> construct.
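The macro-like behavior of entities can be observed directly. This sketch assumes Python's standard xml.etree.ElementTree module and declares the entity in an internal DTD subset so the example is self-contained. The entity text is shortened from the one quoted above, and the title and notice element names are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The entity is declared once in the internal DTD subset and referenced
# in the body. The title uses two of the built-in entities.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ENTITY Copyright "Copyright 2000 by Smith, Jones, and Doe">
]>
<book>
  <title>Questions &amp; Answers &lt;draft&gt;</title>
  <notice>&Copyright;</notice>
</book>"""

book = ET.fromstring(DOCUMENT)

# The parser replaces each reference with the text it stands for.
title_text = book.find("title").text
notice_text = book.find("notice").text
print(title_text)
print(notice_text)
```

Note that the reference ends with a semicolon; without it, the parser reports an error rather than expanding the entity.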


Within the body of the XML document instance, certain element and attribute names may carry prefixes. Each prefix is bound to an XML namespace, identified by a Uniform Resource Identifier (URI) reference, which qualifies the names of these elements and attributes so that they can be told apart from identically named items that may come from different machines or XML documents. For example, if the declaration xmlns:bk="http://www.mypublishsite.com/books" is made in a parent element, the name bk:title stands for the title element in the http://www.mypublishsite.com/books namespace. You can use identical names for either elements or attributes if they are qualified with URIs to differentiate the names. For example, bk:hello is called a qualified name; the namespace prefix bk is mapped to the URI, http://www.mypublishsite.com/books, and the local part is hello. Note that URI references can contain characters not allowed in element names; that is why bk serves as a substitute for the URI. It is important to mention that the bk prefix belongs to the document in which it is declared. Another document declaring the prefix book instead of bk but referencing the same URI would be considered equivalent when parsed by an XML parser.
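The point that the prefix is only a stand-in for the URI can be checked with a namespace-aware parser. The sketch below assumes Python's standard xml.etree.ElementTree module, which reports each name as the namespace URI in braces followed by the local part; the two small documents are illustrative.

```python
import xml.etree.ElementTree as ET

BOOKS_NS = "http://www.mypublishsite.com/books"

# Two documents that bind different prefixes to the same namespace URI.
with_bk = ET.fromstring(
    '<bk:book xmlns:bk="http://www.mypublishsite.com/books">'
    "<bk:title>History of Interviews</bk:title>"
    "</bk:book>"
)
with_book = ET.fromstring(
    '<book:book xmlns:book="http://www.mypublishsite.com/books">'
    "<book:title>History of Interviews</book:title>"
    "</book:book>"
)

# The prefix disappears after parsing; only URI plus local part remains.
print(with_bk.tag)
print(with_bk[0].tag == with_book[0].tag)

# Lookups are written against the URI, not against either prefix.
title = with_book.find("{%s}title" % BOOKS_NS)
print(title.text)
```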


Finally, the body may contain character data (CDATA) sections to mark off blocks of text that would otherwise be regarded as markup, comments, entity references, processing instructions, and so forth. The CDATA syntax is


<![CDATA[ characters including <, >, /, ?, & not legal anywhere else]]>


XML parsers do not interpret the contents of these sections as markup; they treat them as opaque character data and pass them through unchanged. Later in the book, you will see how you use them to embed SQL statements in XML documents.
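As a brief illustration of a CDATA section carrying SQL, the sketch below assumes Python's standard xml.etree.ElementTree module. The query element name and the SQL statement are made up for the example and are not taken from later chapters.

```python
import xml.etree.ElementTree as ET

# The < and <> operators would be illegal in ordinary element content,
# but inside a CDATA section they need no escaping.
DOCUMENT = (
    "<query><![CDATA["
    "SELECT title FROM books WHERE price < 20 AND publisher <> 'Unknown'"
    "]]></query>"
)

query = ET.fromstring(DOCUMENT)

# The application receives the characters exactly as written.
sql_text = query.text
print(sql_text)
```

The same statement written without CDATA would need &lt; and &gt; in place of the comparison operators.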


Thus, the body of the XML document contains the root element with its schema declarations, child and sibling nodes, elements, attributes, text nodes that represent the textual content of an element or attribute, and CDATA sections.

