
6.2 Referencing Documents: The URL

Every document on the Web has a unique address. (Imagine the chaos if
they didn't.) The document's address is known as its uniform resource
locator (URL).[2]

[2] "URL" usually is pronounced
"you are ell," not
"earl."

Several HTML/XHTML tags include a URL attribute value, including
hyperlinks, inline images, and forms. All use the same URL syntax to
specify the location of a web resource, regardless of the type or
content of that resource. That's why it's known as a uniform
resource locator.

Since they can be used to represent almost any resource on the
Internet, URLs come in a variety of flavors. All URLs, however, have
the same top-level syntax:

scheme:scheme_specific_part

The scheme describes the kind of object the URL
references; the scheme_specific_part is, well,
the part that is peculiar to the specific scheme. The important thing
to note is that the scheme is always separated
from the scheme_specific_part by a colon, with
no intervening spaces.
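
As a minimal sketch of that top-level syntax, the following Python splits a URL at its first colon. The helper name split_scheme is hypothetical, and the sample URLs are borrowed from elsewhere in this section.

```python
def split_scheme(url: str) -> tuple[str, str]:
    """Split a URL into (scheme, scheme_specific_part) at the first colon."""
    scheme, _, scheme_specific_part = url.partition(":")
    return scheme, scheme_specific_part


# Sample URLs of three different schemes.
for url in (
    "http://www.kumquat.com/",
    "mailto:cmusciano@aol.com",
    "news:comp.infosys.www.announce",
):
    print(split_scheme(url))
```

Only the first colon matters here; any later colon (a port number, for instance) belongs to the scheme-specific part.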

6.2.1 Writing a URL

Write URLs using the displayable characters in the US-ASCII character
set. For example, surely you have heard what has become annoyingly
common on the radio for an announced business web site:
"h, t, t, p, colon, slash, slash, w, w, w, dot,
blah-blah, dot, com." That's a simple URL, written:

http://www.blah-blah.com

If you need to use a character in a URL that is not part of this
character set, you must encode the character using a special
notation. The encoding notation replaces the desired character with
three characters: a percent sign and two hexadecimal
digits whose values correspond to the position of the character in
the ASCII character set.

This is easier than it sounds. One of the most common special
characters is the space (owners of older Macintoshes, take special
notice), whose position in the character set is 20 hexadecimal. You
can't type a space in a URL (well, you can, but it won't work).
Rather, replace spaces in the URL with %20:

http://www.kumquat.com/new%20pricing.html

This URL actually retrieves a document named new pricing.html from
the www.kumquat.com server.
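
You rarely need to work out the hexadecimal values by hand. As a sketch, Python's standard urllib.parse module can do the substitution; the file name below is assumed for illustration.

```python
from urllib.parse import quote, unquote

name = "new pricing.html"      # assumed file name containing a space
encoded = quote(name)          # the space (hex 20) becomes %20
url = "http://www.kumquat.com/" + encoded

print(url)
print(unquote(encoded))        # decoding restores the original name
```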

6.2.1.1 Handling reserved and unsafe characters

In addition to the nonprinting
characters, you'll need to encode reserved and
unsafe characters in your URLs as well.

Reserved characters are those that have a specific meaning within the
URL itself. For example, the slash character separates elements of a
pathname within a URL. If you need to include in a URL a slash that
is not intended to be an element separator, you'll
need to encode it as %2F:[3]

[3] Hexadecimal
numbering is based on 16 characters: 0 through 9 followed by A
through F, which in decimal are equivalent to values 0 through 15.
Also, letter case for these extended values is not significant;
"a" (10 decimal) is the same as
"A," for example.

http://www.calculator.com/compute?3%2f4

This URL actually references the resource named
compute on the
www.calculator.com server and passes the string
3/4 to it, as delineated by the
question mark (?).
Presumably, the resource is a server-side program that performs some
arithmetic function on the passed value and returns a result.
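
The same encoding can be sketched in Python. Note that quote() leaves slashes alone by default, so the example passes safe="" to force the slash to be encoded; the variable names are illustrative only.

```python
from urllib.parse import quote, unquote, urlsplit

# safe="" asks quote() to encode "/" as well; by default it is left as-is.
argument = quote("3/4", safe="")
url = "http://www.calculator.com/compute?" + argument

parts = urlsplit(url)
print(parts.path)              # the resource being referenced
print(unquote(parts.query))    # the string the resource receives

# Letter case of the hexadecimal digits does not matter when decoding.
print(unquote("3%2f4") == unquote("3%2F4"))
```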

Unsafe characters are those that have no special meaning within the
URL but may have a special meaning in the context in which the URL is
written. For example, double quotes
(") delimit URL attribute
values in tags. If you were to include a double quotation mark
directly in a URL, you would probably confuse the browser. Instead,
you should encode the double quotation mark as %22
to avoid any possible conflict.

Other reserved and unsafe characters
that should always be encoded are shown in Table 6-1.

Table 6-1. Reserved and unsafe characters and their URL encodings
Character	Description	Usage	Encoding
;	Semicolon	Reserved	`%3B`
/	Slash	Reserved	`%2F`
?	Question mark	Reserved	`%3F`
:	Colon	Reserved	`%3A`
@	At sign	Reserved	`%40`
=	Equals sign	Reserved	`%3D`
&	Ampersand	Reserved	`%26`
<	Less-than sign	Unsafe	`%3C`
>	Greater-than sign	Unsafe	`%3E`
"	Double quotation mark	Unsafe	`%22`
#	Hash symbol	Unsafe	`%23`
%	Percent	Unsafe	`%25`
{	Left curly brace	Unsafe	`%7B`
}	Right curly brace	Unsafe	`%7D`
|	Vertical bar	Unsafe	`%7C`
\	Backslash	Unsafe	`%5C`
^	Caret	Unsafe	`%5E`
~	Tilde	Unsafe	`%7E`
[	Left square bracket	Unsafe	`%5B`
]	Right square bracket	Unsafe	`%5D`
`	Back single quotation mark	Unsafe	`%60`

In general, you should always encode a character if there is some
doubt as to whether it can be placed as-is in a URL. As a rule of
thumb, any character other than a letter, number, or any of the
characters $-_.+!*'() should be encoded.

It is never an error to encode a character, unless that character has
a specific meaning in the URL. For example, encoding the slashes in
an http URL causes them to be used as regular characters, not as
pathname delimiters, breaking the URL.
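
The rule of thumb can be sketched as a small Python function. The name encode_conservatively is hypothetical, and encoding non-ASCII characters as UTF-8 bytes is an assumption that goes beyond what this section specifies.

```python
import string

# Characters the rule of thumb leaves alone: letters, digits, and $-_.+!*'()
UNENCODED = set(string.ascii_letters + string.digits + "$-_.+!*'()")


def encode_conservatively(text: str) -> str:
    """Percent-encode every character that is not in UNENCODED.

    Assumption: non-ASCII characters are encoded as their UTF-8 bytes.
    """
    out = []
    for ch in text:
        if ch in UNENCODED:
            out.append(ch)
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


print(encode_conservatively('say "hi" to {chuck}'))
```

Because this sketch also encodes slashes, colons, and question marks, it should be applied only to a single component of a URL (one file name or one parameter value), never to a complete URL.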

6.2.2 Absolute and Relative URLs

You may address a URL in one of two ways: absolute or
relative. An absolute URL is the complete address of a resource and
has everything your system needs to find a document and its server on
the Web. At the very least, an absolute URL contains the scheme and
all required elements of the
scheme_specific_part of the URL. It may also
contain any of the optional portions of the
scheme_specific_part.

With a relative URL, you provide an abbreviated document address
that, when automatically combined with a "base
address" by the system, becomes a complete address
for the document. Within the relative URL, any component of the URL
may be omitted. The browser automatically fills in the missing pieces
of the relative URL using corresponding elements of a base URL. This
base URL is usually the URL of the document containing the relative
URL, but it may be another document specified with the
<base> tag. [<base>]

6.2.2.1 Relative schemes and servers

A common form
of a relative URL is missing the scheme and server name. Since many
related documents are on the same server, it makes sense to omit the
scheme and server name from the relative URL. For instance, assume
the base document was last retrieved from the server
www.kumquat.com. This relative URL:

another-doc.html

is equivalent to the absolute URL:

http://www.kumquat.com/another-doc.html

Table 6-2 shows how the base and relative URLs in
this example are combined to form an absolute URL.

Table 6-2. Forming an absolute URL
	Protocol	Server	Directory	File
Base URL	http	www.kumquat.com	/
Relative URL				another-doc.html

Absolute URL	http	www.kumquat.com	/	another-doc.html
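
The combination in Table 6-2 can be reproduced with Python's urljoin function, shown here as a minimal sketch. The base URL is assumed to be the root of the www.kumquat.com server.

```python
from urllib.parse import urljoin

base = "http://www.kumquat.com/"        # assumed base URL
relative = "another-doc.html"

# The scheme, server, and directory are taken from the base URL.
absolute = urljoin(base, relative)
print(absolute)
```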

6.2.2.2 Relative document directories

Another common form of a relative URL omits the leading slash and one
or more directory names from the beginning of the document pathname.
The directory of the base URL is automatically assumed to replace
these missing components. It's the most common
abbreviation, because most authors place their collections of
documents and subdirectories of support resources in the same
directory path as the home page. For example, you might have a
special subdirectory containing FTP files
referenced in your document. Let's say that the
absolute URL for that document is:

http://www.kumquat.com/planting/guide.html

A relative URL for the file README.txt in the
special subdirectory looks like this:

ftp:special/README.txt

You'll actually be retrieving:

ftp://www.kumquat.com/planting/special/README.txt

Visually, the operation looks like that in Table 6-3.

Table 6-3. Forming an absolute FTP URL
	Protocol	Server	Directory	File
Base URL	http	www.kumquat.com	/planting	guide.html
Relative URL	ftp		special	README.txt

Absolute URL	ftp	www.kumquat.com	/planting/special	README.txt
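
Not every piece of URL software merges a relative URL across schemes the way Table 6-3 does. As a point of comparison, the sketch below uses Python's urljoin, which fills in the base directory when the scheme is omitted but returns a reference with a different scheme unchanged. The base URL is assumed to be the planting guide from the example.

```python
from urllib.parse import urljoin

base = "http://www.kumquat.com/planting/guide.html"   # assumed base URL

# No scheme given: the base directory (/planting/) is prepended.
same_scheme = urljoin(base, "special/README.txt")

# Different scheme: urljoin does not borrow the server or directory
# from the base; it hands the reference back as written.
cross_scheme = urljoin(base, "ftp:special/README.txt")

print(same_scheme)
print(cross_scheme)
```

If you need a link to behave the same everywhere, it may be safer to spell out the full URL whenever the scheme differs from that of the base document.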

6.2.2.3 Using relative URLs

Relative URLs are more than just a typing
convenience. Because they are relative to the current server and
directory, you can move an entire set of documents to another
directory or even another server and never have to change a single
relative link. Imagine the difficulties if you had to go into every
source document and change the URL for every link every time you
moved it. You'd loathe using hyperlinks! Use
relative URLs wherever possible.

6.2.3 The http URL

The
http URL is by far the most common. It is used to access documents
from a web server, and it has two formats:

http://server:port/path#fragment
http://server:port/path?search

Some of the parts are optional. In fact, the most common form of the
http URL is simply:

http://server/path

which designates the unique server and the directory path and name of
a document.

6.2.3.1 The http server

The server is the
unique Internet name or Internet protocol (IP) numerical address of
the computer system that stores the web resource. We suspect
you'll mostly use more easily remembered Internet
names for the servers in your URLs.[4] The name consists of several
parts, including the server's actual name and the successive names of
its network domain, each part separated by a period. Typical Internet
names look like www.oreilly.com or
hoohoo.ncsa.uiuc.edu.[5]

[4] Each Internet-connected computer has a unique address: a numeric
(IP) address, of course, because computers deal only in numbers. Humans
prefer names, so the Internet folks provide us with a collection of
special servers and software (the Domain Name System, or DNS) that
automatically resolve Internet names into IP addresses.

[5] The
three-letter suffix of the domain name identifies the type of
organization or business that operates that portion of the Internet.
For instance, "com" is a commercial
enterprise, "edu" is an academic
institution, and "gov" identifies a
government-based domain. Outside the United States, a
less-descriptive suffix is often assigned, typically a
two-letter abbreviation of the country name, such as
"jp" for Japan and
"de" for Deutschland. Many
organizations around the world now use the generic three-letter
suffixes in place of the more conventional two-letter national
suffixes.

It has become something of a convention that webmasters name their
servers www for quick and easy identification on
the Web. For instance, the O'Reilly & Associates web server is named
www, which, along with the
publisher's domain name, becomes the very easily
remembered web site www.oreilly.com. Similarly,
the ActivMedia Robotics web server is named
www.activmedia.com. Being a nonprofit
organization, the American Kennel Club's main server
has a different domain suffix: www.akc.org. The naming convention has
very obvious benefits, which you, too, should take advantage of if
you are called upon to create a web server for your organization.

You may also specify the address of a server using its numerical IP
address. The address is a sequence of four numbers, each from 0 to 255,
separated by periods. Valid IP addresses look like 137.237.1.87 or
192.249.1.33.

It'd be a dull diversion to tell you now what the
numbers mean or how to derive an IP address from a domain name,
particularly since you'll rarely, if ever, use one
in a URL. Rather, this is a good place to hyperlink: pick up any good
Internet networking treatise for rigorous detail on IP addressing,
such as Ed Krol's The Whole Internet User's Guide and Catalog
(O'Reilly).

6.2.3.2 The http port

The port is the number of the communication port by which the client
browser connects to the server. It's a networking thing: servers
perform many functions besides serving up web documents and resources
to client browsers, such as electronic mail, FTP document fetches,
filesystem sharing, and so on. Although all that network activity may
come into the server on a single wire, it's typically divided into
software-managed "ports" for service-specific communications,
something analogous to boxes at your local post office.

The default URL port for web servers is 80. Special secure web
servers, those running Secure HTTP (SHTTP) or Secure Sockets Layer
(SSL), run on port 443. Most web servers today use port 80; you
need to include a port number along with an immediately preceding
colon in your URL if the target server does not
use port 80 for web communication.
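
The default-port rule can be sketched in Python as follows. The function name effective_port is hypothetical, and mapping the https scheme to port 443 is an assumption about how secure servers are addressed; this section itself names only the port numbers.

```python
from urllib.parse import urlsplit

# Default ports named in the text. Assumption: "https" is the scheme
# used to reach the secure servers on port 443.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the port a browser would connect to for this URL."""
    parts = urlsplit(url)
    # parts.port is None when the URL has no ":port" after the server name.
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("http://www.kumquat.com/"))
print(effective_port("http://www.kumquat.com:8080/"))
```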

When the Web was in its infancy, pioneer webmasters ran their Wild
Wild Web connections on all sorts of port numbers. For technical and
security reasons, system-administrator privileges are required to
install a server on port 80. Lacking such privileges, these
webmasters chose other, more easily accessible, port numbers.

Now that web servers have become acceptable and are under the care
and feeding of responsible administrators, documents being served on
some port other than 80 or 443 should make you wonder if that server
is really on the up and up. Most likely, the maverick server is being
run by a clever user unbeknownst to the server's
bona fide system administrators.

6.2.3.3 The http path

The document path
is the Unix-style hierarchical location of the file in the
server's storage system. The pathname consists of
one or more names separated by slashes. All but the last name
represent directories leading down to the document; the last name is
usually that of the document itself.

It has become a convention that for easy identification, HTML
document names end with the suffix .html
(otherwise they're plain ASCII text files,
remember?). Although recent versions of Windows allow longer
suffixes, their users often stick to the three-letter .htm
name suffix for HTML documents.

Although the server name in a URL is not case-sensitive, the document
pathname may be. Since most web servers are run on Unix-based
systems, and Unix filenames are case-sensitive, those document
pathnames will be case-sensitive, too. Web servers running on Windows
machines are not case-sensitive, so those document pathnames are not.
Since it is impossible to know the operating system of the server you
are accessing, always assume that the server has case-sensitive
pathnames and take care to get the case correct when typing your
URLs.

Certain conventions regarding the document pathname have arisen. If
the last element of the document path is a directory, not a single
document, the server usually will send back either a listing of the
directory contents or the HTML index document in that directory. You
should end the document name for a directory with a trailing
slash character, but in practice, most
servers will honor the request even if this character is omitted.

If the directory name is just a slash alone, or nothing at all, the
server decides what to serve to your browser: typically, a
so-called home page in the root directory stored as a file
named index.html. Every well-designed web server should have an
attractive, well-designed home page; it's a
shorthand way for users to access your web collection, since they
don't need to remember the document's actual filename, just your
server's name. That's why, for example, you can type
http://www.oreilly.com into Netscape's "Open" dialog box and get
O'Reilly's home page.

Another twist: if the first component of the document path starts
with the tilde character (~), it means that the rest
of the pathname begins from the personal directory in the home
directory of the specified user on the server machine. For instance,
the URL http://www.kumquat.com/~chuck/ would
retrieve the top-level page from Chuck's document
collection.

Different servers have different ways of locating documents within a
user's home directory. Many search for the documents
in a directory named public_html. Unix-based
servers are fond of the name index.html for home
pages. When all else fails, servers tend to cough up a directory
listing or the first text document in the home page directory.

6.2.3.4 The http document fragment

The fragment is an identifier that points to a
specific section of a document. In URL specifications, it follows the
server and pathname and is separated by the pound sign (#). A
fragment identifier indicates to the browser that it should begin
displaying the target document at the indicated fragment name. As we
describe in more detail later in this chapter, you insert fragment
names into a document either with the universal id
tag attribute or with the name attribute for the
<a> tag. Like a pathname, a fragment name
may be any sequence of characters.

The fragment name and the preceding hash symbol are optional; omit
them when referencing a document without defined fragments.
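
As a brief sketch, Python's urldefrag function separates a fragment from the rest of a URL. The sample URL and its soil_prep fragment name are assumed for illustration.

```python
from urllib.parse import urldefrag

# Assumed sample URL with a fragment named soil_prep.
document, fragment = urldefrag(
    "http://www.kumquat.com/planting/guide.html#soil_prep"
)
print(document)    # the part that locates the document
print(fragment)    # the section within that document

# Without a hash symbol, the fragment is simply empty.
plain_document, no_fragment = urldefrag("http://www.oreilly.com/")
```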

Formally, the fragment element applies only to HTML or XHTML
documents. If the target of the URL is some other document type, the
fragment name may be misinterpreted by the browser.

Fragments are useful for long documents. By identifying key sections
of your document with a fragment name, you make it easy for readers
to link directly to that portion of the document, avoiding the tedium
of scrolling or searching through the document to get to the section
that interests them.

As a rule of thumb, we recommend that every section header in your
documents be accompanied by an equivalent fragment name. By
consistently following this rule, you'll make it
possible for readers to jump to any section in any of your documents.
Fragments also make it easier to build tables of contents for your
document families.

6.2.3.5 The http search parameter

The search component of the http URL, along with its
preceding question mark, is optional. It indicates that the path is a
searchable or executable resource on the server. The content of the
search component is passed to the server as parameters that control
the search or execution function.

The actual encoding of parameters in the search component is
dependent upon the server and the resource being referenced. The
parameters for searchable resources are covered later in this
chapter, when we discuss searchable documents. Parameters for
executable resources are discussed in Chapter 9.

Although our initial presentation of http URLs indicated that a URL
can have either a fragment identifier or a search component, some
browsers let you use both in a single URL. If you so desire, you can
follow the search parameter with a fragment identifier, telling the
browser to begin displaying the results of the search at the
indicated fragment. Netscape, for example, supports this usage.

We don't recommend this kind of URL, though. First
and foremost, it doesn't work on a lot of browsers.
Just as important, using a fragment implies that you are sure that
the results of the search will have a fragment of that name defined
within the document. For large document collections, this is hardly
likely. You are better off omitting the fragment, showing the search
results from the beginning of the document, and avoiding potential
confusion among your readers.

6.2.3.6 Sample http URLs

Here are some sample http URLs:

http://www.oreilly.com/catalog.html
http://www.oreilly.com/
http://www.kumquat.com:8080/
http://www.kumquat.com/planting/guide.html#soil_prep
http://www.kumquat.com/find_a_quat?state=Florida

The first example is an explicit reference to a bona fide HTML
document named catalog.html that is stored in
the root directory of the www.oreilly.com
server. The second references the top-level home page on that same
server. That home page may or may not be
catalog.html. Sample three also assumes that
there is a home page in the root directory of the
www.kumquat.com server and that the web
connection is to the nonstandard port 8080.

The fourth example is the URL for retrieving the web document named
guide.html from the
planting directory on the
www.kumquat.com server. Once retrieved, the
browser should display the document beginning at the fragment named
soil_prep.

The last example invokes an executable resource named
find_a_quat with the parameter named
state set to the value
Florida. Presumably, this resource generates an
HTML or XHTML response, often a new document, that is subsequently
displayed by the browser.
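
To see how such URLs break down into the parts described in this section, here is a minimal Python sketch. The describe function is a hypothetical helper, and reading the search component as name=value pairs is an assumption that fits the last sample but not every server.

```python
from urllib.parse import parse_qs, urlsplit


def describe(url: str) -> dict:
    """Break an http URL into the parts named in this section."""
    parts = urlsplit(url)
    return {
        "server": parts.hostname,
        "port": parts.port,                 # None when the URL omits it
        "path": parts.path,
        "fragment": parts.fragment,
        # Assumption: the search component holds name=value pairs.
        "search": parse_qs(parts.query),
    }


for sample in (
    "http://www.oreilly.com/",
    "http://www.kumquat.com:8080/",
    "http://www.kumquat.com/find_a_quat?state=Florida",
):
    print(describe(sample))
```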

6.2.4 The file URL

The file URL is perhaps the second most
common one used, but it is not readily recognized by web users and
particularly web authors. It points to a file stored on a computer
without indicating the protocol used to retrieve the file. As such,
it has limited use in a networked environment.
That's a good thing. The file URL lets you load and
display a locally stored document and is particularly useful for
referencing personal HTML/XHTML document collections, such as those
"under construction" and not yet
ready for general distribution, or document collections on CD-ROM.
The file URL has the following format:

file://server/path

6.2.4.1 The file server

The file server can be,
like the http one, an Internet domain name or IP address of the
computer containing the file to be retrieved. Unlike http, however,
which requires TCP/IP networking, the file server may also be the
unqualified but unique name of a computer on a personal network, or a
storage device on the same computer, such as a CD-ROM, or mapped from
another networked computer. No assumptions are made as to how the
browser might contact the machine to obtain the file; presumably the
browser can make some connection, perhaps via a Network File System
or FTP, to obtain the file.

If you omit the server name by including an extra slash (/) in the
URL, or if you use the special name localhost,
the browser retrieves the file from the machine on which the browser
is running. In this case, the browser simply accesses the file using
the normal facilities of the local operating system. In fact, this is
the most common usage of the file URL. By creating document families
on a diskette or CD-ROM and referencing your hyperlinks using the
file:/// URL, you create a distributable,
standalone document collection that does not require a network
connection to use.

6.2.4.2 The file path

This is the path of the file to be retrieved on the desired server.
The syntax of the path may differ based upon the operating system of
the server; be sure to encode any potentially dangerous characters in
the path.

6.2.4.3 Sample file URLs

The file URL is easy:

file://localhost/home/chuck/document.html
file:///home/chuck/document.html
file://marketing.kumquat.com/monthly_sales.html
file://D:/monthly_sales.html

The first URL retrieves
/home/chuck/document.html from the
user's local machine off the current storage device,
typically C:\ on a Windows PC. The second is
identical to the first, except we've omitted the
localhost reference to the server; the server
name defaults to the local drive.

The third example uses some protocol to retrieve
monthly_sales.html from the
marketing.kumquat.com server, while the fourth
example uses the local PC's operating system to
retrieve the same file from the D:\ drive or
device.
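
The server and path of a file URL can be picked apart with the following Python sketch. The file_location helper is hypothetical; it treats both an empty server name and localhost as the local machine, as described in this section, and returns None for the server in that case.

```python
from urllib.parse import urlsplit


def file_location(url: str) -> tuple:
    """Return (server, path) for a file URL; server is None when local."""
    parts = urlsplit(url)
    server = parts.netloc
    # An omitted server name and "localhost" both mean the local machine.
    if server in ("", "localhost"):
        server = None
    return server, parts.path


print(file_location("file://localhost/home/chuck/document.html"))
print(file_location("file:///home/chuck/document.html"))
print(file_location("file://marketing.kumquat.com/monthly_sales.html"))
```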

6.2.5 The mailto URL

The mailto URL is very common in HTML/XHTML
documents. It has the browser send an electronic mail message to a
named recipient. It has the format:

mailto:address

The address is any valid email address, usually
of the form:

user@server

Thus, a typical mailto URL might look like:

mailto:cmusciano@aol.com

You may include multiple recipients in the mailto URL, separated by
commas. For example, this URL addresses the message to all three
recipients:

mailto:cmusciano@aol.com,bkennedy@activmedia.com,booktech@ora.com

There should be no spaces before or after the commas in the URL.

6.2.5.1 Defining mail header fields

The popular browsers open an email helper or plug-in
application when the user selects a mailto URL. It may be the default
email program for their system, or Outlook Express with Internet
Explorer, or Netscape's built-in Communicator. With
some browsers, users can designate their own email programs for
handling mailto URLs by altering a specification in their
browsers' Internet Options or Preferences.

Like http search parameters that you attach at the end of the URL,
separated by question marks (?), you include email-related parameters
with the mailto URL in the HTML document. Typically, additional
parameters may include the message''s header fields,
such as the subject, cc (carbon copy), and
bcc (blind carbon copy) recipients. How these additional fields are
handled depends on the email program.

A few examples are in order:

mailto:cmusciano@aol.com?subject=Loved your book!
mailto:cmusciano@aol.com?cc=booktech@oreilly.com
mailto:cmusciano@aol.com?bcc=archive@myserver.com

As you can probably guess, the first URL sets the subject of the
message. Note that some email programs allow spaces in the parameter
value while others do not. Annoyingly, you can't
replace spaces with their hexadecimal equivalent,
%20, because many email programs
won't make the proper substitution.
It's best to use spaces, since the email programs
that don't honor the spaces simply truncate the
parameter to the first word.

The second URL places the address
booktech@oreilly.com in the cc field of the
message. Similarly, the last example sets the bcc field. You may also
set several fields in one URL by separating the field definitions
with ampersands. For example, this URL sets the subject and
carbon-copy addresses:

mailto:cmusciano@aol.com?subject=Loved your book!&cc=booktech@oreilly.com&bcc=archive@myserver.com

Not all email programs accept or recognize the bcc and cc extensions
in the mailto URL; some either ignore them or append them to a
preceding subject. Thus, when forming a mailto URL,
it's best to order the extra fields as subject
first, followed by cc and bcc. And don't depend on
the cc and bcc recipients being included in the email.
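
To make the structure of such a URL concrete, the sketch below takes one apart in Python. It shows only how the recipients and header fields are laid out; it says nothing about how any particular email program treats them.

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "mailto:cmusciano@aol.com"
    "?subject=Loved your book!"
    "&cc=booktech@oreilly.com"
    "&bcc=archive@myserver.com"
)

parts = urlsplit(url)
recipients = parts.path.split(",")   # addresses before the question mark
fields = parse_qs(parts.query)       # header fields separated by ampersands

print(recipients)
print(fields["subject"])
```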

6.2.6 The ftp URL

The ftp
URL is used to retrieve documents from an FTP (File Transfer
Protocol) server.[6] It has the format:

[6] FTP is an ancient Internet protocol
that dates back to the Dark Ages, around 1975. It was designed as a
simple way to move files between machines and is popular and useful
to this day. Many HTML/XHTML authors use FTP to place files on their
web servers.

ftp://user:password@server:port/path;type=typecode

6.2.6.1 The ftp user and password

FTP is an authenticated service, meaning that you must have a valid
username and password in order to retrieve documents from a server.
However, most FTP servers also support restricted, nonauthenticated
access known as anonymous
FTP. In this mode, anyone can supply the
username "anonymous" and be granted
access to a limited portion of the server's
documents. Most FTP servers also assume (but may not grant) anonymous
access if the username and password are omitted.

If you are using an ftp URL to access a site that requires a username
and password, include the user and
password components in the URL, along with the
colon (:) and at sign (@). More commonly, you'll be
accessing an anonymous FTP server, and the user and password
components can be omitted.

If you keep the user component and at sign but omit the password and
the preceding colon, most browsers prompt you for a password after
connecting to the FTP server. This is the recommended way of
accessing authenticated resources on an FTP server; it prevents
others from seeing your password.

We recommend you never place an ftp URL with a
username and password in any HTML/XHTML document. The reasoning is
simple: anyone can retrieve the simple text document, extract the
username and password from the URL, log into the FTP server, and
tamper with its documents.

6.2.6.2 The ftp server and port

The ftp
server and port operate
by the same rules as the server and port in an http URL. The server
must be a valid Internet domain name or IP address, and the optional
port specifies the port on which the server is listening for
requests. If omitted, the default port number is 21.

6.2.6.3 The ftp path and typecode

The path component of an ftp URL represents a
series of directories, separated by slashes, leading to the file to
be retrieved. By default, the file is retrieved as a binary file;
this can be changed by adding the
typecode (and the preceding
;type=) to the URL.

If the typecode is set to d, the path is assumed
to be a directory. The browser requests a listing of the directory
contents from the server and displays this listing to the user. If
the typecode is any other letter, it is used as a parameter to the
FTP type command before retrieving the file
referenced by the path. While some FTP servers may implement other
codes, most servers accept i to initiate a
binary transfer and a to treat the file as a
stream of ASCII text.

6.2.6.4 Sample ftp URLs

Here are some sample ftp URLs:

ftp://www.kumquat.com/sales/pricing
ftp://bob@bobs-box.com/results;type=d
ftp://bob:secret@bobs-box.com/listing;type=a

The first example retrieves the file named
pricing from the sales
directory on the anonymous FTP server at
www.kumquat.com. The second logs into the FTP
server on bobs-box.com as user
bob, prompting for a password before retrieving
the contents of the directory named results and
displaying them to the user. The last example logs into
bobs-box.com as bob with
the password secret and retrieves the file named
listing, treating its contents as ASCII
characters.
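
The pieces of an ftp URL can be extracted with a short Python sketch. The parse_ftp_url helper is hypothetical; its defaults (anonymous user, port 21, binary typecode i) follow the defaults described in this section.

```python
from urllib.parse import urlsplit


def parse_ftp_url(url: str) -> dict:
    """Split an ftp URL into the components described in this section."""
    parts = urlsplit(url)
    # The typecode, if present, trails the path after ";type=".
    path, _, typecode = parts.path.partition(";type=")
    return {
        "user": parts.username or "anonymous",  # anonymous FTP by default
        "password": parts.password,             # None means "prompt the user"
        "server": parts.hostname,
        "port": parts.port or 21,               # default FTP port
        "path": path,
        "typecode": typecode or "i",            # binary transfer by default
    }


print(parse_ftp_url("ftp://bob@bobs-box.com/results;type=d"))
```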

6.2.7 The javascript URL

The javascript URL actually is
a pseudoprotocol, not usually included in discussions of URLs. Yet,
with advanced browsers like Netscape and Internet Explorer, the
javascript URL can be associated with a hyperlink and used to execute
JavaScript commands when the user selects the link. [Section 12.3.4]

6.2.7.1 The javascript URL arguments

What follows the javascript pseudoprotocol is one or more
semicolon-separated JavaScript expressions and methods, including
references to multi-expression JavaScript functions that you embed
within the <script> tag in your documents
(see Chapter 12 for details). For example:

javascript:window.alert('Hello, world!')
javascript:doFlash('red', 'blue'); window.alert('Do not press me!')

are valid URLs that you may include as the value for a link reference
(see Section 6.3.1.2 and Section 6.5.4.3). The first example contains a single
JavaScript method that activates an alert dialog with the simple
message "Hello, world!"

The second javascript URL example contains two arguments: the first
calls a JavaScript function, doFlash, which
presumably you have located elsewhere in the document within the
<script> tag and which perhaps flashes the
background color of the document window between red and blue. The
second expression is the same alert method as in the first example,
with a slightly different message.

The javascript URL may appear in a hyperlink sans arguments, too. In
that case, the Netscape browser alone (not Internet
Explorer) opens a special JavaScript editor wherein the user may
type in and test the various expressions and methods.

6.2.8 The news URL

Although rarely used anymore, the news URL
accesses either a single message or an entire newsgroup within the
Usenet news system. It has two forms:

news:newsgroup
news:message_id

An unfortunate limitation in news URLs is that they
don't allow you to specify a news server. Rather,
users specify news servers in their browser preferences. At one time,
not long ago, Internet newsgroups were nearly universally
distributed; all news servers carried all the same newsgroups and
their respective articles, so one news server was as good as any.
Today, the sheer bulk of disk space needed to store the daily volume
of newsgroup activity is often prohibitive for any single news
server, and there's also local censorship of
newsgroups. Hence, you cannot expect that all newsgroups, and
certainly not all articles for a particular newsgroup, will be
available on the user's news server.

Many users' browsers may not be correctly configured
to read news. We recommend that you avoid placing news URLs in your
documents except in rare cases.

6.2.8.1 Accessing entire newsgroups

There are several thousand newsgroups devoted to nearly every
conceivable topic under the sun and beyond. Each group has a unique
name, composed of hierarchical elements separated by periods. For
example, the World Wide Web announcements newsgroup is:

comp.infosys.www.announce

To access this group, use the URL:

news:comp.infosys.www.announce

6.2.8.2 Accessing single messages

Every message on a news server has a unique message identifier (ID)
associated with it. This ID has the form:

unique_string@server

The unique_string is a sequence of ASCII
characters; the server is usually the name of the machine from which
the message originated. The unique_string must
be unique among all the messages that originated from the server. A
sample URL to access a single message might be:

news:12A7789B@news.kumquat.com

In general, message IDs are cryptic sequences of characters not
readily understood by humans. Moreover, the life span of a message on
a server is usually measured in days, after which the message is
deleted and the message ID is no longer valid. The bottom line:
single message news URLs are difficult to create, become invalid
quickly, and generally are not used.
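
The two forms of the news URL can be told apart mechanically: a message ID always contains an at sign (@), while a newsgroup name does not. The following Python sketch illustrates that distinction. The function name classify_news_url is hypothetical and chosen only for this example, and the check is deliberately minimal; it does not validate newsgroup names or message IDs beyond looking for the at sign.

```python
def classify_news_url(url):
    """Classify a news URL as a newsgroup or a single-message reference.

    Hypothetical helper for illustration: returns ("newsgroup", name)
    or ("message", (unique_string, server)).
    """
    scheme, sep, rest = url.partition(":")
    if sep != ":" or scheme.lower() != "news" or not rest:
        raise ValueError(f"not a news URL: {url!r}")

    # A message ID has the form unique_string@server.
    if "@" in rest:
        unique_string, _, server = rest.partition("@")
        return "message", (unique_string, server)

    # Anything else is treated as a newsgroup name.
    return "newsgroup", rest


print(classify_news_url("news:comp.infosys.www.announce"))
print(classify_news_url("news:12A7789B@news.kumquat.com"))
```

Note that nothing in either form names a news server, which is the limitation described earlier.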

6.2.9 The nntp URL

The nntp URL goes beyond the news URL to provide a complete mechanism
for accessing articles in the Usenet news system. It has the form:

nntp://server:port/newsgroup/article

6.2.9.1 The nntp server and port

The nntp server and port are defined similarly to the http server
and port, described earlier. The server must be the Internet domain
name or IP address of an nntp server; the port is the port on which
that server is listening for requests.

If the port and its preceding colon are omitted, the default port of
119 is used.

6.2.9.2 The nntp newsgroup and article

The newsgroup is the name of the group from which an article is to be
retrieved, as defined in Section 6.2.8. The article is the numeric ID
of the desired article within that newsgroup. Although the article
number is easier to determine than a message ID, it is subject to the
same limitations as single-message references using the news URL,
described in Section 6.2.8. Specifically, articles do not last long on
most nntp servers, and nntp URLs quickly become invalid as a result.

6.2.9.3 Sample nntp URLs

A sample nntp URL might be:

nntp://news.kumquat.com/alt.fan.kumquats/417

This URL retrieves article 417 from the
alt.fan.kumquats newsgroup on
news.kumquat.com. Keep in mind that the article
will be served only to machines that are allowed to retrieve articles
from this server. In general, most nntp servers restrict access to
those machines on the same local area network.
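
To make the components concrete, here is a minimal Python sketch that splits an nntp URL into its server, port, newsgroup, and article number, applying the default port of 119 when none is given. It relies on urlsplit from Python's standard urllib.parse module. The function name parse_nntp_url and the constant NNTP_DEFAULT_PORT are hypothetical names used only for this example, and the sketch assumes the URL includes both a newsgroup and a numeric article.

```python
from urllib.parse import urlsplit

NNTP_DEFAULT_PORT = 119  # used when the URL omits the port


def parse_nntp_url(url):
    """Split nntp://server:port/newsgroup/article into its components."""
    parts = urlsplit(url)
    if parts.scheme != "nntp":
        raise ValueError(f"not an nntp URL: {url!r}")

    # The path looks like /newsgroup/article; drop the leading slash.
    path = parts.path[1:] if parts.path.startswith("/") else parts.path
    newsgroup, _, article = path.partition("/")

    return {
        "server": parts.hostname,
        "port": parts.port if parts.port is not None else NNTP_DEFAULT_PORT,
        "newsgroup": newsgroup,
        "article": int(article),  # raises ValueError if not numeric
    }


print(parse_nntp_url("nntp://news.kumquat.com/alt.fan.kumquats/417"))
```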

6.2.10 The telnet URL

The telnet URL opens an interactive session
with a desired server, allowing the user to log in and use the
machine. Often, the connection to the machine automatically starts a
specific service for the user; in other cases, the user must know the
commands to type to use the system. The telnet URL has the form:

telnet://user:password@server:port/

6.2.10.1 The telnet user and password

The telnet user and password are used exactly like the user and
password components of the ftp URL, described previously. In
particular, the same caveats apply regarding protecting your password
and never placing it within a URL.

As with the ftp URL, if you omit the password from the URL, the
browser should prompt you for a password just before contacting the
telnet server.

If you omit both the user and password, the telnet connection is made
without supplying a username. For some servers, telnet automatically
connects to a default service when no username is supplied. For
others, the browser may prompt for a username and password when making
the connection to the telnet server.

6.2.10.2 The telnet server and port

The telnet server and port are defined similarly to the http server
and port, described earlier. The server must be the Internet domain
name or IP address of a telnet server; the port is the port on which
that server is listening for requests. If the port and its preceding
colon are omitted, the default port of 23 is used.
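
The optional parts of the telnet URL can be illustrated with a short Python sketch built on urlsplit from the standard urllib.parse module. The function name parse_telnet_url, the constant TELNET_DEFAULT_PORT, and the host library.kumquat.com are hypothetical and used only for this example. An omitted user or password comes back as None, which corresponds to the cases where the browser would prompt or connect without a username.

```python
from urllib.parse import urlsplit

TELNET_DEFAULT_PORT = 23  # used when the URL omits the port


def parse_telnet_url(url):
    """Split telnet://user:password@server:port/ into its components."""
    parts = urlsplit(url)
    if parts.scheme != "telnet":
        raise ValueError(f"not a telnet URL: {url!r}")

    return {
        "user": parts.username,      # None if omitted
        "password": parts.password,  # None if omitted (the safer choice)
        "server": parts.hostname,
        "port": parts.port if parts.port is not None else TELNET_DEFAULT_PORT,
    }


# User supplied, password omitted: the browser should prompt for it.
print(parse_telnet_url("telnet://guest@library.kumquat.com/"))

# No user or password, explicit port.
print(parse_telnet_url("telnet://library.kumquat.com:2323/"))
```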

6.2.11 The gopher URL

Gopher is a web-like document-retrieval system that achieved some
popularity on the Internet just before the Web took off, making
gopher obsolete. Some gopher servers still exist, though, and the
gopher URL lets you access gopher
documents.

The gopher URL has the form:

gopher://server:port/path

6.2.11.1 The gopher server and port

The gopher server and port are defined similarly
to the http server and port, described previously. The server must be
the Internet domain name or IP address of a gopher server; the port
is the port on which that server is listening for requests.

If the port and its preceding colon are omitted, the default port of
70 is used.

6.2.11.2 The gopher path

The gopher path can take one of three forms:

type/selector
type/selector%09search
type/selector%09search%09gopherplus

The type is a single-character value denoting the
type of the gopher resource. If the entire path is omitted from the
gopher URL, the type defaults to 1.

The selector corresponds to the path of a resource on the gopher
server. It may be omitted, in which case the top-level index of the
gopher server is retrieved.

If the gopher resource is actually a gopher search engine, the
search component provides the string for which
to search. The search string must be preceded by an encoded
horizontal tab (%09).

If the gopher server supports gopher+ resources, the
gopherplus component supplies the necessary
information to locate that resource. The exact content of this
component varies based upon the resources on the gopher server. This
component is preceded by an encoded horizontal tab
(%09). If you want to include the
gopherplus component but omit the search
component, you must still supply both encoded tabs within the URL.
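
The three path forms can be handled uniformly by splitting the path on the encoded tab (%09) before decoding anything else. The Python sketch below follows the path layout described in this section (type/selector, optionally followed by tab-separated search and gopherplus components) and applies the defaults of port 70 and type 1. It uses urlsplit and unquote from the standard urllib.parse module. The function name parse_gopher_url, the constant GOPHER_DEFAULT_PORT, and the host gopher.kumquat.com are hypothetical names used only for this example.

```python
from urllib.parse import urlsplit, unquote

GOPHER_DEFAULT_PORT = 70  # used when the URL omits the port


def parse_gopher_url(url):
    """Split a gopher URL into server, port, type, selector, search, gopherplus."""
    parts = urlsplit(url)
    if parts.scheme != "gopher":
        raise ValueError(f"not a gopher URL: {url!r}")

    path = parts.path[1:] if parts.path.startswith("/") else parts.path

    # Split on the encoded tab first, so the separators are not lost
    # when the remaining pieces are percent-decoded.
    fields = path.split("%09")
    gopher_type, _, selector = fields[0].partition("/")

    return {
        "server": parts.hostname,
        "port": parts.port if parts.port is not None else GOPHER_DEFAULT_PORT,
        "type": gopher_type or "1",  # type defaults to 1 if path is omitted
        "selector": unquote(selector),
        "search": unquote(fields[1]) if len(fields) > 1 else "",
        "gopherplus": unquote(fields[2]) if len(fields) > 2 else "",
    }


# Path omitted: top-level index, type 1, port 70.
print(parse_gopher_url("gopher://gopher.kumquat.com/"))

# Search string after one encoded tab.
print(parse_gopher_url("gopher://gopher.kumquat.com/7/search%09kumquats"))

# gopherplus without search: both encoded tabs are still present.
print(parse_gopher_url("gopher://gopher.kumquat.com:7070/0/docs/readme%09%09+"))
```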