1.2 Talking the Internet Talk
Every
computer connected to the Internet (even a beat-up old Apple II) has
a unique address: a number whose format is defined by the
Internet Protocol (IP), the standard that
defines how messages are passed from one machine to another on the
Net. An IP
address is made up of four numbers, each
less than 256, joined together by periods, such as 192.12.248.73 or
131.58.97.254.
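To make the dotted-quad format concrete, here is a minimal Python sketch (ours, purely for illustration) that checks whether a string is made up of four period-separated numbers, each less than 256:

    def is_dotted_quad(address):
        # A valid IP address splits on periods into exactly four parts.
        parts = address.split(".")
        if len(parts) != 4:
            return False
        # Each part must be a whole number less than 256.
        return all(part.isdigit() and int(part) < 256 for part in parts)

    print(is_dotted_quad("192.12.248.73"))    # True
    print(is_dotted_quad("256.12.248.73"))    # False: 256 is too large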
While
computers deal only with numbers, people prefer names. For this
reason, each computer on the Internet also has a name bestowed upon
it by its owner. There are several million machines on the Net, so it
would be very difficult to come up with that many unique names, let
alone keep track of them all. Recall, though, that the Internet is a
network of networks. It is divided into groups known as
domains,
which are further divided into one or more
subdomains.
So, while you might choose a very common name for your computer, it
becomes unique when you append, like surnames, all of the
machine's domain names as a period-separated suffix,
creating a fully qualified domain name.
This naming stuff is
easier than it sounds. For example, the fully qualified domain name
www.oreilly.com translates to a machine named
"www" that''''s part
of the domain known as "oreilly,"
which, in turn, is part of the commercial (com) branch of the
Internet. Other branches of the Internet include educational
institutions (edu), nonprofit organizations (org), the U.S.
government (gov), and Internet service providers (net). Computers and
networks outside the United States may have two-letter abbreviations
at the end of their names: for example,
"ca" for Canada,
"jp" for Japan, and
"uk" for the United Kingdom.
Special
computers, known as name servers, keep tables of
machine names and their associated unique numerical IP addresses and
translate one into the other for us and for our machines. Domain
names must be registered and paid for through any one of the now many
for-profit registrars.[1] Once it is
registered, the owner of the unique domain name broadcasts it and its
address to other domain name servers around the world. Each domain
and subdomain has an associated name server, so ultimately every
machine is known uniquely by both a name and an IP address.
[1] At one time, a single
nonprofit organization known as InterNIC handled that function. Now
ICANN.org coordinates U.S. government-related name servers, but other
organizations or individuals must work through a for-profit company
to register their unique domain names.
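You can ask a name server to do this translation from almost any programming language. Here is a minimal Python sketch that hands a name to the local resolver, which in turn consults the DNS; the address printed will be whatever is currently registered for the name:

    import socket

    # Translate a fully qualified domain name into its dotted-quad IP address.
    address = socket.gethostbyname("www.oreilly.com")
    print(address)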
1.2.1 Clients, Servers, and Browsers
The
Internet connects two kinds of computers:
servers, which serve up documents, and
clients, which retrieve and display documents
for us humans. Things that happen on the server machine are said to
be on the server side, while activities on the
client machine occur on the client side.
To access and display HTML documents,
we run programs called browsers on our client
computers. These browser clients talk to special web
servers over the Internet to access and retrieve
electronic documents.
Several
web browsers are available (most for free), each offering a different
set of features. For example, browsers like
Lynx run on
character-based clients and display documents only as text. Others
run on clients with graphical displays and render documents using
proportional fonts and color graphics on a 1024 x 768,
24-bit-per-pixel display. Still others, including Netscape
Navigator, Microsoft's Internet Explorer, and Opera (to name
the leading few), have special features that allow you to
retrieve and display a variety of electronic documents over the
Internet, including audio and video multimedia.
1.2.2 The Flow of Information
All web activity begins on the client
side, when a user starts his or her browser. The browser begins by
loading a home page document, either from local
storage or from a server over some network, such as the Internet, a
corporate intranet, or a town extranet. In these latter cases, the
client browser first consults a domain name system (DNS) server to
translate the home page document server's name, such
as www.oreilly.com, into an IP address, before
sending a request to that server over the Internet. This request (and
the server's reply) is formatted according to the
dictates of the Hypertext Transfer
Protocol (HTTP) standard.
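That exchange can be reproduced in a few lines of Python. The following sketch is only an illustration of a bare HTTP GET request, not a description of how any particular browser works, and it assumes the server answers on the standard port 80:

    import http.client

    # Connect to the web server (its name is translated to an IP address
    # via DNS behind the scenes) and ask for the home page document.
    connection = http.client.HTTPConnection("www.oreilly.com", 80)
    connection.request("GET", "/")
    reply = connection.getresponse()

    print(reply.status, reply.reason)   # e.g., 200 OK, or a redirect code
    document = reply.read()             # the raw bytes of the returned document
    connection.close()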
A server spends
most of its time listening to the network, waiting for document
requests with the server's unique address stamped on
them. Upon receipt of a request, the server verifies that the
requesting browser is allowed to retrieve documents from the server
and, if so, checks for the requested document. If found, the server
sends (downloads) the document to the browser. The server usually
logs the request, the client computer's name, the
document requested, and the time.
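The server's side of the conversation can be sketched just as briefly. This toy Python server is nothing like a production web server, but it does listen for requests, deliver documents from the directory it is started in, and log each request with the client's address, the request line, and the time (the port number, 8000, is simply our choice for the example):

    import http.server

    # Listen on port 8000 and serve files from the current directory.
    # The standard handler logs each request's client address, request
    # line, and timestamp automatically.
    server = http.server.HTTPServer(("", 8000), http.server.SimpleHTTPRequestHandler)
    server.serve_forever()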
Back on the browser, the document arrives. If it's a
plain-vanilla ASCII text file, most browsers display it in a common,
plain-vanilla way. Document directories, too, are treated like plain
documents, although most graphical browsers display folder icons that
the user can select with the mouse to download the contents of
subdirectories.
Browsers also
retrieve other kinds of content files from a server. Unless assisted by a
helper program or specially enabled by
plug-in software or
applets, which display an image or video file or
play an audio file, the browser usually stores downloaded binary
files directly on a local disk for later use.
For the most part, however, the browser retrieves a special document
that appears to be a plain text file but that contains both text and
special markup codes called
tags.
The browser processes these HTML or XHTML documents, formatting the
text based on the tags and downloading special accessory files, such
as images.
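To see what "text plus tags" looks like to a program, here is a small Python sketch that feeds a fragment of markup to the standard library's parser and reports each tag and each run of plain text it encounters; a browser performs a far more elaborate version of the same pass, building the formatted page as it goes:

    from html.parser import HTMLParser

    class TagReporter(HTMLParser):
        def handle_starttag(self, tag, attrs):
            print("start tag:", tag)
        def handle_endtag(self, tag):
            print("end tag:  ", tag)
        def handle_data(self, data):
            if data.strip():
                print("text:     ", data.strip())

    # A tiny document: ordinary text surrounded by markup tags.
    TagReporter().feed("<h1>Hello, Web!</h1><p>Plain text with <b>bold</b> words.</p>")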
The user reads the document, selects a hyperlink to another document,
and the entire process starts over.
1.2.3 Beneath the Web
We should point out again that browsers and HTTP servers need not be
part of the Web to function. In fact, you never need to be connected
to the Internet or to any network, for that matter, to write
documents and operate a browser. You can load and display locally
stored documents and accessory files directly on your browser. Many
organizations take advantage of this capability by distributing
catalogues and product manuals, for instance, on a much less
expensive, but much more interactively useful, CD-ROM, rather than
via traditional print on paper.
Isolating web documents is good for the author, too, since it gives
you the opportunity to finish, in the editorial sense of the word, a
document collection for later distribution. Diligent authors work
locally to write and proof their documents before releasing them for
general distribution, thereby sparing readers the agonies of broken
image files and bogus hyperlinks.[2]
[2] Vigorous testing of
HTML documents once they are made available on the Web is, of course,
also highly recommended and necessary to rid them of various linking
bugs.
Organizations, too, can be connected to
the Internet but also maintain private webs and document collections
for distribution to clients on their local networks, or intranets. In
fact, private webs are fast becoming the technology of choice for the
paperless offices we've heard so much about during
these last few years. With HTML and XHTML document collections,
businesses can maintain personnel databases complete with employee
photographs and online handbooks, collections of blueprints, parts,
assembly manuals, and so on, all readily and easily accessed
electronically by authorized users and displayed on a local
computer.
1.2.4 Standards Organizations
Like many
popular technologies, HTML started out as an informal specification
used by only a few people. As more and more authors began to use the
language, it became obvious that more formal means were needed to
define and manage (that is, to standardize) the
language's features, making it easier for everyone
to create and share documents.
1.2.4.1 The World Wide Web Consortium
The World Wide Web Consortium (W3C)
was formed with the charter to define the standards for HTML and,
later, XHTML. Members are responsible for drafting, circulating for
review, and modifying the standard based on cross-Internet feedback
to best meet the needs of the many.
Beyond HTML and XHTML, the W3C has the broader responsibility of
standardizing any technology related to the Web; they manage the
HTTP, Cascading Style Sheets (CSS), and Extensible Markup Language
(XML) standards, as well as related standards for document addressing
on the Web. They also solicit draft standards for extensions to
existing web technologies.
If you want to track HTML, XML, XHTML, CSS, and other exciting web
development and related technologies, contact the W3C at
http://www.w3.org.
Also, several Internet newsgroups are devoted to the Web, each a part
of the comp.infosystems.www hierarchy. These
include comp.infosystems.www.authoring.html and
comp.infosystems.www.authoring.images.
1.2.4.2 The Internet Engineering Task Force
Even broader in reach
than W3C, the Internet Engineering Task Force (IETF) is responsible
for defining and managing every aspect of Internet technology. The
Web is just one small area under the purview of the IETF.
The IETF defines all of the technology of the Internet via official
documents known as Requests for Comments, or RFCs.
Individually numbered for easy reference, each RFC addresses a
specific Internet technology: everything from the syntax of
domain names and the allocation of IP addresses to the format of
electronic mail messages.
To learn more about the IETF and follow the progress of various RFCs
as they are circulated for review and revision, visit the IETF home
page, http://www.ietf.org.