2.2 The Layers of a Network
Sending
data across a network is a complex operation that must be carefully
tuned to the physical characteristics of the network as well as the
logical character of the data being sent. Software that sends data
across a network must understand how to avoid collisions between
packets, convert digital data to analog signals, detect and correct
errors, route packets from one host to another, and more. The process
becomes even more complicated when the requirement to support
multiple operating systems and heterogeneous network cabling is
added.To make this complexity manageable and hide most of it from the
application developer and end user, the different aspects of network
communication are separated into multiple layers. Each layer
represents a different level of abstraction between the physical
hardware (e.g., the wires and electricity) and the information being
transmitted. Each layer has a strictly limited function. For
instance, one layer may be responsible for routing packets, while the
layer above it is responsible for detecting and requesting
retransmission of corrupted packets. In theory, each layer only talks
to the layers immediately above and immediately below it. Separating
the network into layers lets you modify or even replace the software
in one layer without affecting the others, as long as the interfaces
between the layers stay the same.There are several different layer models, each organized to fit the
needs of a particular kind of network. This book uses the standard
TCP/IP four-layer model appropriate for the Internet, shown in Figure 2-1. In this model, applications like Internet
Explorer and Eudora run in the application layer and talk only to the
transport layer. The transport layer talks only to the application
layer and the internet layer. The internet layer in turn talks only
to the host-to-network layer and the transport layer, never directly
to the application layer. The host-to-network layer moves the data
across the wires, fiber optic cables, or other medium to the
host-to-network layer on the remote system, which then moves the data
up the layers to the application on the remote system.
Figure 2-1. The layers of a network

retrieve a page, the browser is actually only talking to the
transport layer on the local client machine. The transport layer
breaks the request up into TCP segments, adds some sequence numbers
and checksums to the data, and then passes the request to the local
internet layer. The internet layer fragments the segments into IP
datagrams of the necessary size for the local network and passes them
to the host-to-network layer for transmission onto the wire. The
host-to-network layer encodes the digital data as analog signals
appropriate for the particular physical medium and sends the request
out the wire where it will be read by the host-to-network layer of
the remote system to which it's addressed.The host-to-network layer on the remote system decodes the analog
signals into digital data then passes the resulting IP datagrams to
the server's internet layer. The internet layer does
some simple checks to see that the IP datagrams
aren't corrupt, reassembles them if
they've been fragmented, and passes them to the
server's transport layer. The
server's transport layer checks to see that all the
data arrived and requests retransmission of any missing or corrupt
pieces. (This request actually goes back down through the
server's internet layer, through the
server's host-to-network layer, and back to the
client system, where it bubbles back up to the
client's transport layer, which retransmits the
missing data back down through the layers. This is all transparent to
the application layer.) Once the server's transport
layer has received enough contiguous, sequential datagrams, it
reassembles them and writes them onto a stream read by the web server
running in the server application layer. The server responds to the
request and sends its response back down through the layers on the
server system for transmission back across the Internet and delivery
to the web client.As you can guess, the real process is much more elaborate. The
host-to-network layer is by far the most complex, and a lot has been
deliberately hidden. For example, it's entirely
possible that data sent across the Internet will pass through several
routers and their layers before reaching its final destination.
However, 90% of the time your Java code will work in the application
layer and only need to talk to the transport layer. The other 10% of
the time, you'll be in the transport layer and
talking to the application layer or the internet layer. The
complexity of the host-to-network layer is hidden from you;
that's the point of the layer model.
|
the application layer on the other system; the network creates a
logical path between the two application layers.
It's easy to understand the logical path if you
think about an IRC chat session. Most participants in an IRC chat
would say that they're talking to another person. If
you really push them, they might say that they're
talking to their computer (really the application layer), which is
talking to the other person's computer, which is
talking to the other person. Everything more than one layer deep is
effectively invisible, and that is exactly the way it should be.
Let's consider each layer in more detail.
2.2.1 The Host-to-Network Layer
As a Java programmer,
you're fairly high up in the network food chain. A
lot happens below your radar. In the standard reference model for
IP-based Internets (the only kind of network Java really
understands), the hidden parts of the network belong to the
host-to-network layer (also known as the link
layer, data link layer, or network interface layer). The
host-to-network layer defines how a particular network
interfacesuch as an Ethernet card or a PPP
connectionsends IP datagrams over its physical connection to
the local network and the world.The part of the host-to-network layer made up of the hardware that
connects different computers (wires, fiber optic cables, microwave
relays, or smoke signals) is sometimes called the physical layer of
the network. As a Java programmer, you don't need to
worry about this layer unless something goes wrongthe plug
falls out of the back of your computer, or someone drops a backhoe
through the T-1 line between you and the rest of the world. In other
words, Java never sees the physical layer.For computers to communicate with each other, it
isn't sufficient to run wires between them and send
electrical signals back and forth. The computers have to agree on
certain standards for how those signals are interpreted. The first
step is to determine how the packets of electricity or light or smoke
map into bits and bytes of data. Since the physical layer is analog,
and bits and bytes are digital, this process involves a
digital-to-analog conversion on the sending end and an
analog-to-digital conversion on the receiving end.Since all real analog systems have noise, error correction and
redundancy need to be built into the way data is translated into
electricity. This is done in the data link layer. The most common
data link layer is Ethernet. Other popular data link layers include
TokenRing, PPP, and Wireless Ethernet (802.11). A specific data link
layer requires specialized hardware. Ethernet cards
won't communicate on a TokenRing network, for
example. Special devices called gateways convert
information from one type of data link layer, such as Ethernet, to
another, such as TokenRing. As a Java programmer, the data link layer
does not affect you directly. However, you can sometimes optimize the
data you send in the application layer to match the native packet
size of a particular data link layer, which can have some affect on
performance. This is similar to matching disk reads and writes to the
native block size of the disk. Whatever size you choose, the program
will still run, but some sizes let the program run more efficiently
than others, and which sizes these are can vary from one computer to
the next.
2.2.2 The Internet Layer
The next layer of the network, and the
first that you need to concern yourself with, is the
internet layer. In the OSI model, the internet
layer goes by the more generic name network
layer. A network layer protocol defines how
bits and bytes of data are organized into the larger groups called
packets, and the addressing scheme by which different machines find
each other. The Internet Protocol (IP) is the most widely used
network layer protocol in the world and the only network layer
protocol Java understands. IP is almost exclusively the focus of this
book. Other, semi-common network layer protocols include
Novell's IPX, and IBM and
Microsoft's NetBEUI, although nowadays most
installations have replaced these protocols with IP. Each network
layer protocol is independent of the lower layers. IP, IPX, NetBEUI,
and other protocols can each be used on Ethernet, Token Ring, and
other data link layer protocol networks, each of which can themselves
run across different kinds of physical layers.Data is sent across the internet layer in packets called
datagrams.
Each IP datagram contains a header between 20 and 60 bytes long and a
payload that contains up to 65,515 bytes of data. (In practice, most
IP datagrams are much smaller, ranging from a few dozen bytes to a
little more than eight kilobytes.) The header of each
IP datagram contains these items, in
this order:
Always 0100 (decimal 4) for current IP; will be changed to 0110
(decimal 6) for IPv6, but the entire header format will also change
in IPv6.
An unsigned integer between 0 and 15 specifying the number of 4-byte
words in the header; since the maximum value of the header length
field is 1111 (decimal 15), an IP header can be at most 60 bytes
long.
A 3-bit precedence field that is no longer used, four type-of-service
bits (minimize delay, maximize throughput, maximize reliability,
minimize monetary cost) and a zero bit. Not all service types are
compatible. Many computers and routers simply ignore these bits.
An unsigned integer specifying the length of the entire datagram,
including both header and payload.
A unique identifier for each datagram sent by a host; allows
duplicate datagrams to be detected and thrown away.
The first bit is 0; the second bit is 0 if this datagram may be
fragmented, 1 if it may not be; and the third bit is 0 if this is the
last fragment of the datagram, 1 if there are more fragments.
In the event that the original IP datagram is fragmented into
multiple pieces, this field identifies the position of this fragment
in the original datagram.
Number of nodes through which the datagram can pass before being
discarded; used to avoid infinite loops.
6 for TCP, 17 for UDP, or a different number between 0 and 255 for
each of more than 100 different protocols (some quite obscure); see
http://www.iana.org/assignments/protocol-numbers
for the complete current list.
A checksum of the header only (not the entire datagram) calculated
using a 16-bit one's complement sum.
The IP address of the sending node.
The IP address of the destination node.
In addition, an IP datagram header may contain between 0 and 40 bytes
of optional information, used for security options, routing records,
timestamps, and other features Java does not support. Consequently,
we will not discuss them here. The interested reader is referred to
TCP/IP Illustrated, Volume 1: The Protocols, by
W. Richard Stevens (Addison Wesley), for more details on these
fields. Figure 2-2 shows how the different
quantities are arranged in an IP datagram. All bits and bytes are
big-endian; most significant to least significant runs left to right.
Figure 2-2. The structure of an IPv4 datagram

2.2.3 The Transport Layer
Raw datagrams have some drawbacks. Most
notably, there's no guarantee that they will be
delivered. Even if they are delivered, they may have been corrupted
in transit. The header checksum can only detect corruption in the
header, not in the data portion of a datagram. Finally, even if the
datagrams arrive uncorrupted, they do not necessarily arrive in the
order in which they were sent. Individual datagrams may follow
different routes from source to destination. Just because datagram A
is sent before datagram B does not mean that datagram A will arrive
before datagram B.The transport layer is responsible for ensuring
that packets are received in the order they were sent and making sure
that no data is lost or corrupted. If a packet is lost, the transport
layer can ask the sender to retransmit the packet. IP networks
implement this by adding an additional header to each datagram that
contains more information. There are two primary protocols at this
level. The first, the Transmission Control Protocol (TCP), is a
high-overhead protocol that allows for retransmission of lost or
corrupted data and delivery of bytes in the order they were sent. The
second protocol, the User Datagram Protocol (UDP), allows the
receiver to detect corrupted packets but does not guarantee that
packets are delivered in the correct order (or at all). However, UDP
is often much faster than TCP. TCP is called a
reliable protocol; UDP
is an unreliable protocol. Later,
we'll see that unreliable protocols are much more
useful than they sound.
2.2.4 The Application Layer
The layer that delivers data to the user
is called the application layer. The three lower
layers all work together to define how data is transferred from one
computer to another. The application layer decides what to do with
the data after it's transferred. For example, an
application protocol like HTTP (for the World Wide Web) makes sure
that your web browser knows to display a graphic image as a picture,
not a long stream of numbers. The application layer is where most of
the network parts of your programs spend their time. There is an
entire alphabet soup of application layer protocols; in addition to
HTTP for the Web, there are SMTP, POP, and IMAP for email; FTP, FSP,
and TFTP for file transfer; NFS for file access; NNTP for news
transfer; Gnutella, FastTrack, and Freenet for file sharing; and
many, many more. In addition, your programs can define their own
application layer protocols as necessary.