2.4 The Internet
The Internet is the world's largest IP-based network. It is an
amorphous group of computers in many different countries on all seven
continents (Antarctica included) that talk to each other using the IP
protocol. Each computer on the Internet has at least one unique IP
address by which it can be identified. Most of them also have at
least one name that maps to that IP address. The Internet is not
owned by anyone, although pieces of it are. It is not governed by
anyone, which is not to say that some governments
don't try. It is simply a very large collection of
computers that have agreed to talk to each other in a standard way.

The Internet is not the only IP-based network, but it is the largest
one. Other IP networks are called internets with
a little i: for example, a corporate IP network
that is not connected to the Internet. Intranet
is a current buzzword that loosely describes corporate practices of
putting lots of data on internal web servers.

Unless you're working in a high security environment
that's physically disconnected from the broader
network, it's likely that the internet
you'll be using is the Internet. To make sure that
hosts on different networks on the Internet can communicate with each
other, a few rules need to be followed that don't
apply to purely internal internets. The most important rules deal
with the assignment of addresses to different organizations,
companies, and individuals. If everyone picked the Internet addresses
they wanted at random, conflicts would arise almost immediately when
different computers showed up on the Internet with the same address.
2.4.1 Internet Address Classes
To
avoid this problem, blocks of IPv4 addresses are assigned to Internet
Service Providers (ISPs) by their regional Internet registry. When a
company or an organization wants to set up an IP-based network
connected to the Internet, their ISP gives them a block of addresses.
Traditionally, these blocks come in three sizes called Class A, Class
B, and Class C. A Class C address block specifies the first three
bytes of the address; for example, 199.1.32. This allows room for 254
individual addresses from 199.1.32.1 to 199.1.32.254.[1] A
Class B address block only specifies the first two bytes of the
addresses an organization may use; for instance, 167.1. Thus, a Class
B address block has room for 65,024 different hosts (256 Class C size
blocks times 254 hosts per Class C block). A Class A address block
only specifies the first byte of the address range (for instance, 18)
and therefore has room for over 16 million nodes.

[1] Addresses with the last byte either .0 or .255 are reserved and
should never actually be assigned to hosts.

There are no block sizes between a Class A and a Class B, or a
Class B and a Class C. This has become a problem because
there are many organizations with more than 254 computers connected
to the Internet but less than 65,024. If each of these organizations
gets a full Class B block, many addresses are wasted.
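
The class arithmetic above can be sketched in a few lines of Java. This is only an illustration: the class name AddressClasses and its methods are hypothetical, the first-byte ranges come from the standard classful scheme rather than from this section, and the host counts follow the text's rule that last bytes .0 and .255 are never assigned.

```java
// Hypothetical helper illustrating the old classful addressing scheme.
class AddressClasses {

    // Returns the classful category ('A' through 'E') based on the first byte.
    static char classOf(String dottedQuad) {
        int first = Integer.parseInt(dottedQuad.split("\\.")[0]);
        if (first < 0 || first > 255) {
            throw new IllegalArgumentException("Bad first byte: " + first);
        }
        if (first < 128) return 'A'; // leading bit 0
        if (first < 192) return 'B'; // leading bits 10
        if (first < 224) return 'C'; // leading bits 110
        if (first < 240) return 'D'; // leading bits 1110
        return 'E';                  // leading bits 1111
    }

    // Hosts per block, counting 254 usable values for the last byte
    // (the text's rule that .0 and .255 are reserved).
    static long hostsPerBlock(char addressClass) {
        switch (addressClass) {
            case 'A': return 256L * 256 * 254;
            case 'B': return 256L * 254;
            case 'C': return 254L;
            default:
                throw new IllegalArgumentException(
                    "No host blocks for Class " + addressClass);
        }
    }

    public static void main(String[] args) {
        // Sample addresses built from the prefixes used in the text.
        String[] samples = {"18.0.0.1", "167.1.0.1", "199.1.32.1"};
        for (String address : samples) {
            char c = classOf(address);
            System.out.println(address + " is Class " + c
                + " with room for " + hostsPerBlock(c) + " hosts");
        }
    }
}
```

The gap between 254 and 65,024 hosts is what makes the scheme wasteful for mid-sized organizations.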
There's a limited number of IPv4 addresses: about 4.3 billion
(2^32, or 4,294,967,296, to be precise). That sounds like a
lot, but it gets crowded quickly when you can easily waste fifty or
sixty thousand addresses at a shot.

There are also many networks, such as the author's
own personal basement-area network, that have a few to a few dozen
computers but not 255. To more efficiently allocate the limited
address space, Classless Inter-Domain Routing (CIDR)
was invented. CIDR mostly (though not completely) replaces the whole
A, B, C, D, E addressing scheme with one based on a specified number
of prefix bits. These prefixes are generally written as /nn, where nn
is a one- or two-digit number specifying how many leading bits of the
address are fixed as the network portion. Thus, a /24 fixes the first
24 bits in the address,
leaving 8 bits available to distinguish individual nodes. This allows
256 nodes, and is equivalent to an old style Class C. A /19 fixes 19
bits, leaving 13 for individual nodes within the network.
It's equivalent to 32 separate Class C networks or
an eighth of a Class B. A /28, generally the smallest
you're likely to encounter in practice, leaves only
four bits for identifying local nodes. It can handle networks with up
to 16 nodes. CIDR also carefully specifies which address blocks are
associated with which ISPs. This scheme helps keep Internet routing
tables smaller and more manageable than they would be under the old
system.

Several address blocks and patterns are special. All IPv4 addresses
that begin with 10., 172.16. through 172.31., and 192.168. are
deliberately unassigned. They can be used on internal networks, but
no host using addresses in these blocks is allowed onto the global
Internet. These
non-routable addresses are useful for building
private networks that can't be seen from the rest of
the Internet or for building a large network when
you've only been assigned a Class C address block.
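
The prefix arithmetic and the non-routable blocks described above can be sketched together in Java. This is a minimal illustration, not a production address parser: the class name CidrBlocks and its methods are hypothetical, and writing the three private ranges as 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16 is the CIDR form of the ranges listed in the text.

```java
// Hypothetical helper illustrating CIDR prefix arithmetic on IPv4 addresses.
class CidrBlocks {

    // Packs a dotted quad such as "192.168.254.12" into a 32-bit int.
    static int toInt(String dottedQuad) {
        String[] parts = dottedQuad.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + dottedQuad);
        }
        int value = 0;
        for (String part : parts) {
            int b = Integer.parseInt(part);
            if (b < 0 || b > 255) {
                throw new IllegalArgumentException("Bad byte: " + part);
            }
            value = (value << 8) | b;
        }
        return value;
    }

    // Number of addresses in a /prefixBits block: 2^(32 - prefixBits).
    static long blockSize(int prefixBits) {
        if (prefixBits < 0 || prefixBits > 32) {
            throw new IllegalArgumentException("Bad prefix length: " + prefixBits);
        }
        return 1L << (32 - prefixBits);
    }

    // True if address and network agree on their first prefixBits bits.
    static boolean inBlock(String address, String network, int prefixBits) {
        // A shift by 32 is a no-op on int in Java, so /0 needs its own case.
        int mask = (prefixBits == 0) ? 0 : -1 << (32 - prefixBits);
        return (toInt(address) & mask) == (toInt(network) & mask);
    }

    // True if the address falls in one of the three non-routable blocks.
    static boolean isNonRoutable(String address) {
        return inBlock(address, "10.0.0.0", 8)
            || inBlock(address, "172.16.0.0", 12)
            || inBlock(address, "192.168.0.0", 16);
    }

    public static void main(String[] args) {
        for (int prefix : new int[] {19, 24, 28}) {
            System.out.println("/" + prefix + " holds " + blockSize(prefix) + " addresses");
        }
        System.out.println("192.168.254.12 non-routable? " + isNonRoutable("192.168.254.12"));
        System.out.println("216.254.85.72 non-routable? " + isNonRoutable("216.254.85.72"));
    }
}
```

Dividing blockSize(19) by blockSize(24) gives the 32 Class C equivalents mentioned above.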
IPv4 addresses beginning with 127 (most commonly 127.0.0.1) always
mean the local loopback
address. That is, these addresses always point
to the local computer, no matter which computer
you're running on. The hostname for this address is
generally localhost.
In IPv6, 0:0:0:0:0:0:0:1 (a.k.a. ::1) is the loopback address. The
address 0.0.0.0 always refers to the originating host, but may only
be used as a source address, not a destination. Similarly, any IPv4
address that begins with 0.0 is assumed to refer to a host on the
same local network.
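
Java's java.net.InetAddress class can report several of these special cases directly. The sketch below is illustrative only: the class name SpecialAddresses and the describe() method are hypothetical, while isLoopbackAddress(), isAnyLocalAddress(), and isSiteLocalAddress() are standard InetAddress methods. Only numeric literals are passed in, so InetAddress.getByName() parses them without a DNS lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper that labels the special address patterns in the text.
class SpecialAddresses {

    // Expects a numeric IPv4 or IPv6 literal, not a hostname.
    static String describe(String literal) {
        InetAddress address;
        try {
            address = InetAddress.getByName(literal);
        } catch (UnknownHostException ex) {
            throw new IllegalArgumentException("Not a valid address: " + literal, ex);
        }
        if (address.isLoopbackAddress()) return "loopback";
        if (address.isAnyLocalAddress()) return "unspecified";
        if (address.isSiteLocalAddress()) return "private";
        return "ordinary";
    }

    public static void main(String[] args) {
        String[] samples = {
            "127.0.0.1", "::1", "0.0.0.0", "192.168.254.12", "216.254.85.72"
        };
        for (String literal : samples) {
            System.out.println(literal + " is " + describe(literal));
        }
    }
}
```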
2.4.2 Network Address Translation
For reasons of both security and address space conservation, many
smaller networks, such as the author's home network,
use network address
translation (NAT). Rather than allotting even
a /28, my ISP gives me a single address, 216.254.85.72. Obviously, that
won't work for the dozen or so different computers
and other devices running in my apartment at any one time. Instead, I
assign each one of them a different address in the non-routable block
192.168.254.xxx. When they
connect to the Internet, they have to pass through a router my ISP
sold me that translates the internal addresses into the external
addresses.

The router watches my outgoing and incoming connections and adjusts
the addresses in the IP packets. For an outgoing packet, it changes
the source address to the router's external address
(216.254.85.72 on my network).
For an incoming packet, it changes the destination address to one of
the local addresses, such as 192.168.254.12. Exactly how it keeps track of
which connections come from and are aimed at which internal computers
is not particularly important to a Java programmer. As long as your
machines are configured properly, this process is mostly transparent
to Java programs. You just need to remember that the external and
internal addresses may not be the same. From outside my network,
nobody can talk to my system at 192.168.254.12 unless I initiate the
connection, or unless I configure my router to forward requests
addressed to 216.254.85.72 to
192.168.254.12. If the router is
safe, then the rest of the network is too. On the other hand, if
someone does crack the router or one of the servers behind the router
that is mapped to 216.254.85.72,
I'm hosed. This is why I installed a firewall as the
next line of defense.
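
For readers curious about the bookkeeping, the following Java sketch shows one common way a NAT router could track connections: by giving each outgoing connection its own external port. It is an assumption for illustration, not a description of any particular router. The class NatSketch, its methods, and the starting port number 40000 are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of port-based network address translation.
class NatSketch {
    private final String externalAddress;
    // External port -> "internalAddress:internalPort"
    private final Map<Integer, String> inbound = new HashMap<>();
    // "internalAddress:internalPort" -> external port
    private final Map<String, Integer> outbound = new HashMap<>();
    private int nextPort = 40000; // arbitrary starting point for this sketch

    NatSketch(String externalAddress) {
        this.externalAddress = externalAddress;
    }

    // Outgoing packet: replace the internal source with the router's
    // external address and a port reserved for this internal endpoint.
    String translateOutgoing(String internalAddress, int internalPort) {
        String key = internalAddress + ":" + internalPort;
        Integer port = outbound.get(key);
        if (port == null) {
            port = nextPort++;
            outbound.put(key, port);
            inbound.put(port, key);
        }
        return externalAddress + ":" + port;
    }

    // Incoming packet: look up the internal endpoint for the destination
    // port. Returns null when no mapping exists, meaning the packet is dropped.
    String translateIncoming(int externalPort) {
        return inbound.get(externalPort);
    }

    // Explicit forwarding, as when the router is configured to pass
    // requests on one external port to a server behind it.
    void forward(int externalPort, String internalAddress, int internalPort) {
        inbound.put(externalPort, internalAddress + ":" + internalPort);
    }

    public static void main(String[] args) {
        NatSketch router = new NatSketch("216.254.85.72");
        System.out.println(router.translateOutgoing("192.168.254.12", 5000));
        System.out.println(router.translateIncoming(40000));
        System.out.println(router.translateIncoming(80)); // no mapping yet
        router.forward(80, "192.168.254.12", 8080);
        System.out.println(router.translateIncoming(80));
    }
}
```

The null result for an unmapped port reflects the point made above: nobody outside can reach 192.168.254.12 unless that machine initiated the connection or forwarding has been configured.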
2.4.3 Firewalls
There are some naughty people on the Internet. To keep them out,
it's often helpful to set up one point of access to
a local network and check all traffic into or out of that access
point. The hardware and software that sit between the Internet and
the local network, checking all the data that comes in or out to make
sure it's kosher, is called a
firewall. The firewall is often part of the
router that connects the local network to the broader Internet and
may perform other tasks, such as network address translation. Then
again, the firewall may be a separate machine. Modern operating
systems like Mac OS X and Red Hat Linux often have built-in personal
firewalls that monitor just the traffic sent to that one machine.
Either way, the firewall is responsible for inspecting each packet
that passes into or out of its network interface and accepting it or
rejecting it according to a set of rules.

Filtering is usually based on network addresses and ports. For
example, all traffic coming from the Class C network 193.28.25 may be
rejected because you had bad experiences with hackers from that
network in the past. Outgoing Telnet connections may be allowed, but
incoming Telnet connections may not. Incoming connections on port 80
(web) may be allowed, but only to the corporate web server. More
intelligent firewalls look at the contents of the packets to
determine whether to accept or reject them. The exact configuration
of a firewall (which packets of data are and are not allowed to
pass through) depends on the security needs of an individual
site. Java doesn't have much to do with
firewalls, except insofar as they often get in your way.
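
The address-and-port filtering described in this section can be sketched as an ordered rule list. This is a toy model under stated assumptions, not how any real firewall is implemented: the class FirewallSketch, the first-match-wins policy, the reject-by-default fallback, and the web server address 10.0.0.80 are all hypothetical. Port 23 is the standard Telnet port.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical packet filter that matches on direction, addresses, and port.
class FirewallSketch {

    enum Direction { INCOMING, OUTGOING }

    static final int ANY_PORT = -1;

    // One rule. A null address field or ANY_PORT matches anything.
    static final class Rule {
        final Direction direction;
        final String sourcePrefix; // e.g. "193.28.25." for a whole Class C block
        final String destination;  // exact destination address, or null
        final int port;            // destination port, or ANY_PORT
        final boolean accept;

        Rule(Direction direction, String sourcePrefix, String destination,
             int port, boolean accept) {
            this.direction = direction;
            this.sourcePrefix = sourcePrefix;
            this.destination = destination;
            this.port = port;
            this.accept = accept;
        }

        boolean matches(Direction d, String source, String dest, int destPort) {
            return d == direction
                && (sourcePrefix == null || source.startsWith(sourcePrefix))
                && (destination == null || destination.equals(dest))
                && (port == ANY_PORT || port == destPort);
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    FirewallSketch add(Rule rule) {
        rules.add(rule);
        return this;
    }

    // The first matching rule decides; unmatched packets are rejected.
    boolean accepts(Direction d, String source, String dest, int destPort) {
        for (Rule rule : rules) {
            if (rule.matches(d, source, dest, destPort)) {
                return rule.accept;
            }
        }
        return false;
    }

    // The three example policies from the text, in priority order.
    static FirewallSketch exampleRules(String webServer) {
        return new FirewallSketch()
            .add(new Rule(Direction.INCOMING, "193.28.25.", null, ANY_PORT, false))
            .add(new Rule(Direction.OUTGOING, null, null, 23, true))
            .add(new Rule(Direction.INCOMING, null, webServer, 80, true));
    }

    public static void main(String[] args) {
        FirewallSketch firewall = exampleRules("10.0.0.80");
        System.out.println(firewall.accepts(
            Direction.INCOMING, "193.28.25.7", "10.0.0.80", 80));
        System.out.println(firewall.accepts(
            Direction.INCOMING, "200.1.2.3", "10.0.0.80", 80));
    }
}
```

The trailing dot in the prefix "193.28.25." keeps the rule from accidentally matching addresses such as 193.28.250.1.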
2.4.4 Proxy Servers
Proxy servers
are related to firewalls. If a firewall prevents hosts on a network
from making direct connections to the outside world, a proxy server
can act as a go-between. Thus, a machine that is prevented from
connecting to the external network by a firewall would make a request
for a web page from the local proxy server instead of requesting the
web page directly from the remote web server. The proxy server would
then request the page from the web server and forward the response
back to the original requester. Proxies can also be used for FTP
services and other connections. One of the security advantages of
using a proxy server is that external hosts only find out about the
proxy server. They do not learn the names and IP addresses of the
internal machines, making it more difficult to hack into internal
systems.

While firewalls generally operate at the level of the transport or
internet layer, proxy servers normally operate at the application
layer. A proxy server has a detailed understanding of some
application level protocols, such as HTTP and FTP. (The notable
exceptions are SOCKS proxy servers, which operate at the transport
layer and can proxy for all TCP and UDP connections regardless of
application layer protocol.) Packets that pass through the proxy
server can be examined to ensure that they contain data appropriate
for their type. For instance, FTP packets that seem to contain Telnet
data can be rejected. Figure 2-3 shows how proxy
servers fit into the layer model.
Figure 2-3. Layered connections through a proxy server

As long as all access to the Internet is forwarded through the proxy
server, access can be tightly controlled. For instance, a company
might choose to block access to www.playboy.com but allow access to
www.microsoft.com. Some companies
allow incoming FTP but disallow outgoing FTP so confidential data
cannot be as easily smuggled out of the company. Other companies have
begun using proxy servers to track their employees'
web usage so they can see who's using the Internet
to get tech support and who's using it to check out
the Playmate of the Month. Such monitoring of employee behavior is
controversial and not exactly an indicator of enlightened management
techniques.

Proxy servers can also be used to implement
local caching. When a file is requested
from a web server, the proxy server first checks to see if the file
is in its cache. If the file is in the cache, the proxy serves the
file from the cache rather than from the Internet. If the file is not
in the cache, the proxy server retrieves the file, forwards it to the
requester, and stores it in the cache for the next time it is
requested. This scheme can significantly reduce load on an Internet
connection and greatly improve response time. America Online runs one
of the largest farms of proxy servers in the world to speed the
transfer of data to its users. If you look at a web server logfile,
you'll probably find some hits from clients in the
Figure 2-4. Standalone Java applications can indicate the
proxy server to use by setting the socksProxyHost and socksProxyPort
properties (if you're using a SOCKS proxy server), or the
http.proxySet, http.proxyHost, http.proxyPort, https.proxySet,
https.proxyHost, https.proxyPort, ftpProxySet, ftpProxyHost,
ftpProxyPort, gopherProxySet, gopherProxyHost, and gopherProxyPort
system properties (if you're using protocol-specific proxies). You
can set system properties from the command line using the -D flag,
like this:
java -DsocksProxyHost=socks.cloud9.net -DsocksProxyPort=1080 MyClass

You can use any other convenient means to set these system
properties, such as including them in the appletviewer.properties
file, like this:
ftpProxySet=true
ftpProxyHost=ftp.proxy.cloud9.net
ftpProxyPort=1000
gopherProxySet=true
gopherProxyHost=gopher.proxy.cloud9.net
gopherProxyPort=9800
http.proxySet=true
http.proxyHost=web.proxy.cloud9.net
http.proxyPort=8000
https.proxySet=true
https.proxyHost=web.proxy.cloud9.net
https.proxyPort=8001
Figure 2-4. Netscape Navigator proxy server settings
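
The same properties can also be set from inside a program with System.setProperty(), which is a standard Java call. The sketch below is a minimal illustration: the class name ProxySettings and its methods are hypothetical, and the host names are the ones used in the examples above. It also shows java.net.Proxy, a class available since Java 5 that this section does not discuss, as a per-connection alternative to global properties.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Hypothetical helper showing two ways to point a Java program at a proxy.
class ProxySettings {

    // Programmatic equivalent of -DsocksProxyHost=... -DsocksProxyPort=...
    // This affects the whole virtual machine, just as the -D flags do.
    static void useSocksProxy(String host, int port) {
        System.setProperty("socksProxyHost", host);
        System.setProperty("socksProxyPort", Integer.toString(port));
    }

    // Builds a descriptor for a single HTTP proxy. It can be passed to
    // URL.openConnection(Proxy) so that only that connection is proxied.
    static Proxy httpProxy(String host, int port) {
        // createUnresolved() defers the DNS lookup until the proxy is used.
        return new Proxy(Proxy.Type.HTTP,
            InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) {
        useSocksProxy("socks.cloud9.net", 1080);
        System.out.println("socksProxyHost=" + System.getProperty("socksProxyHost"));
        System.out.println("socksProxyPort=" + System.getProperty("socksProxyPort"));

        Proxy proxy = httpProxy("web.proxy.cloud9.net", 8000);
        System.out.println("Per-connection proxy: " + proxy);
    }
}
```

Setting the properties in code must happen before the first network connection that should use them; the per-connection Proxy object avoids that ordering concern.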
