Java Network Programming (3rd ed) [Electronic resources] (text version)

6.4 Some Useful Programs

You now know everything there is to know about the
java.net.InetAddress class. The tools in this
class alone let you write some genuinely useful programs. Here
we'll look at two examples: one that queries your
domain name server interactively and another that can improve the
performance of your web server by processing log files offline.

6.4.1 HostLookup

nslookup is an old Unix utility that converts
hostnames to IP addresses and IP addresses to hostnames. It has two
modes: interactive and command-line. If you enter a hostname on the
command line, nslookup prints the IP address of
that host. If you enter an IP address on the command line,
nslookup prints the hostname. If no hostname or
IP address is entered on the command line,
nslookup enters interactive mode, in which it
reads hostnames and IP addresses from standard input and echoes back
the corresponding IP addresses and hostnames until you type
"exit". Example 6-11 is a simple character mode application called
HostLookup, which emulates
nslookup. It doesn't implement
any of nslookup's more complex
features, but it does enough to be useful.

Example 6-11. An nslookup clone

import java.net.*;
import java.io.*;

public class HostLookup {

  public static void main (String[] args) {
    if (args.length > 0) { // use command line
      for (int i = 0; i < args.length; i++) {
        System.out.println(lookup(args[i]));
      }
    }
    else {
      BufferedReader in = new BufferedReader(
       new InputStreamReader(System.in));
      System.out.println(
       "Enter names and IP addresses. Enter \"exit\" to quit.");
      try {
        while (true) {
          String host = in.readLine( );
          // readLine( ) returns null at end of input
          if (host == null || host.equalsIgnoreCase("exit") ||
           host.equalsIgnoreCase("quit")) {
            break;
          }
          System.out.println(lookup(host));
        }
      }
      catch (IOException ex) {
        System.err.println(ex);
      }
    }
  } /* end main */

  private static String lookup(String host) {
    InetAddress node;
    // find the InetAddress for this name or address
    try {
      node = InetAddress.getByName(host);
    }
    catch (UnknownHostException ex) {
      return "Cannot find host " + host;
    }
    if (isHostname(host)) {
      return node.getHostAddress( );
    }
    else {  // this is an IP address
      return node.getHostName( );
    }
  }  // end lookup

  private static boolean isHostname(String host) {
    // Is this an IPv6 address?
    if (host.indexOf(':') != -1) return false;
    char[] ca = host.toCharArray( );
    // if we see a character that is neither a digit nor a period
    // then host is probably a hostname
    for (int i = 0; i < ca.length; i++) {
      if (!Character.isDigit(ca[i])) {
        if (ca[i] != '.') return true;
      }
    }
    // Everything was either a digit or a period
    // so host looks like an IPv4 address in dotted quad format
    return false;
  }  // end isHostname

} // end HostLookup

Here's some sample output. In interactive mode, the lines typed by the
user alternate with the program's responses:

$ java HostLookup utopia.poly.edu
128.238.3.21
$ java HostLookup 128.238.3.21
utopia.poly.edu
$ java HostLookup
Enter names and IP addresses. Enter "exit" to quit.

128.122.80.78
199.1.32.90
star.blackstar.com
localhost
127.0.0.1
stallio.elharo.com
Cannot find host stallio.elharo.com
stallion.elharo.com
127.0.0.1
127.0.0.1
stallion.elharo.com
java.oreilly.com
208.201.239.37
208.201.239.37
www.oreillynet.com
exit
$

There are three methods in the HostLookup
program: main( ), lookup( ),
and isHostname( ). The main( )
method determines whether there are command-line arguments. If there
are command-line arguments, main( ) calls lookup( ) to
process each one. If there are no command-line arguments,
main( ) chains a BufferedReader
to an InputStreamReader chained to
System.in and reads input from the user with the
readLine( ) method. (The warning about this method
in Chapter 4 doesn't apply
here because the program is reading from the console, not a network
connection.) If the line is "exit" or "quit",
then the program exits. Otherwise, the line is assumed to be a
hostname or IP address and is passed to the lookup( ) method.

The lookup( ) method uses
InetAddress.getByName( ) to find the requested
host, regardless of the input's format; remember
that getByName( ) doesn't care if
its argument is a name or a dotted quad address. If
getByName( ) fails, lookup( )
returns a failure message. Otherwise, it gets the address of the
requested system. Then lookup( ) calls
isHostname( ) to determine whether
the input string host is a hostname such as
utopia.poly.edu, a dotted quad IPv4 address such as 128.122.153.70,
or a hexadecimal IPv6 address such as FEDC::DC:0:7076:10.
isHostname( ) first looks for colons, which any IPv6 hexadecimal
address will have and no hostname will have. If it finds any, it
returns false. Checking for IPv4 addresses is a little trickier
because dotted quad addresses don't contain any
character that can't appear in a hostname. Instead,
isHostname( ) looks at each character of the
string; if all the characters are digits or periods,
isHostname( ) guesses that the string is a numeric
IP address and returns false. Otherwise, isHostname( ) guesses
that the string is a hostname and returns true.
What if the string is neither? Such an eventuality is very unlikely:
if the string is neither a hostname nor an address,
getByName( ) won't be able to do
a lookup and will throw an exception. However, it would not be
difficult to add a test making sure that the string looks valid; this
is left as an exercise for the reader. If the user types a hostname,
lookup( ) returns the corresponding dotted quad or
hexadecimal address using getHostAddress( ). If
the user types an IP address, then lookup( ) uses the
getHostName( ) method to look up the hostname corresponding to the
address, and returns it.
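The validity test left as an exercise above might look something like the following sketch. The class name InputClassifier, the Kind enum, and the specific rules (exactly four decimal parts from 0 to 255 for IPv4; dot-separated labels of letters, digits, and hyphens for hostnames) are assumptions made for illustration and are not part of HostLookup. The IPv6 branch reuses the same colon heuristic as isHostname( ) and does not validate the hexadecimal groups, and the IPv4 rule may be stricter than what getByName( ) itself accepts.

```java
class InputClassifier {

  enum Kind { IPV4, IPV6, HOSTNAME, INVALID }

  static Kind classify(String s) {
    if (s == null || s.isEmpty()) return Kind.INVALID;
    // Same heuristic as isHostname: only IPv6 literals contain colons
    if (s.indexOf(':') != -1) return Kind.IPV6;
    String[] parts = s.split("\\.", -1);  // -1 keeps trailing empty parts
    if (looksNumeric(s)) {
      return isDottedQuad(parts) ? Kind.IPV4 : Kind.INVALID;
    }
    for (String label : parts) {
      if (!isLabel(label)) return Kind.INVALID;
    }
    return Kind.HOSTNAME;
  }

  // true if every character is a digit or a period
  private static boolean looksNumeric(String s) {
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c != '.' && (c < '0' || c > '9')) return false;
    }
    return true;
  }

  // Assumed rule: exactly four parts, each a number from 0 to 255
  private static boolean isDottedQuad(String[] parts) {
    if (parts.length != 4) return false;
    for (String part : parts) {
      if (part.isEmpty() || part.length() > 3) return false;
      if (Integer.parseInt(part) > 255) return false;
    }
    return true;
  }

  // Assumed rule: 1 to 63 letters, digits, or hyphens; no hyphen at either end
  private static boolean isLabel(String label) {
    if (label.isEmpty() || label.length() > 63) return false;
    if (label.startsWith("-") || label.endsWith("-")) return false;
    for (int i = 0; i < label.length(); i++) {
      char c = label.charAt(i);
      boolean ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
          || (c >= '0' && c <= '9') || c == '-';
      if (!ok) return false;
    }
    return true;
  }
}
```

With a check like this, lookup( ) could reject obviously malformed input before handing it to getByName( ), rather than relying on the exception.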

6.4.2 Processing Web Server Log Files

Web server logs track the hosts that
access a web site. By default, the log reports the IP addresses of
the sites that connect to the server. However, you can often get more
information from the names of those sites than from their IP
addresses. Most web servers have an option to store hostnames instead
of IP addresses, but this can hurt performance because the server
needs to make a DNS request for each hit. It is much more efficient
to log the IP addresses and convert them to hostnames at a later
time, when the server isn't busy or even on another
machine completely. Example 6-12 is a program called
Weblog that reads a web server log file and prints
each line with IP addresses converted to hostnames.

Most web servers have standardized on the common log file format,
although there are exceptions; if your web server is one of those
exceptions, you'll have to modify this program. A
typical line in the common log file format looks like this:

205.160.186.76 unknown - [17/Jun/2003:22:53:58 -0500] "GET /bgs/greenbg.gif HTTP/1.0" 200 50

This line indicates that a web browser at IP address 205.160.186.76
requested the file /bgs/greenbg.gif from this
web server at 10:53 p.m. (and 58 seconds) on June 17, 2003. The file
was found (response code 200) and 50 bytes of data were successfully
transferred to the browser.

The first field is the IP address or, if DNS resolution is turned on,
the hostname from which the connection was made. This is followed by
a space. Therefore, for our purposes, parsing the log file is easy:
everything before the first space is the IP address, and everything
after it does not need to be changed.

The Common Log File Format

If you want to expand Weblog into a more general web
server log processor, you need a little more information about the
common log file format. A line in the file has the format:

remotehost rfc931 authuser [date] "request" status bytes

remotehost

remotehost is either the hostname or IP address
from which the browser connected.

rfc931

rfc931 is the username of the user on the remote system, as specified
by Internet protocol RFC 931. Very few browsers send this
information, so it's almost always either unknown or
a dash. This is followed by a space.

authuser

authuser is the username with which the user authenticated to the
web server, typically through HTTP authentication. Most requests are
not authenticated, so this field is usually filled in with a
dash, followed by a space.

[date]

The date and time of the request are given in brackets. This is the
local system time when the request was made. Days are a two-digit
number ranging from 01 to 31. After a slash comes the month, which is
Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, or Dec. After
another slash, the year is indicated by four
digits. The year is followed by a colon, the hour (from 00 to 23),
another colon, two digits signifying the minute (00 to 59), a colon,
and two digits signifying the seconds (00 to 59). A space and the
offset of the local time zone from UTC (for example, -0500) follow.
Then comes the closing bracket and another space.

"request"

The request line exactly as it came from the client. It is enclosed
in quotation marks because it may contain embedded spaces. It is not
guaranteed to be a valid HTTP request since client software may
misbehave.

status

A numeric HTTP status code returned to the client. A list of HTTP 1.0
status codes is given in Chapter 3. The most
common response is 200, which means the request was successfully
processed.

bytes

The number of bytes of data that was sent to the client as a result
of this request.
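As a starting point for a more general log processor, the fields described above can be pulled apart with a regular expression. This is only a sketch under stated assumptions: the class name CommonLogEntry, its field names, and the decision to return null for a line that doesn't match are all hypothetical choices, and the pattern assumes that a server may log a dash in the bytes field when no data was sent.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommonLogEntry {

  // remotehost rfc931 authuser [date] "request" status bytes
  private static final Pattern LINE = Pattern.compile(
      "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(.*)\" (\\d{3}) (\\d+|-)$");

  // Matches dates such as 17/Jun/2003:22:53:58 -0500
  private static final DateTimeFormatter DATE =
      DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

  final String remoteHost, rfc931, authUser, request;
  final OffsetDateTime date;
  final int status;
  final long bytes;   // -1 if the field was a dash (an assumption of this sketch)

  private CommonLogEntry(Matcher m, OffsetDateTime date) {
    this.remoteHost = m.group(1);
    this.rfc931 = m.group(2);
    this.authUser = m.group(3);
    this.date = date;
    this.request = m.group(5);
    this.status = Integer.parseInt(m.group(6));
    this.bytes = m.group(7).equals("-") ? -1 : Long.parseLong(m.group(7));
  }

  // Returns null if the line is not in the expected format
  static CommonLogEntry parse(String line) {
    Matcher m = LINE.matcher(line.trim());
    if (!m.matches()) return null;
    try {
      return new CommonLogEntry(m, OffsetDateTime.parse(m.group(4), DATE));
    } catch (DateTimeParseException ex) {
      return null;
    }
  }
}
```

The request field is matched greedily up to the last quotation mark before the status code, since the request line itself is not guaranteed to be well formed.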

The dotted quad format IP address is converted into a hostname using
the usual methods of java.net.InetAddress. Example 6-12 shows the code.

Example 6-12. Process web server log files

import java.net.*;
import java.io.*;
import java.util.*;
import com.macfaq.io.SafeBufferedReader;

public class Weblog {

  public static void main(String[] args) {
    Date start = new Date( );
    try {
      FileInputStream fin =  new FileInputStream(args[0]);
      Reader in = new InputStreamReader(fin);
      SafeBufferedReader bin = new SafeBufferedReader(in);
      String entry = null;
      while ((entry = bin.readLine( )) != null) {
        // separate out the IP address
        int index = entry.indexOf(' ', 0);
        if (index == -1) {
          // no space, so this isn't a log entry; pass it through
          System.out.println(entry);
          continue;
        }
        String ip = entry.substring(0, index);
        String theRest = entry.substring(index, entry.length( ));
        // find the hostname and print it out
        try {
          InetAddress address = InetAddress.getByName(ip);
          System.out.println(address.getHostName( ) + theRest);
        }
        catch (UnknownHostException ex) {
          System.out.println(entry);
        }
      } // end while
    }
    catch (IOException ex) {
      System.out.println("Exception: " + ex);
    }
    Date end = new Date( );
    long elapsedTime = (end.getTime( )-start.getTime( ))/1000;
    System.out.println("Elapsed time: " + elapsedTime + " seconds");
  }  // end main

}

The name of the file to be processed is passed to
Weblog as the first argument on the command line.
A FileInputStream fin is opened
from this file and an InputStreamReader is chained
to fin. This InputStreamReader
is buffered by chaining it to an instance of the
SafeBufferedReader class developed in Chapter 4. The file is processed line by line in a
while loop.

Each pass through the loop places one line in the
String variable entry.
entry is then split into two substrings:
ip, which contains everything before the first
space, and theRest, which is everything from the
first space onward. The position of the first space is determined by
entry.indexOf(' ', 0). ip is converted to an
InetAddress object using getByName( ). getHostName( ) then looks up the
hostname. Finally, the hostname and the rest of the
line (theRest, which begins with the space) are printed on
System.out. Output can be sent to a new file
through the standard means for redirecting output.

Weblog is more efficient than you might expect.
Most web browsers generate multiple log file entries per page served,
since there's an entry in the log not just for the
page itself but for each graphic on the page. And many visitors
request multiple pages while visiting a site. DNS lookups are
expensive and it simply doesn't make sense to look
up each site every time it appears in the log file. The
InetAddress class caches requested addresses. If
the same address is requested again, it can be retrieved from the
cache much more quickly than from DNS.
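To make the benefit of caching concrete, here is a minimal sketch of an explicit cache in front of a lookup function. InetAddress does its own caching internally, so Weblog doesn't need this; the CachingResolver class, its usingDns( ) factory, and the miss counter are hypothetical names introduced only for illustration. The lookup is passed in as a function so that the effect can be observed without touching the network, and the class is not thread-safe as written.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class CachingResolver {

  private final Map<String, String> cache = new HashMap<>();
  private final Function<String, String> lookup;
  private int misses = 0;   // how many times the real lookup ran

  CachingResolver(Function<String, String> lookup) {
    this.lookup = lookup;
  }

  // A resolver backed by real DNS; returns the input unchanged on failure
  static CachingResolver usingDns() {
    return new CachingResolver(ip -> {
      try {
        return InetAddress.getByName(ip).getHostName();
      } catch (UnknownHostException ex) {
        return ip;
      }
    });
  }

  String resolve(String ip) {
    String name = cache.get(ip);
    if (name == null) {
      misses++;                 // only the first request for an address is slow
      name = lookup.apply(ip);
      cache.put(ip, name);
    }
    return name;
  }

  int getMisses() {
    return misses;
  }
}
```

For a log in which one visitor accounts for dozens of consecutive entries, only the first of those entries pays the cost of a lookup.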

Nonetheless, this program could certainly be faster. In my initial
tests, it took more than a second per log entry. (Exact numbers
depend on the speed of your network connection, the speed of the
local and remote DNS servers, and network congestion when the program
is run.) The program spends a huge amount of time sitting and waiting
for DNS requests to return. Of course, this is exactly the problem
multithreading is designed to solve. One main thread can read the log
file and pass off individual entries to other threads for processing.

A thread pool is absolutely necessary here. Over the space of a few
days, even low-volume web servers can easily generate a log file with
hundreds of thousands of lines. Trying to process such a log file by
spawning a new thread for each entry would rapidly bring even the
strongest virtual machine to its knees, especially since the main
thread can read log file entries much faster than individual threads
can resolve domain names and die. Consequently, reusing threads is
essential. The number of threads is stored in a tunable parameter,
numberOfThreads, so that it can be adjusted to fit
the VM and network stack. (Launching too many simultaneous DNS
requests can also cause problems.)
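On Java 5 and later, the java.util.concurrent package supplies a ready-made thread pool, which offers an alternative to building one by hand. The following sketch is not the approach taken in the examples in this section; the ExecutorWeblog class and its convert( ) and process( ) methods are hypothetical, and the resolver is passed in as a function so the pool can be exercised without DNS. Unlike a hand-built pool that throttles the reader, this version holds every entry in memory at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ExecutorWeblog {

  // Replaces the first field of one log entry using the supplied resolver
  static String convert(String entry, Function<String, String> resolver) {
    int index = entry.indexOf(' ');
    if (index == -1) return entry;   // not a log entry; leave it alone
    return resolver.apply(entry.substring(0, index)) + entry.substring(index);
  }

  static List<String> process(List<String> entries, int numberOfThreads,
      Function<String, String> resolver) {
    // A fixed pool reuses the same numberOfThreads threads for every entry
    ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
    try {
      List<Future<String>> pending = new ArrayList<>();
      for (String entry : entries) {
        pending.add(pool.submit(() -> convert(entry, resolver)));
      }
      List<String> results = new ArrayList<>();
      for (Future<String> f : pending) {
        results.add(f.get());   // collected in submission order
      }
      return results;
    } catch (InterruptedException | ExecutionException ex) {
      throw new RuntimeException(ex);
    } finally {
      pool.shutdown();
    }
  }
}
```

A real resolver would wrap InetAddress.getByName(ip).getHostName( ) and fall back to the original address if the lookup fails.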

This program is now divided into two classes. The first class,
PooledWeblog, shown in Example 6-13, contains the
main( ) method and the processLogFile( ) method. It also holds the resources that need to be
shared among the threads. These are the pool, implemented as a
synchronized LinkedList from the Java Collections
API, and the output log, implemented as a
BufferedWriter named out.
Individual threads have direct access to the pool but have to pass
through PooledWeblog's
log( ) method to write output.

The key method is processLogFile(). As before, this method reads from the
underlying log file. However, each entry is placed in the
entries pool rather than being immediately
processed. Because this method is likely to run much more quickly
than the threads that have to access DNS, it yields after reading
each entry. Furthermore, it goes to sleep if there are more entries
in the pool than threads available to process them. The amount of
time it sleeps depends on the number of threads. This setup avoids
using excessive amounts of memory for very large log files. When the
last entry is read, the finished flag is set to
true to tell the threads that they can die once
they've completed their work.

Example 6-13. PooledWeblog

import java.io.*;
import java.util.*;

public class PooledWeblog {

  private BufferedReader in;
  private BufferedWriter out;
  private int numberOfThreads;
  private List entries = Collections.synchronizedList(new LinkedList( ));
  // volatile so that lookup threads see the main thread's update
  private volatile boolean finished = false;

  public PooledWeblog(InputStream in, OutputStream out,
   int numberOfThreads) {
    this.in = new BufferedReader(new InputStreamReader(in));
    this.out = new BufferedWriter(new OutputStreamWriter(out));
    this.numberOfThreads = numberOfThreads;
  }

  public boolean isFinished( ) {
    return this.finished;
  }

  public int getNumberOfThreads( ) {
    return numberOfThreads;
  }

  public void processLogFile( ) {
    for (int i = 0; i < numberOfThreads; i++) {
      Thread t = new LookupThread(entries, this);
      t.start( );
    }
    try {
      String entry = in.readLine( );
      while (entry != null) {
        // let the lookup threads catch up if the pool is full
        if (entries.size( ) > numberOfThreads) {
          try {
            Thread.sleep((long) (1000.0/numberOfThreads));
          }
          catch (InterruptedException ex) {}
          continue;
        }
        synchronized (entries) {
          entries.add(0, entry);
          entries.notifyAll( );
        }
        entry = in.readLine( );
        Thread.yield( );
      } // end while
    }
    catch (IOException ex) {
      System.out.println("Exception: " + ex);
    }
    this.finished = true;
    // wake up any threads that are still waiting so they can exit
    synchronized (entries) {
      entries.notifyAll( );
    }
  }

  public void log(String entry) throws IOException {
    out.write(entry + System.getProperty("line.separator", "\r\n"));
    out.flush( );
  }

  public static void main(String[] args) {
    try {
      PooledWeblog tw = new PooledWeblog(new FileInputStream(args[0]),
       System.out, 100);
      tw.processLogFile( );
    }
    catch (FileNotFoundException ex) {
      System.err.println("Usage: java PooledWeblog logfile_name");
    }
    catch (ArrayIndexOutOfBoundsException ex) {
      System.err.println("Usage: java PooledWeblog logfile_name");
    }
    catch (Exception ex) {
      System.err.println(ex);
      ex.printStackTrace( );
    }
  }  // end main

}

The LookupThread class, shown in Example 6-14, handles the detailed work of converting IP
addresses to hostnames in the log entries. The constructor provides
each thread with a reference to the entries pool
it will retrieve work from and a reference to the
PooledWeblog object it's working
for. The latter reference allows callbacks to the
PooledWeblog so that the thread can log converted
entries and check to see when the last entry has been processed. It
does so by calling the isFinished( ) method in
PooledWeblog when the entries
pool is empty (i.e., has size 0). Neither an empty pool nor
isFinished( ) returning true is sufficient by
itself. isFinished( ) returns true after the last
entry is placed in the pool, which occurs, at least for a small
amount of time, before the last entry is removed from the pool. And
entries may be empty while there are still many
entries remaining to be read if the lookup threads outrun the main
thread reading the log file.
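The two-part termination test described above is needed because an empty pool alone is ambiguous. A different way to signal the end of the work, sketched below, is to use a BlockingQueue from java.util.concurrent and place one sentinel object in the queue per worker once the input is exhausted. This is an alternative technique, not the one used by LookupThread; the QueueDemo class, its run( ) method, and the DONE sentinel are hypothetical, and the worker merely records each entry where a real one would perform the DNS lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class QueueDemo {

  // Sentinel compared by identity, so it can't collide with a real entry
  private static final String DONE = new String("DONE");

  static List<String> run(List<String> entries, int numberOfThreads) {
    // Bounded queue: put() blocks when it is full, take() when it is empty
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(numberOfThreads);
    List<String> processed = Collections.synchronizedList(new ArrayList<>());
    List<Thread> workers = new ArrayList<>();
    for (int i = 0; i < numberOfThreads; i++) {
      Thread t = new Thread(() -> {
        try {
          while (true) {
            String entry = queue.take();
            if (entry == DONE) return;   // no more work is coming
            processed.add(entry);        // a real worker would resolve the host here
          }
        } catch (InterruptedException ex) {
          Thread.currentThread().interrupt();
        }
      });
      t.start();
      workers.add(t);
    }
    try {
      for (String entry : entries) queue.put(entry);
      // one sentinel per worker, queued after all the real entries
      for (int i = 0; i < numberOfThreads; i++) queue.put(DONE);
      for (Thread t : workers) t.join();
    } catch (InterruptedException ex) {
      throw new IllegalStateException(ex);
    }
    return processed;
  }
}
```

Because the sentinels are queued behind the last real entry, a worker that sees one knows that no further entries will follow, so no separate finished flag is required.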

Example 6-14. LookupThread

import java.net.*;
import java.io.*;
import java.util.*;

public class LookupThread extends Thread {

  private List entries;
  PooledWeblog log;   // used for callbacks

  public LookupThread(List entries, PooledWeblog log) {
    this.entries = entries;
    this.log = log;
  }

  public void run( ) {
    String entry;
    while (true) {
      synchronized (entries) {
        while (entries.size( ) == 0) {
          if (log.isFinished( )) return;
          try {
            entries.wait( );
          }
          catch (InterruptedException ex) {
          }
        }
        entry = (String) entries.remove(entries.size( )-1);
      }
      int index = entry.indexOf(' ', 0);
      String remoteHost = entry;
      String theRest = "";
      if (index != -1) {
        remoteHost = entry.substring(0, index);
        theRest = entry.substring(index, entry.length( ));
        try {
          remoteHost = InetAddress.getByName(remoteHost).getHostName( );
        }
        catch (Exception ex) {
          // remoteHost remains in dotted quad format
        }
      } // otherwise this isn't a log entry; pass it through unchanged
      try {
        log.log(remoteHost + theRest);
      }
      catch (IOException ex) {
        System.err.println(ex);
      }
      Thread.yield( );
    }
  }

}

Using threads like this lets the same
log files be processed in parallel, a huge time savings. In my
unscientific tests, the threaded version is 10 to 50 times faster
than the sequential version.

The biggest disadvantage to the multithreaded approach is that it
reorders the log file. The output statistics aren't
necessarily in the same order as the input statistics. For simple hit
counting, this doesn't matter. However, there are
some log analysis tools that can mine a log file to determine paths
users followed through a site. These tools could get confused if the
log is out of sequence. If the log sequence is an issue, attach a
sequence number to each log entry. As the individual threads return
log entries to the main program, the log( ) method
in the main program stores any that arrive out of order until their
predecessors appear. This is in some ways reminiscent of how network
software reorders TCP packets that arrive out of order.
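
The reordering scheme just described can be sketched as a small buffer keyed by sequence number. The SequencedLog class and its method names are hypothetical, and the sketch assumes that the main thread numbers entries 0, 1, 2, and so on as it reads them. Output is collected in a list here so that the behavior is easy to inspect; a real log( ) method would write to the BufferedWriter instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SequencedLog {

  // Entries that arrived before their predecessors
  private final Map<Long, String> waiting = new HashMap<>();
  private final List<String> output = new ArrayList<>();
  private long next = 0;   // sequence number of the next entry to emit

  // Called by the lookup threads, in any order
  synchronized void log(long sequence, String entry) {
    waiting.put(sequence, entry);
    // Emit as many consecutive entries as are now available
    String ready;
    while ((ready = waiting.remove(Long.valueOf(next))) != null) {
      output.add(ready);
      next++;
    }
  }

  // Returns a copy of everything emitted so far, in input order
  synchronized List<String> getOutput() {
    return new ArrayList<>(output);
  }
}
```

The cost of this approach is memory: one slow lookup early in the file holds back every later entry until it completes.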
