Java Network Programming (3rd ed) [Electronic resources], text version


Harold, Elliotte Rusty

17.2 The ContentHandler Class


A subclass of
ContentHandler overrides the getContent() method to return an object that's the
Java equivalent of the content. This method can be quite simple or
quite complex, depending almost entirely on the complexity of the
content type you're trying to parse. A
text/plain content handler is quite simple; a
text/rtf content handler would be very complex.

The
ContentHandler class
has a single no-args constructor:

public ContentHandler( )

Since ContentHandler is an abstract class, you
never call its constructor directly, only from inside the
constructors of subclasses.

The primary method of the class, albeit an abstract one, is
getContent( ):

public abstract Object getContent(URLConnection uc) throws IOException

This method is normally called only from inside the
getContent( ) method of a
URLConnection object. It is overridden in a
subclass that is specific to the type of content being handled.
getContent( ) should use the
URLConnection's
InputStream to create an object. There are no
rules about what type of object a content handler should return. In
general, this depends on what the application requesting the content
expects. Content handlers for text-like content bundled with the JDK
return some subclass of InputStream. Content
handlers for images return ImageProducer objects.

The getContent( ) method of a content handler does
not get the full InputStream that the
URLConnection has access to. The
InputStream that a content handler sees should
include only the content's raw data. Any MIME
headers or other protocol-specific information that come from the
server should be stripped by the URLConnection
before it passes the stream to the ContentHandler.
A ContentHandler is only responsible for content,
not for any protocol overhead that may be present. The
URLConnection should have already performed any
necessary handshaking with the server and interpreted any headers it
sends.
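
As a rough illustration of this division of labor, here is a minimal sketch of a handler that turns a body-only stream into a String. The class name PlainTextSketch, the fakeConnection( ) helper, and the UTF-8 assumption are all hypothetical and are not part of the JDK or of this chapter's examples; the stand-in connection simply serves fixed bytes so the handler can be exercised without a server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.ContentHandler;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical handler: converts the body of a connection into a String.
class PlainTextSketch extends ContentHandler {

    @Override
    public Object getContent(URLConnection uc) throws IOException {
        // Only the body is expected here; the connection has already
        // consumed any headers.
        try (InputStream in = uc.getInputStream()) {
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                body.write(chunk, 0, n);
            }
            // Assumption: UTF-8 text. A real handler would honor the charset
            // parameter of the Content-type header.
            return new String(body.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Stand-in connection that serves fixed bytes, so no network is needed.
    static URLConnection fakeConnection(String body) {
        final byte[] data = body.getBytes(StandardCharsets.UTF_8);
        return new URLConnection(null) {
            @Override
            public void connect() {
                connected = true;
            }

            @Override
            public InputStream getInputStream() {
                return new ByteArrayInputStream(data);
            }
        };
    }

    public static void main(String[] args) throws IOException {
        Object content = new PlainTextSketch().getContent(fakeConnection("hello, world\n"));
        System.out.print(content);
    }
}
```

The handler never looks at headers or sockets; everything protocol-specific stays in the URLConnection.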


17.2.1 A Content Handler for Tab-Separated Values


To see how content handlers work,
let's create a ContentHandler
that handles the text/tab-separated-values content
type. We aren't concerned with how the tab-separated
values get to us. That's for a protocol handler to
deal with. All a ContentHandler needs to know is
the MIME type and format of the data.

Tab-separated values are produced by many
database and spreadsheet programs. A tab-separated file may look
something like this (tabs are indicated by arrows).

JPE Associates→341 Lafayette Street, Suite 1025→New York→NY→10012
O'Reilly & Associates→103 Morris Street, Suite A→Sebastopol→CA→95472

In database parlance, each line is a
record,
and the data before each tab is a
field.
It is usually (though not necessarily) true that each field has the
same meaning in each record. In the previous example, the first field
is the company name.

The first question to ask is: what kind of Java object should we
convert the tab-separated values to? The simplest and most general
way to store each record is as an array of
Strings. Successive records can be collected in a
Vector. In many applications, however, you have a
great deal more knowledge about the exact format and meaning of the
data than we do here. The more you know about the data
you're dealing with, the better a
ContentHandler you can write. For example, if you know
that the data you're downloading represents U.S.
addresses, you could define a class like this:

public class Address {
private String name;
private String street;
private String city;
private String state;
private String zip;
}

This class would also have appropriate constructors and other methods
to represent each record. In this example, we don't
know anything about the data in advance, or how many records
we'll have to store. Therefore, we will take the
most general approach and convert each record into an array of
strings, using a Vector to store each array until
there are no more records. The getContent( )
method can return the Vector of
String arrays.

Example 17-1 shows the code for such a
ContentHandler. The full package-qualified name is
com.macfaq.net.www.content.text.tab_separated_values.
This unusual class name follows the naming convention for a content
handler for the MIME type
text/tab-separated-values. Since MIME types often
contain hyphens, as in this example, a convention exists to replace
these with the underscore (_). Thus
text/tab-separated-values becomes
text.tab_separated_values. To install this content
handler, all that's needed is to put the compiled
.class file somewhere the class loader can find
it and set the java.content.handler.pkgs property
to com.macfaq.net.www.content.
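
The naming convention can be sketched in a few lines. The helper below is hypothetical (HandlerNameSketch and handlerClassName( ) are names invented for illustration) and applies only the two substitutions described above: the slash becomes a period and each hyphen becomes an underscore. It is not the JDK's own lookup code, which may normalize other characters as well.

```java
// Hypothetical helper: derives the class name that the naming convention
// implies for a MIME type under a given package prefix.
class HandlerNameSketch {

    static String handlerClassName(String packagePrefix, String mimeType) {
        // type/subtype becomes type.subtype; hyphens become underscores
        String relativeName = mimeType.replace('/', '.').replace('-', '_');
        return packagePrefix + "." + relativeName;
    }

    public static void main(String[] args) {
        // prints com.macfaq.net.www.content.text.tab_separated_values
        System.out.println(handlerClassName("com.macfaq.net.www.content",
                                            "text/tab-separated-values"));
    }
}
```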


Example 17-1. A ContentHandler for text/tab-separated-values


package com.macfaq.net.www.content.text;

import java.net.*;
import java.io.*;
import java.util.*;
import com.macfaq.io.SafeBufferedReader; // From Chapter 4

public class tab_separated_values extends ContentHandler {

  public Object getContent(URLConnection uc) throws IOException {
    String theLine;
    Vector lines = new Vector( );
    InputStreamReader isr = new InputStreamReader(uc.getInputStream( ));
    SafeBufferedReader in = new SafeBufferedReader(isr);
    // Each line of input is one record
    while ((theLine = in.readLine( )) != null) {
      String[] linearray = lineToArray(theLine);
      lines.addElement(linearray);
    }
    return lines;
  }

  private String[] lineToArray(String line) {
    // A line with n tabs has n+1 fields
    int numFields = 1;
    for (int i = 0; i < line.length( ); i++) {
      if (line.charAt(i) == '\t') numFields++;
    }
    String[] fields = new String[numFields];
    int position = 0;
    for (int i = 0; i < numFields; i++) {
      StringBuffer buffer = new StringBuffer( );
      while (position < line.length( ) && line.charAt(position) != '\t') {
        buffer.append(line.charAt(position));
        position++;
      }
      fields[i] = buffer.toString( );
      position++; // skip the tab
    }
    return fields;
  }
}

Example 17-1 has two methods. The
private utility method
lineToArray( ) converts a tab-separated string
into an array of strings. This method is for the private use of this
subclass and is not required by the ContentHandler
class. The more complicated the content you're
trying to parse, the more such methods your class will need. The
lineToArray( ) method begins by counting the
number of tabs in the string. This sets the
numFields variable to one more than the number of
tabs. An array is created for the fields with the length
numFields; a for loop fills the
array with the strings between the tabs; and this array is returned.


You may have expected a StringTokenizer to split
the line into parts. However, that class has unusual ideas about what
makes up a token. In particular, it interprets multiple tabs in a row
as a single delimiter. That is, it never returns an empty string as a
token.
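
The difference is easy to see with a record that contains empty fields. The sketch below is illustrative only: EmptyFieldSketch is a hypothetical class, and String.split( ) with a negative limit is shown as one alternative that preserves empty fields; it is not the approach Example 17-1 takes, which counts tabs by hand.

```java
import java.util.StringTokenizer;

// Hypothetical demo class comparing two ways of splitting a record.
class EmptyFieldSketch {

    // StringTokenizer treats a run of tabs as a single delimiter,
    // so empty fields vanish from the count.
    static int tokenizerCount(String line) {
        return new StringTokenizer(line, "\t").countTokens();
    }

    // A negative limit tells split( ) to keep empty strings,
    // including trailing ones.
    static String[] splitKeepingEmpties(String line) {
        return line.split("\t", -1);
    }

    public static void main(String[] args) {
        // Second and fifth fields are empty
        String line = "JPE Associates\t\tNew York\tNY\t";
        System.out.println(tokenizerCount(line));             // prints 3
        System.out.println(splitKeepingEmpties(line).length); // prints 5
    }
}
```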

The getContent( ) method starts by instantiating a
Vector. Then it gets the
InputStream from the
URLConnection uc and chains
this to an InputStreamReader, which is in turn
chained to the SafeBufferedReader (introduced in
Chapter 4) so getContent( )
can read the data one line at a time in a while
loop. Each line is fed to the lineToArray( )
method, which splits it into a String array. This
array is then added to the Vector. When no more
lines are left, the loop exits and the Vector is
returned.


17.2.2 Using Content Handlers


Now that
you've written your first
ContentHandler, let's see how to
use it in a program. Files of MIME type
text/tab-separated-values can be served by gopher
servers, HTTP servers, FTP servers, and more. Let's
assume you're retrieving a tab-separated-values file
from an HTTP server. The filename should end with the
.tsv or .tab extension so
that the server knows it's a
text/tab-separated-values file.


Not all servers are configured to support this type out of the box.
Consult your server documentation to see how to set up a MIME-type
mapping for your server. For instance, to configure my Apache server,
I added these lines to my .htaccess file:

AddType text/tab-separated-values tab
AddType text/tab-separated-values tsv

You can test the web server configuration by connecting to port 80 of
the web server with Telnet and requesting the file manually:

% telnet www.ibiblio.org 80
Trying 127.0.0.1...
Connected to www.ibiblio.org.
Escape character is '^]'.
GET /javafaq/addresses.tab HTTP/1.0

HTTP/1.0 200 OK
Date: Mon, 15 Nov 1999 18:36:51 GMT
Server: Apache/1.3.4 (Unix) PHP/3.0.6 mod_perl/1.17
Last-Modified: Thu, 04 Nov 1999 18:22:51 GMT
Content-type: text/tab-separated-values
Content-length: 163

JPE Associates 341 Lafayette Street, Suite 1025 New York NY 10012
O'Reilly & Associates 103 Morris Street, Suite A Sebastopol CA 95472
Connection closed by foreign host.

You're looking for a line that says
Content-type:
text/tab-separated-values. If you see a
Content-type of text/plain,
application/octet-stream, or some other value, or
you don't see any Content-type at
all, the server is misconfigured and must be fixed before you
continue.

The application that uses the tab-separated-values content handler
does not need to know about it explicitly. It simply has to call the
getContent( ) method of URL or
URLConnection on a URL with a matching MIME type.
Furthermore, the package where the content handler can be found has
to be listed in the java.content.handler.pkgs
property.

Example 17-2 is a class that downloads and prints a
text/tab-separated-values file using the
ContentHandler of Example 17-1.
However, note that it does not import
com.macfaq.net.www.content.text and never
references the tab_separated_values class. The
commented-out lines in main( ) explicitly add
com.macfaq.net.www.content to the
java.content.handler.pkgs property; uncommenting them is
the simplest way to make sure this standalone program works. They
are unnecessary if the property is set in a property file or from
the command line.


Example 17-2. The tab-separated-values ContentTester class


import java.io.*;
import java.net.*;
import java.util.*;

public class TSVContentTester {

  private static void test(URL u) throws IOException {
    Object content = u.getContent( );
    Vector v = (Vector) content;
    for (Enumeration e = v.elements( ) ; e.hasMoreElements( ) ;) {
      String[] sa = (String[]) e.nextElement( );
      for (int i = 0; i < sa.length; i++) {
        System.out.print(sa[i] + "\t");
      }
      System.out.println( );
    }
  }

  public static void main (String[] args) {
    // If you uncomment these lines, then you don't have to
    // set the java.content.handler.pkgs property from the
    // command line or your properties files.
    /* String pkgs = System.getProperty("java.content.handler.pkgs", "");
    if (!pkgs.equals("")) {
      pkgs = pkgs + "|";
    }
    pkgs += "com.macfaq.net.www.content";
    System.setProperty("java.content.handler.pkgs", pkgs); */
    for (int i = 0; i < args.length; i++) {
      try {
        URL u = new URL(args[i]);
        test(u);
      }
      catch (MalformedURLException ex) {
        System.err.println(args[i] + " is not a good URL");
      }
      catch (Exception ex) {
        ex.printStackTrace( );
      }
    }
  }
}

Here's how you run this program. The arrows indicate
tabs:

% java -Djava.content.handler.pkgs=com.macfaq.net.www.content \
    TSVContentTester http://www.ibiblio.org/javafaq/addresses.tab
JPE Associates→341 Lafayette Street, Suite 1025→New York→NY→10012
O'Reilly & Associates→103 Morris Street, Suite A→Sebastopol→CA→95472


17.2.3 Choosing Return Types


There is one overloaded variant of
the getContent( )
method in the ContentHandler class:

public Object getContent(URLConnection uc, Class[] classes) // Java 1.3
throws IOException

The difference is the array of java.lang.Class
objects passed as the second argument. This allows the caller to
request that the content be returned as one of the types in the array
and enables content handlers to support multiple types. For example,
the text/tab-separated-values content handler
could return data as a Vector, an array, a string,
or an InputStream. One would be the default used
by the single argument getContent( ) method, while
the others would be options that a client could request. If the
client doesn't request any of the classes this
ContentHandler knows how to provide, it returns
null.
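
A minimal sketch of how such a multi-type handler might look for tab-separated values follows. It is an illustration under several assumptions rather than a replacement for Example 17-1: the class name MultiTypeTsvSketch and the fakeConnection( ) test helper are hypothetical, the input is assumed to be UTF-8, java.util.List stands in for Vector, and only two of the possible return types (String and List) are supported.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.ContentHandler;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical handler that can return TSV data as a String or a List.
class MultiTypeTsvSketch extends ContentHandler {

    @Override
    public Object getContent(URLConnection uc) throws IOException {
        // Default representation: a List of String[] records
        return getContent(uc, new Class[] {List.class});
    }

    @Override
    public Object getContent(URLConnection uc, Class[] classes) throws IOException {
        // Earlier entries in the array take precedence
        for (Class<?> wanted : classes) {
            if (wanted == String.class) {
                return readAll(uc);
            } else if (wanted == List.class) {
                List<String[]> records = new ArrayList<>();
                for (String line : readAll(uc).split("\r?\n")) {
                    records.add(line.split("\t", -1)); // keep empty fields
                }
                return records;
            }
        }
        return null; // none of the requested types is supported
    }

    // Assumption: the body is UTF-8 text.
    private static String readAll(URLConnection uc) throws IOException {
        try (InputStream in = uc.getInputStream()) {
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                body.write(chunk, 0, n);
            }
            return new String(body.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Stand-in connection that serves fixed bytes, so no network is needed.
    static URLConnection fakeConnection(String body) {
        final byte[] data = body.getBytes(StandardCharsets.UTF_8);
        return new URLConnection(null) {
            @Override
            public void connect() {
                connected = true;
            }

            @Override
            public InputStream getInputStream() {
                return new ByteArrayInputStream(data);
            }
        };
    }

    public static void main(String[] args) throws IOException {
        MultiTypeTsvSketch handler = new MultiTypeTsvSketch();
        String body = "a\t\tb\nc\td\te\n";
        Object text = handler.getContent(fakeConnection(body),
                new Class[] {Integer.class, String.class});
        System.out.println(text instanceof String);
        Object none = handler.getContent(fakeConnection(body),
                new Class[] {Integer.class});
        System.out.println(none == null);
    }
}
```

Because the loop walks the caller's array in order, the caller's preference decides which representation comes back, not the order of the if tests inside the handler.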

To call this method, the client invokes the method with the same
arguments in a URL or
URLConnection object. It passes an array of
Class objects in the order it wishes to receive
the data. Thus, if it prefers to receive a String
but is willing to accept an InputStream and will
take a Vector as a last resort, it puts
String.class in the zeroth component of the array,
InputStream.class in the first component of the
array, and Vector.class in the last component of
the array. Then it uses instanceof to test what
was actually returned and either process it or convert it into the
preferred type. For example:

Class[] requestedTypes = {String.class, InputStream.class,
 Vector.class};
Object content = url.getContent(requestedTypes);
if (content instanceof String) {
  String s = (String) content;
  System.out.println(s);
}
else if (content instanceof InputStream) {
  InputStream in = (InputStream) content;
  int c;
  while ((c = in.read( )) != -1) System.out.write(c);
}
else if (content instanceof Vector) {
  Vector v = (Vector) content;
  for (Enumeration e = v.elements( ) ; e.hasMoreElements( ) ;) {
    String[] sa = (String[]) e.nextElement( );
    for (int i = 0; i < sa.length; i++) {
      System.out.print(sa[i] + "\t");
    }
    System.out.println( );
  }
}
else {
  // content is null if the handler supports none of the requested types
  System.out.println("Unrecognized content type "
   + (content == null ? "null" : content.getClass( ).toString( )));
}

To demonstrate this, let's write a content handler
that can be used in association with the time protocol. Recall that
the time protocol returns the current time at the server as a 4-byte,
big-endian, unsigned integer giving the number of seconds since
midnight, January 1, 1900, Greenwich Mean Time. There are several
obvious candidates for storing this data in a Java content handler,
including java.lang.Long
(java.lang.Integer won't work
since the unsigned value may overflow the bounds of an
int), java.util.Date,
java.util.Calendar,
java.lang.String, and
java.io.InputStream, which often works as a last
resort. Example 17-3 provides all five options.
There's no standard MIME type for the time format.
We'll use application for the
type to indicate that this is binary data and
x-time for the subtype to indicate that this is a
nonstandard extension type. It will be up to the time protocol
handler to return the right content type.


Example 17-3. A time content handler


package com.macfaq.net.www.content.application;

import java.net.*;
import java.io.*;
import java.util.*;

public class x_time extends ContentHandler {

  public Object getContent(URLConnection uc) throws IOException {
    // Date is the default return type
    Class[] classes = new Class[1];
    classes[0] = Date.class;
    return this.getContent(uc, classes);
  }

  public Object getContent(URLConnection uc, Class[] classes)
   throws IOException {
    InputStream in = uc.getInputStream( );
    for (int i = 0; i < classes.length; i++) {
      if (classes[i] == InputStream.class) {
        return in;
      }
      else if (classes[i] == Long.class) {
        long secondsSince1900 = readSecondsSince1900(in);
        return Long.valueOf(secondsSince1900);
      }
      else if (classes[i] == Date.class) {
        long secondsSince1900 = readSecondsSince1900(in);
        Date time = shiftEpochs(secondsSince1900);
        return time;
      }
      else if (classes[i] == Calendar.class) {
        long secondsSince1900 = readSecondsSince1900(in);
        Date time = shiftEpochs(secondsSince1900);
        Calendar c = Calendar.getInstance( );
        c.setTime(time);
        return c;
      }
      else if (classes[i] == String.class) {
        long secondsSince1900 = readSecondsSince1900(in);
        Date time = shiftEpochs(secondsSince1900);
        return time.toString( );
      }
    }
    return null; // no requested type available
  }

  private long readSecondsSince1900(InputStream in)
   throws IOException {
    long secondsSince1900 = 0;
    for (int j = 0; j < 4; j++) {
      int b = in.read( );
      // read( ) returns -1 if the server closes the connection early
      if (b == -1) throw new EOFException("Time server sent fewer than 4 bytes");
      secondsSince1900 = (secondsSince1900 << 8) | b;
    }
    return secondsSince1900;
  }

  private Date shiftEpochs(long secondsSince1900) {
    // The time protocol sets the epoch at 1900, the Java Date class
    // at 1970. This number converts between them.
    long differenceBetweenEpochs = 2208988800L;
    long secondsSince1970 = secondsSince1900 - differenceBetweenEpochs;
    long msSince1970 = secondsSince1970 * 1000;
    Date time = new Date(msSince1970);
    return time;
  }
}

Most of the work is performed by the second getContent() method, which checks to see whether it recognizes any of
the classes in the classes array. If so, it
attempts to convert the content into an object of that type. The
for loop is arranged so that classes earlier in
the array take precedence; that is, it first tries to match the first
class in the array; next it tries to match the second class in the
array; then the third class in the array; and so on. As soon as one
class is matched, the method returns so later classes
won't be matched even if they're an
allowed choice.

Once a type is matched, a simple algorithm converts the four bytes
that the time server sends into the right kind of object, either an
InputStream, a Long, a
Date, a Calendar, or a
String. The InputStream
conversion is trivial. The Long conversion is one
of those times when it seems a little inconvenient that primitive
data types aren't objects. Although you can convert
to and return any object type, you can't convert to
and return a primitive data type like long, so we
return the type wrapper class Long instead. The
Date and Calendar conversions
require shifting the origin of the time from January 1, 1900 to
January 1, 1970 and changing the units from seconds to milliseconds,
as discussed in Chapter 9. Finally, the
conversion to a String simply converts to a
Date and then invokes the Date
object's toString( ) method.
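
The byte arithmetic and the epoch shift can be checked in isolation. The following sketch is a standalone illustration, not part of Example 17-3: TimeEpochSketch and decode( ) are hypothetical names, and java.time.Instant (Java 8 and later) is used in place of java.util.Date. It decodes a 4-byte reply whose value is exactly the number of seconds between the two epochs, so the result should be the start of 1970.

```java
import java.time.Instant;

// Hypothetical decoder for a 4-byte time protocol reply.
class TimeEpochSketch {

    // Seconds between 1900-01-01T00:00:00Z and 1970-01-01T00:00:00Z
    static final long DIFFERENCE_BETWEEN_EPOCHS = 2208988800L;

    static Instant decode(byte[] reply) {
        if (reply.length != 4) {
            throw new IllegalArgumentException("expected exactly 4 bytes");
        }
        long secondsSince1900 = 0;
        for (byte b : reply) {
            // Java bytes are signed; masking with 0xFF recovers 0..255.
            // Accumulating in a long keeps the 32-bit value unsigned.
            secondsSince1900 = (secondsSince1900 << 8) | (b & 0xFF);
        }
        return Instant.ofEpochSecond(secondsSince1900 - DIFFERENCE_BETWEEN_EPOCHS);
    }

    public static void main(String[] args) {
        // 0x83AA7E80 == 2208988800, the offset between the two epochs
        byte[] reply = {(byte) 0x83, (byte) 0xAA, (byte) 0x7E, (byte) 0x80};
        System.out.println(decode(reply)); // prints 1970-01-01T00:00:00Z
    }
}
```

An int accumulator would go negative for any reply with the high bit set, which is why the text rules out java.lang.Integer.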

While it would be possible to configure a web server to send data of
MIME type application/x-time, this class is really
designed to be used by a custom protocol handler. This handler would
know not only how to speak the time protocol, but also how to return
application/x-time from the
getContentType( ) method. Example 17-4 and Example 17-5 demonstrate
such a protocol handler. It assumes that time URLs look like
time://vision.poly.edu:3737/.


Example 17-4. The URLConnection for the time protocol handler


package com.macfaq.net.www.protocol.time;

import java.net.*;
import java.io.*;
import com.macfaq.net.www.content.application.*;

public class TimeURLConnection extends URLConnection {

  private Socket connection = null;
  public final static int DEFAULT_PORT = 37;

  public TimeURLConnection (URL u) {
    super(u);
  }

  public String getContentType( ) {
    return "application/x-time";
  }

  public Object getContent( ) throws IOException {
    ContentHandler ch = new x_time( );
    return ch.getContent(this);
  }

  public Object getContent(Class[] classes) throws IOException {
    ContentHandler ch = new x_time( );
    return ch.getContent(this, classes);
  }

  public InputStream getInputStream( ) throws IOException {
    if (!connected) this.connect( );
    return this.connection.getInputStream( );
  }

  public synchronized void connect( ) throws IOException {
    if (!connected) {
      int port = url.getPort( );
      if ( port < 0) {
        port = DEFAULT_PORT;
      }
      this.connection = new Socket(url.getHost( ), port);
      this.connected = true;
    }
  }
}

In general, it should be enough for the protocol handler to simply
know or be able to deduce the correct MIME content type. However, in
a case like this, where both content and protocol handlers must be
provided, you can tie them a little more closely together by
overriding getContent( ) as well. This allows you
to avoid messing with the
java.content.handler.pkgs property or installing a
ContentHandlerFactory. You will still need to set
the java.protocol.handler.pkgs property to point to
your package or install a URLStreamHandlerFactory,
however. Example 17-5 is a simple URLStreamHandler
for the time protocol handler.
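
For comparison, this is roughly what the ContentHandlerFactory route being avoided would involve. The sketch is hypothetical: TimeHandlerFactorySketch is an invented name, and the anonymous handler is a placeholder that just hands back the raw stream, standing in for the x_time class of Example 17-3 so that the block compiles on its own.

```java
import java.io.IOException;
import java.net.ContentHandler;
import java.net.ContentHandlerFactory;
import java.net.URLConnection;

// Hypothetical factory that recognizes only application/x-time.
class TimeHandlerFactorySketch implements ContentHandlerFactory {

    @Override
    public ContentHandler createContentHandler(String mimeType) {
        if ("application/x-time".equalsIgnoreCase(mimeType)) {
            // Placeholder handler; a real factory would return new x_time( )
            return new ContentHandler() {
                @Override
                public Object getContent(URLConnection uc) throws IOException {
                    return uc.getInputStream();
                }
            };
        }
        // Returning null leaves other types to the normal handler lookup
        return null;
    }

    public static void main(String[] args) {
        // The factory can be installed only once per virtual machine
        URLConnection.setContentHandlerFactory(new TimeHandlerFactorySketch());
    }
}
```

Since the factory is a one-time, VM-wide setting, overriding getContent( ) in the URLConnection subclass is the less intrusive choice when the two handlers ship together.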


Example 17-5. The URLStreamHandler for the time protocol handler


package com.macfaq.net.www.protocol.time;

import java.net.*;
import java.io.*;

public class Handler extends URLStreamHandler {

  protected URLConnection openConnection(URL u) throws IOException {
    return new TimeURLConnection(u);
  }
}

We could install the time protocol handler into HotJava as we did
with protocol handlers in the previous chapter. However, even if we
place the time content handler in HotJava's class
path, HotJava won't use it. Consequently,
I've written a simple standalone application, shown
in Example 17-6, that uses these protocol and content
handlers to tell the time. Notice that it does not need to import or
directly refer to any of the classes involved. It simply lets the
URL find the right content handler.


Example 17-6. URLTimeClient


import java.net.*;
import java.util.*;
import java.io.*;

public class URLTimeClient {

  public static void main(String[] args) {
    System.setProperty("java.protocol.handler.pkgs",
     "com.macfaq.net.www.protocol");
    try {
      // You can replace this with your own time server
      URL u = new URL("time://tock.usno.navy.mil/");
      Class[] types = {String.class, Date.class,
       Calendar.class, Long.class};
      Object o = u.getContent(types);
      System.out.println(o);
    }
    catch (IOException ex) {
      // Let's see what went wrong
      ex.printStackTrace( );
    }
  }
}

Here's a sample run:

D:\JAVA\JNP3\examples\17>java URLTimeClient
Mon Aug 23 21:30:34 EDT 2004

In this case, a String object was returned. This
was the first choice of URLTimeClient but the last
choice of the content handler. The client choice always takes
precedence.

