7.5 Communicating with Server-Side Programs Through GET
The URL class makes it easy for Java applets and
applications to communicate with server-side programs such as CGIs,
servlets, PHP pages, and others that use the GET
method. (Server-side programs that use the POST
method require the URLConnection class and are
discussed in Chapter 15.) All you need to know
is what combination of names and values the program expects to
receive. Then you cook up a URL with a query string that provides the
requisite names and values. All names and values must be
x-www-form-url-encoded, as by the URLEncoder.encode() method discussed earlier in this chapter.

There are a number of ways to determine the exact syntax for a query
string that talks to a particular program. If you've
written the server-side program yourself, you already know the
name-value pairs it expects. If you've installed a
third-party program on your own server, the documentation for that
program should tell you what it expects.

On the other hand, if you're talking to a program on
a third-party server, matters are a little trickier. You can always
ask people at the remote server to provide you with the
specifications for talking to their site. However, even if they
don't mind doing this, there's
probably no single person whose job description includes
"telling third-party hackers with whom we have no
business relationship exactly how to access our
servers." Thus, unless you happen upon a
particularly friendly or bored individual who has nothing better to
do with their time except write long emails detailing exactly how to
access their server, you're going to have to do a
little reverse engineering.
Many server-side programs are designed to process form input. If this is the case, it's
straightforward to figure out what input the program expects. The
method the form uses should be the value of the
METHOD attribute of the FORM
element. This value should be either GET, in which
case you use the process described here, or POST,
in which case you use the process described in Chapter 15. The part of the URL that precedes the
query string is given by the value of the ACTION
attribute of the FORM element. Note that this may
be a relative URL, in which case you'll need to
determine the corresponding absolute URL. Finally, the name-value
pairs are simply the NAME attributes of the
INPUT elements, except for any
INPUT elements whose TYPE
attribute has the value submit.

For example, consider this HTML form for the local search engine on
my Cafe con Leche site. You can see that it uses the
GET method. The program that processes the form is
accessed via the URL http://www.google.com/search. It has four
separate name-value pairs, three of which have default values:
<form name="search" action="http://www.google.com/search" method="get">
<input name="q" />
<input type="hidden" value="cafeconleche.org" name="domains" />
<input type="hidden" name="sitesearch" value="cafeconleche.org" />
<input type="hidden" name="sitesearch2" value="cafeconleche.org" />
<br />
<input type="image" height="22" width="55"
src="/image/library/english/10151_search_blue.gif" border="0"
name="search-image" />
</form>

The type of the INPUT field doesn't matter. For instance, it
doesn't matter whether it's a set of
checkboxes, a pop-up list, or a text field; only the name of
each INPUT field and the value you give it are
significant. The single exception is a submit input that tells the
web browser when to send the data but does not give the server any
extra information. In some cases, you may find hidden
INPUT fields that must have particular required
default values. This form has three hidden INPUT
fields.

In some cases, the program you're talking to may not
be able to handle arbitrary text strings for values of particular
inputs. However, since the form is meant to be read and filled in by
human beings, it should provide sufficient clues to figure out what
input is expected; for instance, that a particular field is supposed
to be a two-letter state abbreviation or a phone number.

A program that doesn't respond to a form is much
harder to reverse engineer. For example, at http://www.ibiblio.org/nywc/bios.phtml,
you'll find a lot of links to PHP pages that talk to
a database to retrieve a list of musical works by a particular
composer. However, there's no form anywhere that
corresponds to this program. It's all done by
hardcoded URLs. In this case, the best you can do is look at as many
of those URLs as possible and see whether you can guess what the
server expects. If the designer hasn't tried to be
too devious, this information isn't hard to figure
out. For example, these URLs are all found on that page:
http://www.ibiblio.org/nywc/compositionsbycomposer.phtml?last=Anderson&first=Beth&middle=
http://www.ibiblio.org/nywc/compositionsbycomposer.phtml?last=Austin&first=Dorothea&middle=
http://www.ibiblio.org/nywc/compositionsbycomposer.phtml?last=Bliss&first=Marilyn&middle=
http://www.ibiblio.org/nywc/compositionsbycomposer.phtml?last=Hart&first=Jane&middle=Smith

Looking at these, you can guess that this particular program expects
three inputs named first, middle, and last, with values that consist
of the first, middle, and last names of a composer, respectively.
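Comparing a handful of URLs by eye works. When you have collected dozens of them, a small helper that splits each query string into decoded name-value pairs can make the pattern easier to spot. What follows is a minimal sketch, not a library API.

- The class name QueryDump and its parse() method are hypothetical.
- It assumes the values were encoded as UTF-8.
- For simplicity, a name that appears more than once (as a set of checkboxes might produce) keeps only its last value.

```java
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: print the decoded name-value pairs in each URL's
// query string so many captured URLs can be compared side by side.
class QueryDump {

    // Split "a=1&b=2" into an ordered map; a repeated name keeps its last value.
    static Map<String, String> parse(String query) {
        Map<String, String> pairs = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return pairs;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "&&" or a trailing "&"
            }
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            pairs.put(decode(name), decode(value));
        }
        return pairs;
    }

    // Undo x-www-form-url-encoding; UTF-8 is an assumption about the server.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new AssertionError(ex); // every JVM supports UTF-8
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        for (String arg : args) {
            URL u = new URL(arg);
            System.out.println(u.getPath());
            for (Map.Entry<String, String> e : parse(u.getQuery()).entrySet()) {
                System.out.println("  " + e.getKey() + " = " + e.getValue());
            }
        }
    }
}
```

Passing the four composer URLs as command-line arguments lists last, first, and middle for each, which makes the shared structure obvious.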
Sometimes the inputs may not have such obvious names. In this case,
you have to do some experimenting, first copying some existing values
and then tweaking them to see what values are and
aren't accepted. You don't need to
do this in a Java program. You can simply edit the URL in the Address
or Location bar of your web browser window.
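Once you have settled on the input names, building a request from Java comes down to two steps: encode each value, then join the pairs onto the program's URL. The sketch below does this for the composer program. Its assumptions are:

- The three names guessed above (last, first, and middle) are all the program needs.
- ComposerQuery is a hypothetical class.
- UTF-8 is an assumed encoding, since the site doesn't document one.

```java
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical sketch: build a GET URL for the composer program
// from the three input names deduced from the site's links.
class ComposerQuery {

    static final String BASE =
        "http://www.ibiblio.org/nywc/compositionsbycomposer.phtml";

    // x-www-form-url-encode one value (UTF-8 assumed).
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new AssertionError(ex); // every JVM supports UTF-8
        }
    }

    // Emit the pairs in the same order the site's own links use.
    static String buildUrl(String last, String first, String middle) {
        return BASE + "?last=" + encode(last)
                    + "&first=" + encode(first)
                    + "&middle=" + encode(middle);
    }

    public static void main(String[] args) throws MalformedURLException {
        URL u = new URL(buildUrl("Hart", "Jane", "Smith"));
        System.out.println(u);
    }
}
```

Running it prints the Hart/Jane/Smith URL exactly as it appears on the site. A value containing a space comes out with the space encoded as a plus sign.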
However you determine the name-value pairs the server expects, communicating with it once you know them is simple.
All you have to do is create a query string that includes the
necessary name-value pairs, then form a URL that includes that query
string. Send the query string to the server and read its response
using the same methods you use to connect to a server and retrieve a
static HTML page. There's no special protocol to
follow once the URL is constructed. (There is a special protocol to
follow for the POST method, however, which is why
discussion of that method will have to wait until Chapter 15.)

To demonstrate this procedure, let's write a very
simple command-line program to look up topics in the
Netscape Open Directory (http://dmoz.org/). This site, shown in Figure 7-3, has the advantage of being really simple.
Figure 7-3. The basic user interface for the Open Directory

The Open Directory interface is a simple form with one input field named search; input
typed in this field is sent to a CGI program at http://search.dmoz.org/cgi-bin/search, which
does the actual search. The HTML for the form looks like this:
<form accept-charset="UTF-8"
action="http://search.dmoz.org/cgi-bin/search" method="GET">
<input size=30 name=search>
<input type=submit value="Search">
<a href="http://search.dmoz.org/cgi-bin/search?a.x=0">
<small><i>advanced</i></small></a>
</form>

There are only two input fields in this form: the Submit button and a
text field named search. Thus, to submit a search request to the Open
Directory, you just need to collect the search string, encode it in a
query string, and send it to http://search.dmoz.org/cgi-bin/search. For
example, to search for "java", you
would open a connection to the URL http://search.dmoz.org/cgi-bin/search?search=java
and read the resulting input stream. Example 7-12
does exactly this.
Example 7-12. Do an Open Directory search
import com.macfaq.net.*;
import java.net.*;
import java.io.*;

public class DMoz {

    public static void main(String[] args) {

        // Join the command-line arguments into a single search string.
        String target = "";
        for (int i = 0; i < args.length; i++) {
            target += args[i] + " ";
        }
        target = target.trim( );

        // QueryString, from the com.macfaq.net package, builds the
        // x-www-form-url-encoded query string for the "search" field.
        QueryString query = new QueryString("search", target);
        try {
            URL u = new URL("http://search.dmoz.org/cgi-bin/search?" + query);
            InputStream in = new BufferedInputStream(u.openStream( ));
            InputStreamReader theHTML = new InputStreamReader(in);
            int c;
            // Copy the server's response to the console one character at a time.
            while ((c = theHTML.read( )) != -1) {
                System.out.print((char) c);
            }
        }
        catch (MalformedURLException ex) {
            System.err.println(ex);
        }
        catch (IOException ex) {
            System.err.println(ex);
        }
    }
}

Of course, a lot more effort could be expended on parsing and
displaying the results. But notice how simple the code to talk to
this server was. Aside from the funky-looking URL and the slightly
greater likelihood that some pieces of it need to be
x-www-form-url-encoded, talking to a server-side program that uses
GET is no harder than retrieving any other HTML
page.
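Example 7-12 depends on the QueryString class from com.macfaq.net. If that package isn't on your classpath, the following sketch does the same job with only the standard library:

- It encodes the search string with URLEncoder.
- It reads the response line by line inside a try-with-resources block. This needs Java 7 or later, and String.join needs Java 8.
- Reading the response as UTF-8 is an assumption based on the form's accept-charset attribute.
- DMozSearch is a hypothetical class name.

Note also that the Open Directory was shut down in 2017, so don't expect search results from this endpoint today. The URL construction is the part that carries over to other GET-based programs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical sketch: Example 7-12 without com.macfaq.net.QueryString.
class DMozSearch {

    static final String ENDPOINT = "http://search.dmoz.org/cgi-bin/search";

    // Join the words and encode them as the value of the "search" field.
    static String buildQuery(String[] words) {
        String target = String.join(" ", words).trim();
        try {
            return "search=" + URLEncoder.encode(target, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new AssertionError(ex); // every JVM supports UTF-8
        }
    }

    public static void main(String[] args) {
        try {
            URL u = new URL(ENDPOINT + "?" + buildQuery(args));
            // UTF-8 matches the form's accept-charset; assumed for the response too.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(u.openStream(), "UTF-8"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
            }
        } catch (IOException ex) {
            // Also covers MalformedURLException and unknown-host failures.
            System.err.println(ex);
        }
    }
}
```

Design note: keeping buildQuery() separate from the network code lets you check the generated query string, for example that "c++" becomes search=c%2B%2B, without contacting the server at all.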