3.3 HTTP
HTTP is the
standard protocol for communication between web browsers and web
servers. HTTP specifies how a client and server establish a
connection, how the client requests data from the server, how the
server responds to that request, and finally, how the connection is
closed. HTTP connections use the TCP/IP protocol for data transfer.
For each request from client to server, there is a sequence of four
steps:
1. The client establishes a TCP connection to the server, on port 80 by default; other ports may be specified in the URL.
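As a sketch of this first step, the class below picks the port named in the URL, falls back to 80 when the URL doesn't name one, and opens a plain TCP socket. The class name HttpConnector is hypothetical; it relies only on java.net.URI, whose getPort() returns -1 when no port is given, and java.net.Socket.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.URI;

// Hypothetical helper illustrating step 1: open a TCP connection to the
// port named in the URL, or to port 80 if the URL doesn't name one.
class HttpConnector {

    static final int DEFAULT_HTTP_PORT = 80;

    // URI.getPort() returns -1 when the URL has no explicit port.
    static int portFor(URI uri) {
        int port = uri.getPort();
        return port == -1 ? DEFAULT_HTTP_PORT : port;
    }

    // Opens the TCP connection; the caller is responsible for closing it.
    static Socket connect(URI uri) throws IOException {
        return new Socket(uri.getHost(), portFor(uri));
    }
}
```

For example, HttpConnector.portFor(URI.create("http://www.cafeaulait.org:8080/")) returns 8080, while the same URL without :8080 yields 80.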
2. The client sends a message to the server requesting the page at a specified URL. The format of this request is typically something like:
GET /index.html HTTP/1.0
GET specifies the operation being requested. The operation requested here is for the server to return a representation of a resource. /index.html is a relative URL that identifies the resource requested from the server. This resource is assumed to reside on the machine that receives the request, so there is no need to prefix it with http://www.thismachine.com/. HTTP/1.0 is the version of the protocol that the client understands. The request line ends with a carriage return/linefeed pair, and if nothing else follows, a second pair terminates the request (\r\n\r\n in Java parlance), regardless of how lines are terminated on the client or server platform.
Although the GET line is all that is required, a client request can include other information as well. This takes the following form:
Keyword: Value
The most common such keyword is Accept, which tells the server what kinds of data the client can handle (though servers often ignore this). For example, the following line says that the client can handle four MIME media types, corresponding to HTML documents, plain text, and JPEG and GIF images:
Accept: text/html, text/plain, image/gif, image/jpeg
User-Agent is another common keyword that lets the server know what browser is being used, allowing the server to send files optimized for the particular browser type. The line below says that the request comes from Version 2.4 of the Lynx browser:
User-Agent: Lynx/2.4 libwww/2.1.4
All but the oldest first-generation browsers also include a Host field specifying the server's name, which allows web servers to distinguish between different named hosts served from the same IP address. Here's an example:
Host: www.cafeaulait.org
Finally, the request is terminated with a blank line, so it ends with two consecutive carriage return/linefeed pairs, \r\n\r\n. A complete request might look like this:
GET /index.html HTTP/1.0
Accept: text/html, text/plain, image/gif, image/jpeg
User-Agent: Lynx/2.4 libwww/2.1.4
Host: www.cafeaulait.org
In addition to GET, there are several other request types. HEAD retrieves only the header for
the file, not the actual data. This is commonly used to check the
modification date of a file, to see whether a copy stored in the
local cache is still valid. POST sends form data
to the server, PUT uploads a resource to the
server, and DELETE removes a resource from the
server.
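The request format described above can be sketched in Java as a method that assembles the request line, any Keyword: Value header lines, and the terminating blank line, writing \r\n explicitly rather than the platform's line separator. The class and method names are hypothetical; the method parameter lets the same code produce GET or HEAD requests.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical builder for an HTTP 1.0 request like the one shown above.
class Http10Request {

    private static final String CRLF = "\r\n"; // HTTP line terminator on every platform

    static String build(String method, String path, Map<String, String> headers) {
        StringBuilder sb = new StringBuilder();
        // Request line: operation, relative URL, protocol version.
        sb.append(method).append(' ').append(path).append(" HTTP/1.0").append(CRLF);
        // Optional header lines of the form "Keyword: Value".
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ").append(h.getValue()).append(CRLF);
        }
        // A blank line ends the request, so the text ends with \r\n\r\n.
        sb.append(CRLF);
        return sb.toString();
    }

    // Encode as US-ASCII; the header text in these examples is plain ASCII.
    static byte[] toBytes(String request) {
        return request.getBytes(StandardCharsets.US_ASCII);
    }

    // The complete example request from the text; LinkedHashMap keeps header order.
    static String example() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Accept", "text/html, text/plain, image/gif, image/jpeg");
        headers.put("User-Agent", "Lynx/2.4 libwww/2.1.4");
        headers.put("Host", "www.cafeaulait.org");
        return build("GET", "/index.html", headers);
    }
}
```

Writing Http10Request.toBytes(Http10Request.example()) to a socket's output stream would send the complete four-line request shown above; passing "HEAD" as the method asks for the header only.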
3. The server sends a response to the client. The response begins with a response code, followed by a header full of metadata, a blank line, and the requested document or an error message. Assuming the requested document is found, a typical response looks like this:
HTTP/1.1 200 OK
Date: Mon, 15 Sep 2003 21:06:50 GMT
Server: Apache/2.0.40 (Red Hat Linux)
Last-Modified: Tue, 15 Apr 2003 17:28:57 GMT
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Content-length: 107
<html>
<head>
<title>
A Sample HTML file
</title>
</head>
<body>
The rest of the document goes here
</body>
</html>
The first line indicates the protocol the server is using (HTTP/1.1), followed by a response code. 200 OK is the most common response code, indicating that the request was successful. Table 3-1 is a complete list of the response codes used by HTTP 1.0; HTTP 1.1 adds many more to this list. The other header lines identify the date the request was made in the server's time frame, the server software (Apache 2.0.40), the date this document was last modified, a promise that the server will close the connection when it's finished sending, the MIME content type, and the length of the document delivered (not counting this header): in this case, 107 bytes.
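To make the structure of the response concrete, here is a minimal sketch of a parser that splits the status line, collects the header fields, and then reads exactly Content-length bytes of body. It assumes ASCII headers with \r\n line endings, and the class name is hypothetical. Header names are compared case-insensitively, as HTTP requires, which matters here because the sample response spells the field Content-length rather than Content-Length.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical minimal parser for a response like the one shown above.
class SimpleHttpResponse {
    String version;   // e.g. "HTTP/1.1"
    int code;         // e.g. 200
    String reason;    // e.g. "OK"
    // Header names are case-insensitive, so use a case-insensitive map.
    final Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    byte[] body;

    static SimpleHttpResponse parse(InputStream in) throws IOException {
        SimpleHttpResponse r = new SimpleHttpResponse();
        // Status line: version, numeric response code, reason phrase.
        String[] status = readLine(in).split(" ", 3);
        r.version = status[0];
        r.code = Integer.parseInt(status[1]);
        r.reason = status.length > 2 ? status[2] : "";
        // Header lines ("Keyword: Value") continue until the blank line.
        String line;
        while (!(line = readLine(in)).isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                r.headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        // Body: exactly Content-length bytes. This sketch reads no body if the
        // header is missing; a real HTTP 1.0 client would read until the server closes.
        String length = r.headers.get("Content-Length");
        r.body = new byte[length == null ? 0 : Integer.parseInt(length)];
        new DataInputStream(in).readFully(r.body);
        return r;
    }

    // Reads one line ending in \r\n (or \n) and returns it without the terminator.
    private static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') buf.write(c);
        }
        return buf.toString("US-ASCII");
    }
}
```

Because the parser stops after Content-length bytes, anything the server sends afterward is left unread in the stream.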
4. Either the client or the server or both close the connection. Thus, a
separate network connection is used for each request. If the client
reconnects, the server retains no memory of the previous connection
or its results. A protocol that retains no memory of past requests is
called stateless; in contrast, a
stateful protocol such as FTP can process many
requests before the connection is closed. The lack of state is both a
strength and a weakness of HTTP.
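Putting the four steps together, an HTTP 1.0 fetch can be sketched as a method that opens a new socket, sends one request, reads until the server closes the connection, and closes its own end. Because each call starts from a fresh connection, nothing carries over from one request to the next, which is the statelessness described above. The names are hypothetical, and the sketch buffers the whole response in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical one-shot HTTP 1.0 client: one TCP connection per request.
class OneShotHttp10 {

    static byte[] fetch(String host, int port, String path) throws IOException {
        // Step 1: open a new TCP connection; try-with-resources closes it (step 4).
        try (Socket socket = new Socket(host, port)) {
            // Step 2: send the request, terminated by a blank line.
            String request = "GET " + path + " HTTP/1.0\r\n"
                    + "Host: " + host + "\r\n\r\n";
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // Step 3: the response ends when the server closes the connection.
            return readUntilClosed(socket.getInputStream());
        }
    }

    // Reads everything until end of stream, i.e. until the peer closes.
    static byte[] readUntilClosed(InputStream in) throws IOException {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            all.write(chunk, 0, n);
        }
        return all.toByteArray();
    }
}
```

Reading to end of stream works here because the server closes the connection after an HTTP 1.0 response; on a connection that stays open, the client would need another way to find the end of the response, such as the Content-length header.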
Response code | Meaning |
|---|---|
2xx Successful | Response codes between 200 and 299 indicate that the request was received, understood, and accepted. |
200 OK | This is the most common response code. If the request used GET or POST, the requested data is contained in the response along with the usual headers. If the request used HEAD, only the header information is included. |
201 Created | The server has created a data file at a URL specified in the body of the response. The web browser should now attempt to load that URL. This is sent only in response to POST requests. |
202 Accepted | This rather uncommon response indicates that a request (generally from POST) is being processed, but the processing is not yet complete so no response can be returned. The server should return an HTML page that explains the situation to the user, provides an estimate of when the request is likely to be completed, and, ideally, has a link to a status monitor of some kind. |
204 No Content | The server has successfully processed the request but has no information to send back to the client. This is usually the result of a poorly written form-processing program that accepts data but does not return a response to the user indicating that it has finished. |
3xx Redirection | Response codes from 300 to 399 indicate that the web browser needs to go to a different page. |
300 Multiple Choices | The page requested is available from one or more locations. The body of the response includes a list of locations from which the user or web browser can pick the most appropriate one. If the server prefers one of these locations, the URL of this choice is included in a Location header, which web browsers can use to load the preferred page. |
301 Moved Permanently | The page has moved to a new URL. The web browser should automatically load the page at this URL and update any bookmarks that point to the old URL. |
302 Moved Temporarily | This unusual response code indicates that a page is temporarily at a new URL but that the document's location will change again in the foreseeable future, so bookmarks should not be updated. |
304 Not Modified | The client has performed a GET request but used the If-Modified-Since header to indicate that it wants the document only if it has been modified since a given date. This status code is returned because the document has not been modified since then. The web browser will now load the page from its cache. |
4xx Client Error | Response codes from 400 to 499 indicate that the client has erred in some fashion, although the error may as easily be the result of an unreliable network connection as of a buggy or nonconforming web browser. The browser should stop sending data to the server as soon as it receives a 4xx response. Unless it is responding to a HEAD request, the server should explain the error status in the body of its response. |
400 Bad Request | The client request to the server used improper syntax. This is rather unusual, although it is likely to happen if you're writing and debugging a client. |
401 Unauthorized | Authorization, generally username and password controlled, is required to access this page. Either the username and password have not yet been presented or the username and password are invalid. |
403 Forbidden | The server understood the request but is deliberately refusing to process it. Authorization will not help. One reason this occurs is that the client asks for a directory listing but the server is not configured to provide it, as shown in Figure 3-1. |
404 Not Found | This most common error response indicates that the server cannot find the requested page. It may indicate a bad link, a page that has moved with no forwarding address, a mistyped URL, or something similar. |
5xx Server Error | Response codes from 500 to 599 indicate that something has gone wrong with the server, and the server cannot fix the problem. |
500 Internal Server Error | An unexpected condition occurred that the server does not know how to handle. |
501 Not Implemented | The server does not have the feature that is needed to fulfill this request. A server that cannot handle POST requests might send this response to a client that tried to POST form data to it. |
502 Bad Gateway | This response is applicable only to servers that act as proxies or gateways. It indicates that the proxy received an invalid response from a server it was connecting to in an effort to fulfill the request. |
503 Service Unavailable | The server is temporarily unable to handle the request, perhaps as a result of overloading or maintenance. |
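Because the hundreds digit of a response code identifies its class, a client can react sensibly even to a code it has never seen, such as one of the many codes HTTP 1.1 adds beyond Table 3-1, by falling back on that class. A minimal sketch, with hypothetical names:

```java
// Hypothetical classification of a response code by its hundreds digit,
// following the ranges used in Table 3-1.
enum StatusClass {
    SUCCESSFUL, REDIRECTION, CLIENT_ERROR, SERVER_ERROR, OTHER;

    static StatusClass of(int code) {
        switch (code / 100) {
            case 2: return SUCCESSFUL;    // 200-299
            case 3: return REDIRECTION;   // 300-399
            case 4: return CLIENT_ERROR;  // 400-499
            case 5: return SERVER_ERROR;  // 500-599
            default: return OTHER;        // outside the table's ranges (e.g. HTTP 1.1's 1xx codes)
        }
    }
}
```

For instance, StatusClass.of(207) returns SUCCESSFUL even though 207 does not appear in the table.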
A response code from 200 to 299 always indicates success, a response code from 300 to 399 always indicates redirection, one from 400 to 499 always indicates a client error, and one from 500 to 599 indicates a server error.
HTTP 1.0 is documented in the informational RFC 1945; it is not an
official Internet standard because it was primarily developed outside
the IETF by early browser and server vendors. HTTP 1.1 is a proposed
standard being developed by the W3C and the HTTP working group of the
IETF. It provides for much more flexible and powerful communication
between the client and the server. It's also a lot
more scalable. It's documented in RFC 2616. HTTP 1.0
is the basic version of the protocol. All current web servers and
browsers understand it. HTTP 1.1 adds numerous features to HTTP 1.0,
but doesn't change the underlying design or
architecture in any significant way. For the purposes of this book,
it will usually be sufficient to understand HTTP 1.0.
The primary improvement in HTTP 1.1 is connection
reuse. HTTP 1.0 opens a new connection for every request.
In practice, the time taken to open and close all the connections in
a typical web session can outweigh the time taken to transmit the
data, especially for sessions with many small documents. HTTP 1.1
allows a browser to send many different requests over a single
connection; the connection remains open until it is explicitly
closed. The requests and responses are all asynchronous. A browser
doesn't need to wait for a response to its first
request before sending a second or a third. However, it remains tied
to the basic pattern of a client request followed by a server
response. Each request and response has the same basic form: a header
line, an HTTP header containing metadata, a blank line, and then the
data itself.
There are a lot of other, smaller improvements in HTTP 1.1. Requests
include a Host header field so that one web server
can easily serve different sites at different URLs. Servers and
browsers can exchange compressed files and particular byte ranges of
a document, both of which decrease network traffic. And HTTP 1.1 is
designed to work much better with proxy servers. HTTP 1.1 is a
superset of HTTP 1.0, so HTTP 1.1 web servers have no trouble
interacting with older browsers that only speak HTTP 1.0, and vice
versa.
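To illustrate the HTTP 1.1 features mentioned above, the sketch below builds HTTP/1.1 requests that carry a Host header, offer to accept a compressed response, and optionally ask for only a byte range of the document; it then concatenates two requests so they could be written back to back on one connection without waiting for the first response. The class and method names and the -1 "whole document" convention are hypothetical; Host, Accept-Encoding, and Range (with its bytes=first-last syntax) are standard HTTP 1.1 headers from RFC 2616.

```java
import java.util.List;

// Hypothetical builder for HTTP 1.1 requests illustrating Host, compression,
// byte ranges, and sending several requests over one connection.
class Http11Requests {

    private static final String CRLF = "\r\n";

    // Pass rangeStart < 0 for the whole document; otherwise request bytes
    // rangeStart through rangeEnd, inclusive.
    static String get(String host, String path, long rangeStart, long rangeEnd) {
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(path).append(" HTTP/1.1").append(CRLF);
        // Host lets one server distinguish the sites it serves.
        sb.append("Host: ").append(host).append(CRLF);
        // Offer to accept a gzip-compressed response body.
        sb.append("Accept-Encoding: gzip").append(CRLF);
        if (rangeStart >= 0 && rangeEnd >= rangeStart) {
            sb.append("Range: bytes=").append(rangeStart).append('-').append(rangeEnd).append(CRLF);
        }
        sb.append(CRLF); // blank line ends each request
        return sb.toString();
    }

    // Requests meant to be written back to back on one persistent connection;
    // the client doesn't wait for the first response before sending the next.
    static String pipeline(List<String> requests) {
        return String.join("", requests);
    }
}
```

If the server honors Accept-Encoding, its response carries a Content-Encoding: gzip header, and the body can be decompressed with java.util.zip.GZIPInputStream.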
Java Network Programming, 3rd Edition
By Elliotte Rusty Harold
Publisher: O'Reilly
Pub Date: October 2004
ISBN: 0-596-00721-3
Pages: 706
Thoroughly revised to cover all the 100+ significant updates
to Java Developers Kit (JDK) 1.5, Java Network
Programming is a complete introduction to
developing network programs (both applets and applications)
using Java, covering everything from networking fundamentals
to remote method invocation (RMI). It includes chapters on
TCP and UDP sockets, multicasting protocol and content
handlers, servlets, and the new I/O API. This is the
essential resource for any serious Java developer.