15.12 Caches
Web browsers
have been caching pages and images for years. If a logo is repeated
on every page of a site, the browser normally loads it from the
remote server only once, stores it in its cache, and reloads it from
the cache whenever it's needed rather than returning
to the remote server every time the same page is needed. Several
HTTP headers, including Expires and
Cache-Control, can control caching.

Java 1.5 finally adds the ability to cache data to the
URL and URLConnection classes.
By default, Java 1.5 does not cache anything, but you can create your
own cache by subclassing the
java.net.ResponseCache class and installing it as the system
default. Whenever the system tries to load a new URL through a
protocol handler, it will first look for it in the cache. If the
cache returns the desired content, the protocol handler
won't need to connect to the remote server. However,
if the requested data is not in the cache, the protocol handler will
download it. After it's done so, it will put its
response into the cache so the content is more quickly available the
next time that URL is loaded.

Two abstract methods in the ResponseCache class
store and retrieve data from the system's single
cache:
    public abstract CacheResponse get(URI uri, String requestMethod,
        Map<String,List<String>> requestHeaders) throws IOException
    public abstract CacheRequest put(URI uri, URLConnection connection)
        throws IOException

The put( ) method returns a
CacheRequest object that wraps an
OutputStream into which the protocol handler will
write the data it reads. CacheRequest is an
abstract class with two methods, as shown in Example 15-11.
Example 15-11. The CacheRequest class
    package java.net;

    public abstract class CacheRequest {
      public abstract OutputStream getBody( ) throws IOException;
      public abstract void abort( );
    }

The getBody( ) method in the subclass should return an
OutputStream that points into the
cache's data store for the URI passed to the
put( ) method. For instance, if
you're storing the data in a file, then
you'd return a FileOutputStream
connected to that file. The protocol handler will copy the data it
reads onto this OutputStream. If a problem arises
while copying (e.g., the server unexpectedly closes the connection),
the protocol handler calls the abort( ) method.
This method should then remove any data that has been stored from the
cache.

Example 15-12 demonstrates a basic
CacheRequest subclass that passes back a
ByteArrayOutputStream. Later the data can be
retrieved using the getData( ) method, a custom
method in this subclass just retrieving the data Java wrote onto the
OutputStream this class supplied. An obvious
alternative strategy would be to store results in files and use a
FileOutputStream instead.
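
The file-based alternative just mentioned might look roughly like the following. This is only a sketch under stated assumptions: the class name FileCacheRequest, its constructor, and the getFile( ) accessor are hypothetical and do not appear elsewhere in this chapter, and the caller is assumed to choose one file per cached URI.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.CacheRequest;

// Hypothetical file-backed CacheRequest; not one of the chapter's examples.
class FileCacheRequest extends CacheRequest {

  private final File file;      // where the cached body is stored
  private OutputStream out;     // opened lazily on the first getBody() call
  private boolean aborted = false;

  FileCacheRequest(File file) {
    this.file = file;
  }

  @Override
  public synchronized OutputStream getBody() throws IOException {
    if (aborted) throw new IOException("cache request was aborted");
    if (out == null) out = new FileOutputStream(file);
    return out;
  }

  @Override
  public synchronized void abort() {
    aborted = true;
    try {
      if (out != null) out.close();
    } catch (IOException ex) {
      // Nothing useful to do here; the file is deleted below regardless.
    }
    file.delete();  // discard the partial copy
  }

  // Lets a matching response class find the stored data later.
  File getFile() {
    return file;
  }
}
```

Here abort( ) deletes the partially written file, so a failed download leaves nothing behind in the cache's data store.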
Example 15-12. A basic CacheRequest subclass
    import java.io.*;
    import java.net.*;

    public class SimpleCacheRequest extends CacheRequest {

      // Holds the response body; set to null if the request is aborted
      ByteArrayOutputStream out = new ByteArrayOutputStream( );

      public OutputStream getBody( ) throws IOException {
        return out;
      }

      public void abort( ) {
        out = null;
      }

      // Custom method: returns the bytes written so far, or null after abort( )
      public byte[] getData( ) {
        if (out == null) return null;
        else return out.toByteArray( );
      }
    }

The get( ) method retrieves the data and headers
from the cache and returns them wrapped in a
CacheResponse object. It returns
null if the desired URI is not in the cache, in
which case the protocol handler loads the URI from the remote server
as normal. Again, this is an abstract class that you have to
implement in a subclass. Example 15-13 summarizes this
class. It has two methods, one to return the data of the request and
one to return the headers. When caching the original response, you
need to store both. The headers should be returned in an unmodifiable
map with keys that are the HTTP header field names and values that
are lists of values for each named HTTP header.
Example 15-13. The CacheResponse class
    package java.net;

    public abstract class CacheResponse {
      public abstract InputStream getBody( ) throws IOException;
      public abstract Map<String,List<String>> getHeaders( ) throws IOException;
    }

Example 15-14 shows a simple
CacheResponse subclass that is tied to a
SimpleCacheRequest. In this example, shared
references pass data from the request class to the response class. If
we were storing responses in files, we'd just need
to share the filenames instead. Along with the
SimpleCacheRequest object from which it will read
the data, we must also pass the original
URLConnection object into the constructor. This is
used to read the HTTP header so it can be stored for later retrieval.
The object also keeps track of the expiration date (if any) provided
by the server for the cached representation of the resource.
Example 15-14. A basic CacheResponse subclass
    import java.io.*;
    import java.net.*;
    import java.util.*;

    public class SimpleCacheResponse extends CacheResponse {

      private Map<String,List<String>> headers;
      private SimpleCacheRequest request;
      private Date expires;

      public SimpleCacheResponse(SimpleCacheRequest request, URLConnection uc)
          throws IOException {

        this.request = request;

        // deliberate shadowing; we need to fill the map and
        // then make it unmodifiable
        Map<String,List<String>> headers = new HashMap<String,List<String>>( );
        for (int i = 0;; i++) {
          String name = uc.getHeaderFieldKey(i);
          String value = uc.getHeaderField(i);
          if (value == null) break;  // no more header fields
          List<String> values = headers.get(name);
          if (values == null) {
            values = new ArrayList<String>(1);
            headers.put(name, values);
          }
          values.add(value);
        }

        // getExpiration( ) returns 0 if the server sent no expiration date
        long expiration = uc.getExpiration( );
        if (expiration != 0) {
          this.expires = new Date(expiration);
        }

        this.headers = Collections.unmodifiableMap(headers);
      }

      public InputStream getBody( ) {
        return new ByteArrayInputStream(request.getData( ));
      }

      public Map<String,List<String>> getHeaders( )
          throws IOException {
        return headers;
      }

      public boolean isExpired( ) {
        if (expires == null) return false;
        else {
          Date now = new Date( );
          return expires.before(now);
        }
      }
    }

Finally, we need a simple ResponseCache subclass
that passes SimpleCacheRequests and
SimpleCacheResponses back to the protocol handler
as requested. Example 15-15 demonstrates such a simple
class that stores a finite number of responses in memory in one big
ConcurrentHashMap.
Example 15-15. An in-memory ResponseCache
    import java.io.*;
    import java.net.*;
    import java.util.*;
    import java.util.concurrent.*;

    public class MemoryCache extends ResponseCache {

      private Map<URI, SimpleCacheResponse> responses
          = new ConcurrentHashMap<URI, SimpleCacheResponse>( );
      private int maxEntries = 100;

      public MemoryCache( ) {
        this(100);
      }

      public MemoryCache(int maxEntries) {
        this.maxEntries = maxEntries;
      }

      public CacheRequest put(URI uri, URLConnection uc)
          throws IOException {

        // Returning null tells the protocol handler not to cache this response
        if (responses.size( ) >= maxEntries) return null;

        String cacheControl = uc.getHeaderField("Cache-Control");
        if (cacheControl != null && cacheControl.indexOf("no-cache") >= 0) {
          return null;
        }

        SimpleCacheRequest request = new SimpleCacheRequest( );
        SimpleCacheResponse response = new SimpleCacheResponse(request, uc);
        responses.put(uri, response);
        return request;
      }

      public CacheResponse get(URI uri, String requestMethod,
          Map<String,List<String>> requestHeaders)
          throws IOException {

        SimpleCacheResponse response = responses.get(uri);
        // check expiration date
        if (response != null && response.isExpired( )) {
          responses.remove(uri);  // the map is keyed by URI, not by response
          response = null;
        }
        return response;
      }
    }

Once a ResponseCache like this one is installed,
Java's HTTP protocol handler always uses it, even
when it shouldn't. The cache itself needs to check
the expiration dates on anything it's stored and
watch out for Cache-Control header fields. The key value of concern
is no-cache. If you see this string in a Cache-Control header field,
it means any resource representation is valid only momentarily and
any cached copy is likely to be out of date almost immediately, so
you really shouldn't store it at all.

Each retrieved resource stays in the map until
it expires. This example waits for an expired document to be
requested again before it deletes it from the cache. A more
sophisticated implementation could use a low-priority thread to scan
for expired documents and remove them to make way for others. Instead
of or in addition to this, an implementation might cache the
representations in a queue and remove the oldest documents or those
closest to their expiration date as necessary to make room for new
ones. An even more sophisticated implementation could track how often
each document in the store was accessed and expunge only the oldest
and least-used documents.

I've already mentioned that you could implement this
on top of the filesystem instead of sitting on top of the Java
Collections API. You could also store the cache in a database and you
could do a lot of less-common things as well. For instance, you could
redirect requests for certain URLs to a local server rather than a
remote server halfway around the world, in essence using a local web
server as the cache. Or a ResponseCache could load
a fixed set of files at launch time and then only serve those out of
memory. This might be useful for a server that processes many
different SOAP requests, all of which adhere to a few common schemas
that can be stored in the cache. The abstract
ResponseCache class is flexible enough to support
all of these and other usage patterns.

Regrettably, Java only allows one cache at a time. To change the
cache object, use the static ResponseCache.setDefault( ) and ResponseCache.getDefault( )
methods:
    public static ResponseCache getDefault( )
    public static void setDefault(ResponseCache responseCache)

These get and set the single cache used by all programs running within the
same Java virtual machine. For example, this one line of code
installs Example 15-15 in an application:
    ResponseCache.setDefault(new MemoryCache( ));
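
Because the cache is shared by the whole virtual machine, code that installs one temporarily may want to remember what was there before and put it back afterward. The following is a minimal sketch of that idea, not a prescribed pattern. NullCache and CacheInstaller are hypothetical names introduced only for illustration; NullCache is a stand-in that never stores or returns anything, so the sketch runs without a network connection.

```java
import java.io.IOException;
import java.net.CacheRequest;
import java.net.CacheResponse;
import java.net.ResponseCache;
import java.net.URI;
import java.net.URLConnection;
import java.util.List;
import java.util.Map;

// Hypothetical do-nothing cache, used only to demonstrate installation.
class NullCache extends ResponseCache {

  @Override
  public CacheResponse get(URI uri, String requestMethod,
      Map<String, List<String>> requestHeaders) throws IOException {
    return null;  // always a miss, so the handler contacts the server
  }

  @Override
  public CacheRequest put(URI uri, URLConnection connection)
      throws IOException {
    return null;  // decline to store the response
  }
}

// Hypothetical helper around the two static ResponseCache methods.
class CacheInstaller {

  // Installs the given cache and returns whatever was installed before
  // (null if no cache was set), so the caller can restore it later.
  static ResponseCache install(ResponseCache cache) {
    ResponseCache previous = ResponseCache.getDefault();
    ResponseCache.setDefault(cache);
    return previous;
  }
}
```

A caller would keep the value returned by install( ) and later pass it back to ResponseCache.setDefault( ); passing null unsets the cache, which restores the default no-caching behavior.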