7.3 The URI Class
A URI is an abstraction of a URL that
includes not only Uniform Resource Locators but also Uniform Resource
Names (URNs). Most URIs used in practice are URLs, but most
specifications and standards such as XML are defined in terms of
URIs. In Java 1.4 and later, URIs are represented by the
java.net.URI class. This class differs from the
java.net.URL class in three important ways:
The URI class is purely about identification of resources and parsing
of URIs. It provides no methods to retrieve a representation of the
resource identified by its URI.
The URI class is more conformant to the relevant specifications than
the URL class.
A URI object can represent a relative URI. The URL class absolutizes
all URIs before storing them.
In brief, a URL object is a representation of an
application layer protocol for network retrieval, whereas a
URI object is purely for string parsing and
manipulation. The URI class has no network
retrieval capabilities. The URL class has some
string parsing methods, such as getFile( ) and
getRef( ), but many of these are broken and
don't always behave exactly as the relevant
specifications say they should. Assuming you're
using Java 1.4 or later and therefore have a choice, you should use
the URL class when you want to download the
content of a URL and the URI class when you want
to use the URI for identification rather than retrieval, for
instance, to represent an XML namespace URI. In some cases when you
need to do both, you may convert from a URI to a
URL with the toURL( ) method,
and in Java 1.5 you can also convert from a URL to
a URI using the toURI( ) method
of the URL class.
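As a rough illustration of converting in both directions, here is a minimal sketch. The class name UriUrlRoundTrip and the helper roundTrip( ) are made up for this example; only toURL( ) and toURI( ) come from the class library. It assumes Java 7 or later for the multi-catch syntax.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper class, not part of java.net
class UriUrlRoundTrip {

  // Converts a URI to a URL and back again.
  // Returns null if either conversion is rejected.
  static URI roundTrip(URI uri) {
    try {
      URL url = uri.toURL();  // needs an absolute URI with a known protocol handler
      return url.toURI();     // available in Java 1.5 and later
    } catch (MalformedURLException | URISyntaxException
        | IllegalArgumentException ex) {
      return null;
    }
  }

  public static void main(String[] args) {
    URI original = URI.create("http://www.example.com/index.html#top");
    // The round trip should give back an equal URI
    System.out.println(original.equals(roundTrip(original)));
    // A URN has no protocol handler, so it cannot become a URL
    System.out.println(roundTrip(URI.create("urn:isbn:1-565-92870-9")));
  }
}
```

The conversion to a URL is the step that can fail: a relative URI is rejected with an IllegalArgumentException, and a scheme with no installed protocol handler is rejected with a MalformedURLException.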
7.3.1 Constructing a URI
URIs are built from strings. Unlike the
URL class, the URI class does
not depend on an underlying protocol handler. As long as the URI is
syntactically correct, Java does not need to understand its protocol
in order to create a representative URI object. Thus, unlike the
URL class, the URI class can be
used for new and experimental URI schemes.
7.3.1.1 public URI(String uri) throws URISyntaxException
This is the basic constructor that creates a new
URI object from any convenient string. For
example,
URI voice = new URI("tel:+1-800-9988-9938");
URI web = new URI("http://www.xml.com/pub/a/2003/09/17/stax.html#id=_hbc");
URI book = new URI("urn:isbn:1-565-92870-9");
If the string argument does not follow URI syntax rules (for
example, if the URI begins with a colon), this constructor throws
a URISyntaxException. This is a checked exception,
so you need to either catch it or declare that the method where the
constructor is invoked can throw it. However, one syntactic rule is
not checked. In contradiction to the URI specification, the
characters used in the URI are not limited to ASCII. They can include
other Unicode characters, such as ø and é.
Syntactically, there are very few restrictions on URIs, especially
once the need to encode non-ASCII characters is removed and relative
URIs are allowed. Almost any string can be interpreted as a URI.
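The following sketch shows one way to probe which strings the single-argument constructor accepts. The class name UriSyntaxCheck and the method isParsable( ) are invented for the example; the sample strings are illustrative only.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, not part of java.net
class UriSyntaxCheck {

  // Returns true if new URI(s) succeeds, false if it throws
  static boolean isParsable(String s) {
    try {
      new URI(s);
      return true;
    } catch (URISyntaxException ex) {
      // getReason() and getIndex() say what went wrong and where
      System.err.println(ex.getReason() + " at index " + ex.getIndex());
      return false;
    }
  }

  public static void main(String[] args) {
    // A legal opaque URI
    System.out.println(isParsable("urn:isbn:1-565-92870-9"));
    // Begins with a colon, so there is no scheme name
    System.out.println(isParsable("://www.example.com/"));
    // A non-ASCII letter (e with acute accent) is accepted
    System.out.println(isParsable("http://www.example.com/caf\u00e9"));
    // A literal space is one of the few characters that is rejected
    System.out.println(isParsable("http://www.example.com/a b"));
  }
}
```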
7.3.1.2 public URI(String scheme, String schemeSpecificPart, String fragment) throws URISyntaxException
This constructor is mostly used for nonhierarchical URIs. The scheme
is the URI's protocol, such as http, urn, tel, and
so forth. It must be composed exclusively of ASCII letters and digits
and the three punctuation characters +,
-, and .. It must begin with a
letter. Passing null for this argument omits the scheme, thus
creating a relative URI. For example:
URI absolute = new URI("http", "//www.ibiblio.org", null);
URI relative = new URI(null, "/javafaq/index.shtml", "today");
The scheme-specific part depends on the syntax of the URI scheme;
it's one thing for an http URL, another for a mailto
URL, and something else again for a tel URI. Because the
URI class encodes illegal characters with percent
escapes, there's effectively no syntax error you can
make in this part.
Finally, the third argument contains the fragment identifier, if any.
Again, characters that are forbidden in a fragment identifier are
escaped automatically. Passing null for this argument simply omits
the fragment identifier.
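To make the automatic escaping concrete, here is a small sketch. The class OpaqueUriDemo, the method build( ), and the sample address are assumptions made for illustration, not anything defined by the class library.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, not part of java.net
class OpaqueUriDemo {

  // Builds a URI with the three-argument constructor and returns its
  // string form, or null if the constructor rejects the pieces
  static String build(String scheme, String schemeSpecificPart, String fragment) {
    try {
      return new URI(scheme, schemeSpecificPart, fragment).toString();
    } catch (URISyntaxException ex) {
      return null;
    }
  }

  public static void main(String[] args) {
    // The space is illegal in a URI, so it is percent-escaped
    System.out.println(build("mailto", "john doe@example.com", null));
    // A null scheme produces a relative URI
    System.out.println(build(null, "/javafaq/index.shtml", "today"));
    // A scheme must begin with a letter, so this one is rejected
    System.out.println(build("1http", "//www.example.com/", null));
  }
}
```

The scheme is the one argument that is checked rather than escaped, which is why the last call fails while the first succeeds.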
7.3.1.3 public URI(String scheme, String host, String path, String fragment) throws URISyntaxException
This constructor is used for hierarchical URIs such as http and ftp
URLs. The host and path together (separated by a /) form the
scheme-specific part for this URI. For example:
URI today = new URI("http", "www.ibiblio.org", "/javafaq/index.html", "today");
produces the URI http://www.ibiblio.org/javafaq/index.html#today.
If the constructor cannot form a legal hierarchical URI from the
supplied pieces (for instance, if there is a scheme so the URI
has to be absolute but the path doesn't start with
/), then it throws a URISyntaxException.
7.3.1.4 public URI(String scheme, String authority, String path, String query, String fragment) throws URISyntaxException
This constructor is basically the same as the previous one, with the
addition of a query string component. For example:
URI today = new URI("http", "www.ibiblio.org", "/javafaq/index.html",
    "referrer=cnet&date=2004-08-23", "today");
As usual, any unescapable syntax errors cause a
URISyntaxException to be thrown, and null can be
passed to omit any of the arguments.
7.3.1.5 public URI(String scheme, String userInfo, String host, int port, String path, String query, String fragment) throws URISyntaxException
This is the master hierarchical URI constructor that the previous two
invoke. It divides the authority into separate user info, host, and
port parts, each of which has its own syntax rules. For example:
URI styles = new URI("ftp", "anonymous:elharo@metalab.unc.edu",
    "ftp.oreilly.com", 21, "/pub/stylesheet", null, null);
However, the resulting URI still has to follow all the usual rules
for URIs and again, null can be passed for any argument to omit it
from the result.
7.3.1.6 public static URI create(String uri)
This is not a constructor, but rather a static factory method. Unlike
the constructors, it does not throw a
URISyntaxException. If you're
sure your URIs are legal and do not violate any of the rules, you can
use this method. For example, this invocation creates a
URI for anonymous FTP access using an email
address as password:
URI styles = URI.create(
    "ftp://anonymous:elharo%40metalab.unc.edu@ftp.oreilly.com:21/pub/stylesheet");
If the URI does prove to be malformed, this method throws an
IllegalArgumentException. This is a runtime
exception, so you don't have to explicitly declare
it or catch it.
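A short sketch of both outcomes follows. The class UriFactory and the method parseOrNull( ) are hypothetical names chosen for the example; URI.create( ) itself is the factory method described above.

```java
import java.net.URI;

// Hypothetical helper class, not part of java.net
class UriFactory {

  // Returns the parsed URI, or null if the string is malformed
  static URI parseOrNull(String s) {
    try {
      return URI.create(s);
    } catch (IllegalArgumentException ex) {
      // The unchecked exception wraps the underlying URISyntaxException
      return null;
    }
  }

  public static void main(String[] args) {
    URI styles = parseOrNull(
        "ftp://anonymous:elharo%40metalab.unc.edu@ftp.oreilly.com:21/pub/stylesheet");
    System.out.println(styles.getUserInfo()); // decoded, so %40 appears as @
    System.out.println(styles.getHost());
    System.out.println(styles.getPort());
    // A literal space makes this string malformed
    System.out.println(parseOrNull("http://www.example.com/a b"));
  }
}
```

The factory method is a natural fit for URIs written as literals in source code, where a syntax error is a programming mistake rather than bad user input.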
7.3.2 The Parts of the URI
A URI reference has up to three parts: a scheme, a scheme-specific
part, and a fragment identifier. The general format is:
scheme:scheme-specific-part#fragment
If the scheme is omitted, the URI reference is relative. If the fragment
identifier is omitted, the URI reference is a pure URI. The URI class
has getter methods that return these three parts of each
URI object. The getRawFoo( ) methods return the encoded forms of
the parts of the URI, while the equivalent getFoo( ) methods first
decode any percent-escaped characters and then return the decoded
part:
public String getScheme( )
public String getSchemeSpecificPart( )
public String getRawSchemeSpecificPart( )
public String getFragment( )
public String getRawFragment( )
The getScheme( ) and getFragment( ) methods return null if the
URI object does not have the relevant component:
for example, a relative URI without a scheme or a URI without a
fragment identifier.
A URI that has a scheme is an
absolute URI. A URI without a scheme is
relative. The isAbsolute( ) method returns true if the
URI is absolute, false if it's relative:
public boolean isAbsolute( )
The details of the scheme-specific part vary depending on the type of
the scheme. For example, in a hierarchical URI the scheme-specific
part is divided into an authority, a path, and a query string. The
authority is further divided into user info, host, and port. The
isOpaque( ) method returns false if the URI is
hierarchical, true if it's not
hierarchical (that is, if it's opaque):
public boolean isOpaque( )
If the URI is opaque, all you can get is the scheme, scheme-specific
part, and fragment identifier. However, if the URI is hierarchical,
there are getter methods for all the different parts of a
hierarchical URI:
public String getAuthority( )
public String getFragment( )
public String getHost( )
public String getPath( )
public int getPort( )
public String getQuery( )
public String getUserInfo( )
These methods all return the decoded parts; in other words, percent
escapes, such as %3C, are changed into the characters they represent,
such as <. If you want the raw, encoded parts of the URI, there
are five parallel getRawFoo( ) methods:
public String getRawAuthority( )
public String getRawFragment( )
public String getRawPath( )
public String getRawQuery( )
public String getRawUserInfo( )
Remember the URI class differs from the URI specification
in that non-ASCII characters such as é and ü
are never percent-escaped in the first place, and thus will still be
present in the strings returned by the
getRawFoo( ) methods unless the strings originally used to construct
the URI object were encoded.
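The difference between the decoded and raw getters, and between opaque and hierarchical URIs, can be seen in a sketch like the following. The class UriParts, the method describe( ), and the sample URIs are assumptions made for the example.

```java
import java.net.URI;

// Hypothetical helper class, not part of java.net
class UriParts {

  // Summarizes a URI using only the getters that apply to its kind
  static String describe(URI u) {
    if (u.isOpaque()) {
      return "opaque: scheme=" + u.getScheme()
          + ", ssp=" + u.getSchemeSpecificPart();
    }
    return "hierarchical: path=" + u.getPath()
        + ", rawPath=" + u.getRawPath();
  }

  public static void main(String[] args) {
    // %3C is decoded to < by getPath() but left alone by getRawPath()
    System.out.println(describe(URI.create("http://www.example.com/a%3Cb")));
    // An opaque URI has no path to take apart
    System.out.println(describe(URI.create("urn:isbn:1-565-92870-9")));
    // A relative URI is hierarchical but not absolute
    System.out.println(URI.create("images/logo.png").isAbsolute());
  }
}
```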
If the URI in question does not contain this
information (for instance, the URI
http://www.example.com has no user info,
port, or query string), the relevant methods return null. (The path
of this URI is the empty string rather than null.)
getPort( ) is the
single exception. Since it's declared to return an
int, it can't return
null. Instead, it returns -1 to indicate an
omitted port.
For various technical reasons that don't have a lot
of practical impact, Java can't always initially
detect syntax errors in the authority component. The immediate
symptom of this failing is normally an inability to return the
individual parts of the authority: port, host, and user info. In this
event, you can call parseServerAuthority( ) to force the authority to
be reparsed:
public URI parseServerAuthority( ) throws URISyntaxException
The original URI does not change
(URI objects are immutable), but the
URI returned will have separate authority parts
for user info, host, and port. If the authority cannot be parsed, a
URISyntaxException is thrown.
Example 7-10 uses these methods to split URIs entered
on the command line into their component parts. It's
similar to Example 7-4 but works with any
syntactically correct URI, not just the ones Java has a protocol
handler for.
Example 7-10. The parts of a URI
import java.net.*;

public class URISplitter {

  public static void main(String args[]) {

    for (int i = 0; i < args.length; i++) {
      try {
        URI u = new URI(args[i]);
        System.out.println("The URI is " + u);
        if (u.isOpaque( )) {
          System.out.println("This is an opaque URI.");
          System.out.println("The scheme is " + u.getScheme( ));
          System.out.println("The scheme specific part is "
           + u.getSchemeSpecificPart( ));
          System.out.println("The fragment ID is " + u.getFragment( ));
        }
        else {
          System.out.println("This is a hierarchical URI.");
          System.out.println("The scheme is " + u.getScheme( ));
          try {
            // Reparse so the host, user info, and port are available
            u = u.parseServerAuthority( );
            System.out.println("The host is " + u.getHost( ));
            System.out.println("The user info is " + u.getUserInfo( ));
            System.out.println("The port is " + u.getPort( ));
          }
          catch (URISyntaxException ex) {
            // Must be a registry based authority
            System.out.println("The authority is " + u.getAuthority( ));
          }
          System.out.println("The path is " + u.getPath( ));
          System.out.println("The query string is " + u.getQuery( ));
          System.out.println("The fragment ID is " + u.getFragment( ));
        } // end else
      } // end try
      catch (URISyntaxException ex) {
        System.err.println(args[i] + " does not seem to be a URI.");
      }
      System.out.println( );
    } // end for

  } // end main

} // end URISplitter
Here's the result of running this against three of
the URI examples in this section:
% java URISplitter tel:+1-800-9988-9938 \
  http://www.xml.com/pub/a/2003/09/17/stax.html#id=_hbc \
  urn:isbn:1-565-92870-9
The URI is tel:+1-800-9988-9938
This is an opaque URI.
The scheme is tel
The scheme specific part is +1-800-9988-9938
The fragment ID is null

The URI is http://www.xml.com/pub/a/2003/09/17/stax.html#id=_hbc
This is a hierarchical URI.
The scheme is http
The host is www.xml.com
The user info is null
The port is -1
The path is /pub/a/2003/09/17/stax.html
The query string is null
The fragment ID is id=_hbc

The URI is urn:isbn:1-565-92870-9
This is an opaque URI.
The scheme is urn
The scheme specific part is isbn:1-565-92870-9
The fragment ID is null
7.3.3 Resolving Relative URIs
The URI class has three methods for converting
back and forth between relative and absolute URIs.
7.3.3.1 public URI resolve(URI uri)
This method compares the
uri argument to this URI and
uses it to construct a new URI object that wraps
an absolute URI. For example, consider these three lines of code:
URI absolute = new URI("http://www.example.com/");
URI relative = new URI("images/logo.png");
URI resolved = absolute.resolve(relative);
After they've executed, resolved contains the absolute URI
http://www.example.com/images/logo.png.
If the invoking URI does not contain an absolute
URI itself, the resolve( ) method resolves as much
of the URI as it can and returns a new relative URI object as a
result. For example, take these three statements:
URI top = new URI("javafaq/books/");
URI relative = new URI("jnp3/examples/07/index.html");
URI resolved = top.resolve(relative);
After they've executed, resolved now contains the relative URI
javafaq/books/jnp3/examples/07/index.html with
no scheme or authority.
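The two cases above, plus two more that follow from the standard resolution rules, can be tried out with a sketch along these lines. The class ResolveDemo and its static resolve( ) helper are invented for the example, and the extra sample URIs are illustrative.

```java
import java.net.URI;

// Hypothetical helper class, not part of java.net
class ResolveDemo {

  // Resolves ref against base and returns the result as a string
  static String resolve(String base, String ref) {
    return URI.create(base).resolve(URI.create(ref)).toString();
  }

  public static void main(String[] args) {
    // Absolute base, relative reference: the result is absolute
    System.out.println(resolve("http://www.example.com/", "images/logo.png"));
    // Relative base: the result is still relative
    System.out.println(resolve("javafaq/books/", "jnp3/examples/07/index.html"));
    // Dot segments are applied to the base path
    System.out.println(resolve("http://www.example.com/a/b/c.html", "../d.html"));
    // An absolute argument is returned as is
    System.out.println(resolve("http://www.example.com/", "ftp://ftp.example.org/pub/"));
  }
}
```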
7.3.3.2 public URI resolve(String uri)
This is a convenience method that
simply converts the string argument to a URI and then resolves it
against the invoking URI, returning a new URI object as the result.
That is, it's equivalent to
resolve(URI.create(uri)), so a malformed string causes an
IllegalArgumentException rather than a URISyntaxException. Using
this method, the previous two samples can be rewritten as:
URI absolute = new URI("http://www.example.com/");
URI resolved = absolute.resolve("images/logo.png");
URI top = new URI("javafaq/books/");
resolved = top.resolve("jnp3/examples/07/index.html");
7.3.3.3 public URI relativize(URI uri)
It's also possible
to reverse this procedure; that is, to go from an absolute URI to a
relative one. The relativize( ) method creates a
new URI object from the uri
argument that is relative to the invoking URI. The
argument is not changed. For example:
URI absolute = new URI("http://www.example.com/images/logo.png");
URI top = new URI("http://www.example.com/");
URI relative = top.relativize(absolute);
The URI object relative now contains the relative URI images/logo.png.
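One detail worth seeing in action is what happens when the argument cannot be made relative to the invoking URI: the argument is simply returned unchanged. The sketch below illustrates this; the class RelativizeDemo, its helper method, and the example.org host are assumptions made for the example.

```java
import java.net.URI;

// Hypothetical helper class, not part of java.net
class RelativizeDemo {

  // Relativizes target against base and returns the result as a string
  static String relativize(String base, String target) {
    return URI.create(base).relativize(URI.create(target)).toString();
  }

  public static void main(String[] args) {
    URI top = URI.create("http://www.example.com/");
    URI logo = URI.create("http://www.example.com/images/logo.png");

    // Same scheme and authority, and the base path is a prefix
    System.out.println(relativize(top.toString(), logo.toString()));
    // Different authority: the argument comes back unchanged
    System.out.println(relativize(top.toString(),
        "http://www.example.org/images/logo.png"));
    // Resolving the relative result restores the original
    System.out.println(top.resolve(top.relativize(logo)).equals(logo));
  }
}
```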
7.3.4 Utility Methods
The URI class has the usual batch of utility methods: equals( ),
hashCode( ), toString( ), and compareTo( ).
7.3.4.1 public boolean equals(Object o)
URIs are
tested for equality pretty much as you'd expect.
It's not a direct string comparison. Equal URIs must
both either be hierarchical or opaque. The scheme and authority parts
are compared without considering case. That is,
http and HTTP are the same
scheme, and www.example.com is the same
authority as www.EXAMPLE.com. The rest of the
URI is case-sensitive, except for hexadecimal digits used to escape
illegal characters. Escapes are not decoded
before comparing. http://www.example.com/A and
http://www.example.com/%41 are unequal URIs.
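These rules are easy to check with a small sketch. The class UriEquality and the method same( ) are hypothetical names used only here.

```java
import java.net.URI;

// Hypothetical helper class, not part of java.net
class UriEquality {

  // Parses both strings and compares the resulting URI objects
  static boolean same(String a, String b) {
    return URI.create(a).equals(URI.create(b));
  }

  public static void main(String[] args) {
    // Scheme and host are compared without regard to case
    System.out.println(same("http://www.example.com/", "HTTP://www.EXAMPLE.com/"));
    // The path is case-sensitive
    System.out.println(same("http://www.example.com/a", "http://www.example.com/A"));
    // Escapes are not decoded before comparing
    System.out.println(same("http://www.example.com/A", "http://www.example.com/%41"));
    // The hexadecimal digits of an escape may differ in case
    System.out.println(same("http://www.example.com/%3c", "http://www.example.com/%3C"));
  }
}
```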
7.3.4.2 public int hashCode( )
The hashCode( ) method
is a usual hashCode( ) method, nothing special.
Equal URIs do have the same hash code and unequal URIs are fairly
unlikely to share the same hash code.
7.3.4.3 public int compareTo(Object o)
URIs can be ordered. The ordering is based on string comparison of the
individual parts, in this sequence:
If the schemes are different, the schemes are compared, without
considering case.
Otherwise, if the schemes are the same, a hierarchical URI is
considered to be less than an opaque URI with the same scheme.
If both URIs are opaque URIs, they're ordered
according to their scheme-specific parts.
If both the scheme and the opaque scheme-specific parts are equal,
the URIs are compared by their fragments.
If both URIs are hierarchical, they're ordered
according to their authority components, which are themselves ordered
according to user info, host, and port, in that order.
If the schemes and the authorities are equal, the path is used to
distinguish them.
If the paths are also equal, the query strings are compared.
If the query strings are equal, the fragments are compared.
URIs are not comparable to any type except themselves. Comparing a
URI to anything except another
URI causes a
ClassCastException.
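Because URI implements Comparable, a list of URIs can be sorted directly. The following is a minimal sketch, assuming Java 7 or later (in Java 5 and later the method is declared generically as compareTo(URI)); the class UriSorting, the method sorted( ), and the sample URIs are made up for the example.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, not part of java.net
class UriSorting {

  // Parses the strings and returns the URIs in their natural order
  static List<URI> sorted(String... uris) {
    List<URI> list = new ArrayList<>();
    for (String s : uris) {
      list.add(URI.create(s));
    }
    Collections.sort(list); // uses URI.compareTo()
    return list;
  }

  public static void main(String[] args) {
    List<URI> result = sorted(
        "urn:isbn:1-565-92870-9",
        "http://www.example.com/b",
        "ftp://ftp.example.com/",
        "http://www.example.com/a");
    // Schemes are compared first (ftp, http, urn); the two http URIs
    // share an authority, so their paths decide the order
    for (URI u : result) {
      System.out.println(u);
    }
  }
}
```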
7.3.4.4 public String toString( )
The toString( ) method
returns an unencoded string form of the
URI. That is, characters such as é are not percent-escaped unless
they were percent-escaped in the
strings used to construct this URI. Therefore, the
result of calling this method is not guaranteed to be a syntactically
correct URI. This form is sometimes useful for display to human
beings, but not for retrieval.
7.3.4.5 public String toASCIIString( )
The toASCIIString( ) method returns an
encoded string form of the
URI. Characters like é and \ are always
percent-escaped whether or not they were originally escaped. This is
the string form of the URI you should use most of the time. Even if
the form returned by toString( ) is more legible
for humans, they may still copy and paste it into areas that are not
expecting an illegal URI. toASCIIString( ) always
returns a syntactically correct URI.
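The difference between the two string forms shows up as soon as a URI contains a non-ASCII character. Here is a small sketch; the class UriStrings, the method ascii( ), and the sample URI are assumptions for the example, and the accented letter is written as a Unicode escape so that the source file stays pure ASCII.

```java
import java.net.URI;

// Hypothetical helper class, not part of java.net
class UriStrings {

  // Returns the fully percent-escaped form of the URI
  static String ascii(String s) {
    return URI.create(s).toASCIIString();
  }

  public static void main(String[] args) {
    String text = "http://www.example.com/caf\u00e9";
    URI u = URI.create(text);
    // toString() gives back the string as it was supplied
    System.out.println(u.toString().equals(text));
    // toASCIIString() escapes the UTF-8 bytes of the accented letter
    System.out.println(ascii(text));
  }
}
```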
Java Network Programming, 3rd Edition. By Elliotte Rusty Harold.
Publisher: O'Reilly. Pub Date: October 2004. ISBN: 0-596-00721-3. Pages: 706.
