A URI is an abstraction of a URL that includes not only Uniform Resource Locators but also Uniform Resource Names (URNs). Most URIs used in practice are URLs, but most specifications and standards such as XML are defined in terms of URIs. In Java 1.4 and later, URIs are represented by the java.net.URI class. This class differs from the java.net.URL class in three important ways:
The URI class is purely about identification of resources and parsing of URIs. It provides no methods to retrieve a representation of the resource identified by its URI.
The URI class is more conformant to the relevant specifications than the URL class.
A URI object can represent a relative URI. The URL class absolutizes all URIs before storing them.
In brief, a URL object is a representation of an application layer protocol for network retrieval, whereas a URI object is purely for string parsing and manipulation. The URI class has no network retrieval capabilities. The URL class has some string parsing methods, such as getFile( ) and getRef( ), but many of these are broken and don't always behave exactly as the relevant specifications say they should. Assuming you're using Java 1.4 or later and therefore have a choice, you should use the URL class when you want to download the content of a URL and the URI class when you want to use the URI for identification rather than retrieval, for instance, to represent an XML namespace URI. In some cases when you need to do both, you may convert from a URI to a URL with the toURL( ) method, and in Java 1.5 you can also convert from a URL to a URI using the toURI( ) method of the URL class.
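As a minimal sketch of the two conversions just described, the following class turns a URI into a URL and back again. The class name UriUrlRoundTrip and the example.com address are illustrative assumptions; no network connection is made.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class UriUrlRoundTrip {

    // Parses the string as a URI, converts it to a URL, and converts it back.
    static URI roundTrip(String s) {
        try {
            URI uri = new URI(s);
            URL url = uri.toURL();   // needs an absolute URI with a supported protocol
            return url.toURI();      // available in Java 1.5 and later
        } catch (URISyntaxException | MalformedURLException ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("http://www.example.com/index.html"));
    }
}
```

Note that toURL( ) only works for absolute URIs whose scheme Java has a protocol handler for; a relative URI causes an IllegalArgumentException.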
URIs are built from strings. Unlike the URL class, the URI class does not depend on an underlying protocol handler. As long as the URI is syntactically correct, Java does not need to understand its protocol in order to create a representative URI object. Thus, unlike the URL class, the URI class can be used for new and experimental URI schemes.
The basic constructor, URI(String uri), creates a new URI object from any convenient string. For example:
URI voice = new URI("tel:+1-800-9988-9938");
URI web = new URI("http://www.xml.com/pub/a/2003/09/17/stax.html#id=_hbc");
URI book = new URI("urn:isbn:1-565-92870-9");
If the string argument does not follow URI syntax rules (for example, if the URI begins with a colon), this constructor throws a URISyntaxException. This is a checked exception, so you need to either catch it or declare that the method where the constructor is invoked can throw it. However, one syntactic rule is not checked. In contradiction to the URI specification, the characters used in the URI are not limited to ASCII. They can include other Unicode characters, such as ø and é. Syntactically, there are very few restrictions on URIs, especially once the need to encode non-ASCII characters is removed and relative URIs are allowed. Almost any string can be interpreted as a URI.
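The checked exception can be handled along the following lines. This is only a sketch; the class name UriSyntaxCheck and the helper isParsable are hypothetical names introduced for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriSyntaxCheck {

    // Returns true if the string can be parsed as a URI, false otherwise.
    static boolean isParsable(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "urn:isbn:1-565-92870-9",            // legal
            ":+1-800-9988-9938",                 // begins with a colon: no scheme name
            "http://www.example.com/caf\u00e9",  // non-ASCII letter is accepted
            "http://www.example.com/a b"         // unescaped space is rejected
        };
        for (String s : candidates) {
            System.out.println(s + " -> " + isParsable(s));
        }
    }
}
```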
The three-argument constructor, URI(String scheme, String schemeSpecificPart, String fragment), is mostly used for nonhierarchical URIs. The scheme is the URI's protocol, such as http, urn, tel, and so forth. It must be composed exclusively of ASCII letters and digits and the three punctuation characters +, -, and .. It must begin with a letter. Passing null for this argument omits the scheme, thus creating a relative URI. For example:
URI absolute = new URI("http", "//www.ibiblio.org", null);
URI relative = new URI(null, "/javafaq/index.html", "today");
The scheme-specific part depends on the syntax of the URI scheme; it's one thing for an http URL, another for a mailto URL, and something else again for a tel URI. Because the URI class encodes illegal characters with percent escapes, there's effectively no syntax error you can make in this part.
Finally, the third argument contains the fragment identifier, if any. Again, characters that are forbidden in a fragment identifier are escaped automatically. Passing null for this argument simply omits the fragment identifier.
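The automatic escaping can be observed with a short sketch such as the one below. The class name SchemeSpecificDemo and the fragment value "chapter 7" are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class SchemeSpecificDemo {

    // Builds a URI from scheme, scheme-specific part, and fragment,
    // and returns its string form.
    static String build(String scheme, String ssp, String fragment) {
        try {
            return new URI(scheme, ssp, fragment).toString();
        } catch (URISyntaxException ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    public static void main(String[] args) {
        // The space in the fragment is illegal, so it is percent-escaped.
        System.out.println(build("urn", "isbn:1-565-92870-9", "chapter 7"));
        // A null scheme yields a relative URI.
        System.out.println(build(null, "/javafaq/index.html", "today"));
    }
}
```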
The four-argument constructor, URI(String scheme, String host, String path, String fragment), is used for hierarchical URIs such as http and ftp URLs. The host and path together (separated by a /) form the scheme-specific part for this URI. For example:
URI today = new URI("http", "www.ibiblio.org", "/javafaq/index.html", "today");
produces the URI http://www.ibiblio.org/javafaq/index.html#today.
If the constructor cannot form a legal hierarchical URI from the supplied pieces (for instance, if there is a scheme, so the URI has to be absolute, but the path doesn't start with /), then it throws a URISyntaxException.
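The following sketch contrasts a successful call with one that fails because the path is relative. The class name HierarchicalDemo and the "error: " prefix are illustrative choices, not part of the API.

```java
import java.net.URI;
import java.net.URISyntaxException;

class HierarchicalDemo {

    // Returns the assembled URI as a string, or a message beginning
    // with "error: " if the pieces do not form a legal URI.
    static String build(String scheme, String host, String path, String fragment) {
        try {
            return new URI(scheme, host, path, fragment).toString();
        } catch (URISyntaxException ex) {
            return "error: " + ex.getReason();
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http", "www.ibiblio.org", "/javafaq/index.html", "today"));
        // The path lacks a leading slash although a scheme is present.
        System.out.println(build("http", "www.ibiblio.org", "javafaq/index.html", "today"));
    }
}
```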
The five-argument constructor, URI(String scheme, String authority, String path, String query, String fragment), is basically the same as the previous one, with the addition of a query string component. For example:
URI today = new URI("http", "www.ibiblio.org", "/javafaq/index.html",
    "referrer=cnet&date=2004-08-23", "today");
As usual, any unescapable syntax errors cause a URISyntaxException to be thrown, and null can be passed to omit any of the arguments.
The seven-argument constructor, URI(String scheme, String userInfo, String host, int port, String path, String query, String fragment), is the master hierarchical URI constructor. It divides the authority into separate user info, host, and port parts, each of which has its own syntax rules. For example:
URI styles = new URI("ftp", "anonymous:elharo@metalab.unc.edu",
    "ftp.oreilly.com", 21, "/pub/stylesheet", null, null);
However, the resulting URI still has to follow all the usual rules for URIs. Again, null can be passed for any String argument, and -1 for the port, to omit that component from the result.
The create(String uri) method is not a constructor, but rather a static factory method. Unlike the constructors, it does not throw a URISyntaxException. If you're sure your URIs are legal and do not violate any of the rules, you can use this method. For example, this invocation creates a URI for anonymous FTP access using an email address as password:
URI styles = URI.create("ftp://anonymous:elharo%40metalab.unc.edu@ftp.oreilly.com:21/pub/stylesheet");
If the URI does prove to be malformed, this method throws an IllegalArgumentException. This is a runtime exception, so you don't have to explicitly declare it or catch it.
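The sketch below builds the same FTP URI twice, once from its separate parts and once with the factory method, and shows the runtime exception for a malformed string. The class name FtpUriDemo and its helper methods are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

class FtpUriDemo {

    // Builds the URI from separate components; the @ in the
    // user info is percent-escaped automatically.
    static URI fromParts() {
        try {
            return new URI("ftp", "anonymous:elharo@metalab.unc.edu",
                    "ftp.oreilly.com", 21, "/pub/stylesheet", null, null);
        } catch (URISyntaxException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Builds the same URI from a single, already escaped string.
    static URI fromString() {
        return URI.create(
            "ftp://anonymous:elharo%40metalab.unc.edu@ftp.oreilly.com:21/pub/stylesheet");
    }

    // Returns false if URI.create() rejects the string.
    static boolean isLegal(String s) {
        try {
            URI.create(s);
            return true;
        } catch (IllegalArgumentException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(fromParts());
        System.out.println(fromParts().equals(fromString()));
        System.out.println(isLegal("ftp://ftp.oreilly.com: 21/pub/stylesheet"));
    }
}
```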
A URI reference has up to three parts: a scheme, a scheme-specific part, and a fragment identifier. The general format is:
scheme:scheme-specific-part#fragment
If the scheme is omitted, the URI reference is relative. If the fragment identifier is omitted, the URI reference is a pure URI. The URI class has getter methods that return these three parts of each URI object. The getRawFoo( ) methods return the encoded forms of the parts of the URI, while the equivalent getFoo() methods first decode any percent-escaped characters and then return the decoded part:
public String getScheme( )
public String getSchemeSpecificPart( )
public String getRawSchemeSpecificPart( )
public String getFragment( )
public String getRawFragment( )
These methods all return null if the particular URI object does not have the relevant component: for example, a relative URI without a scheme or an http URI without a fragment identifier.
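A small sketch makes the difference between the raw and decoded getters concrete. The class name UriPartsDemo and the sample URIs are assumptions chosen for illustration.

```java
import java.net.URI;

class UriPartsDemo {

    // Returns scheme, raw and decoded scheme-specific part,
    // and raw and decoded fragment, in that order.
    static String[] parts(String s) {
        URI u = URI.create(s);
        return new String[] {
            u.getScheme(),
            u.getRawSchemeSpecificPart(),
            u.getSchemeSpecificPart(),
            u.getRawFragment(),
            u.getFragment()
        };
    }

    public static void main(String[] args) {
        for (String part : parts("http://www.example.com/a%20b#sec%3C1")) {
            System.out.println(part);
        }
        // A relative URI has no scheme, and this one has no fragment either.
        for (String part : parts("images/logo.png")) {
            System.out.println(part);
        }
    }
}
```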
A URI that has a scheme is an absolute URI. A URI without a scheme is relative. The isAbsolute() method returns true if the URI is absolute, false if it's relative:
public boolean isAbsolute( )
The details of the scheme-specific part vary depending on the type of the scheme. For example, in an http URL, the scheme-specific part has a hierarchical format divided into an authority, a path, and a query string. The authority is further divided into user info, host, and port. The isOpaque() method returns false if the URI is hierarchical and true if it's not hierarchical, that is, if it's opaque:
public boolean isOpaque( )
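The two tests can be combined to classify any URI, as in this sketch. The class name UriKindDemo and the classify helper are hypothetical.

```java
import java.net.URI;

class UriKindDemo {

    // Describes a URI as absolute or relative, and opaque or hierarchical.
    static String classify(String s) {
        URI u = URI.create(s);
        String first = u.isAbsolute() ? "absolute" : "relative";
        String second = u.isOpaque() ? "opaque" : "hierarchical";
        return first + ", " + second;
    }

    public static void main(String[] args) {
        System.out.println(classify("mailto:elharo@metalab.unc.edu"));
        System.out.println(classify("http://www.example.com/"));
        System.out.println(classify("images/logo.png"));
    }
}
```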
If the URI is opaque, all you can get is the scheme, scheme-specific part, and fragment identifier. However, if the URI is hierarchical, there are getter methods for all the different parts of a hierarchical URI:
public String getAuthority( )
public String getFragment( )
public String getHost( )
public String getPath( )
public int getPort( )
public String getQuery( )
public String getUserInfo( )
These methods all return the decoded parts; in other words, percent escapes, such as %3C, are changed into the characters they represent, such as <. If you want the raw, encoded parts of the URI, there are five parallel getRawFoo() methods:
public String getRawAuthority( )
public String getRawFragment( )
public String getRawPath( )
public String getRawQuery( )
public String getRawUserInfo( )
Remember the URI class differs from the URI specification in that non-ASCII characters such as é and ü are never percent-escaped in the first place, and thus will still be present in the strings returned by the getRawFoo() methods unless the strings originally used to construct the URI object were encoded.
In the event that the specific URI does not contain this information (for instance, the URI http://www.example.com has no user info, port, or query string), the relevant methods return null. There are two exceptions. Since getPort( ) is declared to return an int, it can't return null; instead, it returns -1 to indicate an omitted port. And getPath( ) returns an empty string, not null, for a hierarchical URI whose path is empty.
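The hierarchical getters and their behavior for missing components can be sketched as follows. The class name HierarchicalPartsDemo, the summary helper, and the sample URIs are assumptions for illustration.

```java
import java.net.URI;

class HierarchicalPartsDemo {

    // Summarizes the components of a hierarchical URI in one line.
    static String summary(String s) {
        URI u = URI.create(s);
        return "userInfo=" + u.getUserInfo()
             + ", host=" + u.getHost()
             + ", port=" + u.getPort()
             + ", path=" + u.getPath()
             + ", rawPath=" + u.getRawPath()
             + ", query=" + u.getQuery();
    }

    public static void main(String[] args) {
        System.out.println(summary("http://user@www.example.com:8080/a%20b/c?q=1%3C2#top"));
        // Missing components come back as null, -1, or an empty path.
        System.out.println(summary("http://www.example.com"));
    }
}
```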
For various technical reasons that don't have a lot of practical impact, Java can't always initially detect syntax errors in the authority component. The immediate symptom of this failing is normally an inability to return the individual parts of the authority: port, host, and user info. In this event, you can call parseServerAuthority() to force the authority to be reparsed:
public URI parseServerAuthority( ) throws URISyntaxException
The original URI does not change (URI objects are immutable), but the URI returned will have separate authority parts for user info, host, and port. If the authority cannot be parsed, a URISyntaxException is thrown.
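One way to see this in practice is with a host name that contains an underscore, which Java accepts as an authority but not as a server host name. The sketch below assumes that behavior; the class name AuthorityDemo, the host my_host, and the "registry-based: " label are illustrative.

```java
import java.net.URI;
import java.net.URISyntaxException;

class AuthorityDemo {

    // Returns the host if the authority is server-based; otherwise
    // reports the unparsed authority string.
    static String serverHost(String s) {
        URI u = URI.create(s);
        try {
            return u.parseServerAuthority().getHost();
        } catch (URISyntaxException ex) {
            return "registry-based: " + u.getAuthority();
        }
    }

    public static void main(String[] args) {
        System.out.println(serverHost("http://www.example.com:8080/"));
        System.out.println(serverHost("http://my_host/"));
    }
}
```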
Example 7-10 uses these methods to split URIs entered on the command line into their component parts. It's similar to Example 7-4 but works with any syntactically correct URI, not just the ones Java has a protocol handler for.
import java.net.*;

public class URISplitter {

  public static void main(String[] args) {

    for (int i = 0; i < args.length; i++) {
      try {
        URI u = new URI(args[i]);
        System.out.println("The URI is " + u);
        if (u.isOpaque( )) {
          System.out.println("This is an opaque URI.");
          System.out.println("The scheme is " + u.getScheme( ));
          System.out.println("The scheme specific part is "
           + u.getSchemeSpecificPart( ));
          System.out.println("The fragment ID is " + u.getFragment( ));
        }
        else {
          System.out.println("This is a hierarchical URI.");
          System.out.println("The scheme is " + u.getScheme( ));
          try {
            // Force the authority to be split into its server-based parts
            u = u.parseServerAuthority( );
            System.out.println("The host is " + u.getHost( ));
            System.out.println("The user info is " + u.getUserInfo( ));
            System.out.println("The port is " + u.getPort( ));
          }
          catch (URISyntaxException ex) {
            // Must be a registry based authority
            System.out.println("The authority is " + u.getAuthority( ));
          }
          System.out.println("The path is " + u.getPath( ));
          System.out.println("The query string is " + u.getQuery( ));
          System.out.println("The fragment ID is " + u.getFragment( ));
        } // end else
      } // end try
      catch (URISyntaxException ex) {
        System.err.println(args[i] + " does not seem to be a URI.");
      }
      System.out.println( );
    } // end for

  } // end main

} // end URISplitter
Here's the result of running this against three of the URI examples in this section:
% java URISplitter tel:+1-800-9988-9938 \
  http://www.xml.com/pub/a/2003/09/17/stax.html#id=_hbc \
  urn:isbn:1-565-92870-9
The URI is tel:+1-800-9988-9938
This is an opaque URI.
The scheme is tel
The scheme specific part is +1-800-9988-9938
The fragment ID is null

The URI is http://www.xml.com/pub/a/2003/09/17/stax.html#id=_hbc
This is a hierarchical URI.
The scheme is http
The host is www.xml.com
The user info is null
The port is -1
The path is /pub/a/2003/09/17/stax.html
The query string is null
The fragment ID is id=_hbc

The URI is urn:isbn:1-565-92870-9
This is an opaque URI.
The scheme is urn
The scheme specific part is isbn:1-565-92870-9
The fragment ID is null
The URI class has three methods for converting back and forth between relative and absolute URIs.
The resolve(URI uri) method compares the uri argument to this URI and uses it to construct a new URI object that wraps an absolute URI. For example, consider these three lines of code:
URI absolute = new URI("http://www.example.com/");
URI relative = new URI("images/logo.png");
URI resolved = absolute.resolve(relative);
After they've executed, resolved contains the absolute URI http://www.example.com/images/logo.png.
If the invoking URI does not contain an absolute URI itself, the resolve( ) method resolves as much of the URI as it can and returns a new relative URI object as a result. For example, take these three statements:
URI top = new URI("javafaq/books/");
URI relative = new URI("jnp3/examples/07/index.html");
URI resolved = top.resolve(relative);
After they've executed, resolved now contains the relative URI javafaq/books/jnp3/examples/07/index.html with no scheme or authority.
The resolve(String uri) method is a convenience method that simply converts the string argument to a URI and then resolves it against the invoking URI, returning a new URI object as the result. That is, it's equivalent to resolve(URI.create(uri)), so a malformed string causes an IllegalArgumentException rather than a URISyntaxException. Using this method, the previous two samples can be rewritten as:
URI absolute = new URI("http://www.example.com/");
URI resolved = absolute.resolve("images/logo.png");
URI top = new URI("javafaq/books/");
resolved = top.resolve("jnp3/examples/07/index.html");
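Resolution also handles references that climb the hierarchy with .. segments. The following sketch wraps the string-based resolve( ) method; the class name ResolveDemo and the a/b/c.html paths are illustrative assumptions.

```java
import java.net.URI;

class ResolveDemo {

    // Resolves a reference against a base URI and returns the result as a string.
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("http://www.example.com/", "images/logo.png"));
        // A relative base produces a relative result.
        System.out.println(resolve("javafaq/books/", "jnp3/examples/07/index.html"));
        // ".." removes the last directory of the base path.
        System.out.println(resolve("http://www.example.com/a/b/c.html", "../d.html"));
    }
}
```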
It's also possible to reverse this procedure; that is, to go from an absolute URI to a relative one. The relativize( ) method creates a new URI object from the uri argument that is relative to the invoking URI. The argument is not changed. For example:
URI absolute = new URI("http://www.example.com/images/logo.png");
URI top = new URI("http://www.example.com/");
URI relative = top.relativize(absolute);
The URI object relative now contains the relative URI images/logo.png.
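When the two URIs do not share a scheme and authority, relativize( ) simply returns its argument. A sketch of both cases follows; the class name RelativizeDemo is hypothetical.

```java
import java.net.URI;

class RelativizeDemo {

    // Expresses target relative to base, if possible, and returns the result as a string.
    static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    public static void main(String[] args) {
        System.out.println(
            relativize("http://www.example.com/", "http://www.example.com/images/logo.png"));
        // Different authority: the target comes back unchanged.
        System.out.println(
            relativize("http://www.example.com/", "http://www.ibiblio.org/javafaq/"));
    }
}
```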
The URI class has the usual batch of utility methods: equals(), hashCode( ), toString( ), and compareTo( ).
URIs are tested for equality pretty much as you'd expect. It's not a direct string comparison. Equal URIs must both either be hierarchical or opaque. The scheme and authority parts are compared without considering case. That is, http and HTTP are the same scheme, and www.example.com is the same authority as www.EXAMPLE.com. The rest of the URI is case-sensitive, except for hexadecimal digits used to escape illegal characters. Escapes are not decoded before comparing. http://www.example.com/A and http://www.example.com/%41 are unequal URIs.
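These rules can be checked with a few comparisons. The sketch below is illustrative only; the class name UriEqualityDemo and the same helper are assumptions.

```java
import java.net.URI;

class UriEqualityDemo {

    // Returns true if the two strings parse to equal URI objects.
    static boolean same(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        // Scheme and host ignore case.
        System.out.println(same("http://www.example.com/A", "HTTP://www.EXAMPLE.com/A"));
        // Escapes are not decoded before comparison.
        System.out.println(same("http://www.example.com/A", "http://www.example.com/%41"));
        // Hexadecimal digits inside an escape ignore case.
        System.out.println(same("http://www.example.com/%3c", "http://www.example.com/%3C"));
    }
}
```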
The hashCode( ) method is a usual hashCode( ) method, nothing special. Equal URIs do have the same hash code and unequal URIs are fairly unlikely to share the same hash code.
URIs can be ordered. The ordering is based on string comparison of the individual parts, in this sequence:
If the schemes are different, the schemes are compared, without considering case.
Otherwise, if the schemes are the same, a hierarchical URI is considered to be less than an opaque URI with the same scheme.
If both URIs are opaque URIs, they're ordered according to their scheme-specific parts.
If both the scheme and the opaque scheme-specific parts are equal, the URIs are compared by their fragments.
If both URIs are hierarchical, they're ordered according to their authority components, which are themselves ordered according to user info, host, and port, in that order.
If the schemes and the authorities are equal, the path is used to distinguish them.
If the paths are also equal, the query strings are compared.
If the query strings are equal, the fragments are compared.
URIs are not comparable to any type except themselves. Comparing a URI to anything except another URI causes a ClassCastException.
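Because URI implements Comparable, a list of URIs can be sorted directly. The following sketch applies the ordering rules above to a handful of made-up URIs; the class name UriSortDemo and the sample values are assumptions.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class UriSortDemo {

    // Parses the strings, sorts them by URI.compareTo(), and
    // returns the sorted URIs as strings.
    static List<String> sorted(String... uris) {
        List<URI> list = new ArrayList<>();
        for (String s : uris) {
            list.add(URI.create(s));
        }
        Collections.sort(list);
        List<String> result = new ArrayList<>();
        for (URI u : list) {
            result.add(u.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Expected order: by scheme, hierarchical before opaque,
        // then by host, path, and query string.
        System.out.println(sorted(
            "urn:isbn:2",
            "http://b.example.com/",
            "urn:/a",
            "http://a.example.com/x?q=2",
            "http://a.example.com/x?q=1"));
    }
}
```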
The toString( ) method returns an unencoded string form of the URI. That is, characters like é are not percent-escaped unless they were percent-escaped in the strings used to construct this URI. Therefore, the result of calling this method is not guaranteed to be a syntactically correct URI. This form is sometimes useful for display to human beings, but not for retrieval.
The toASCIIString( ) method returns an encoded string form of the URI. Characters like é and \ are always percent-escaped whether or not they were originally escaped. This is the string form of the URI you should use most of the time. Even if the form returned by toString( ) is more legible for humans, they may still copy and paste it into areas that are not expecting an illegal URI. toASCIIString( ) always returns a syntactically correct URI.
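The difference between the two string forms shows up as soon as a URI contains a non-ASCII character. This is a minimal sketch; the class name UriStringForms and the sample address are assumptions.

```java
import java.net.URI;

class UriStringForms {

    // The form suitable for display; non-ASCII characters are left as is.
    static String display(String s) {
        return URI.create(s).toString();
    }

    // The form suitable for retrieval; non-ASCII characters are
    // percent-escaped as UTF-8 bytes.
    static String ascii(String s) {
        return URI.create(s).toASCIIString();
    }

    public static void main(String[] args) {
        String cafe = "http://www.example.com/caf\u00e9";
        System.out.println(display(cafe));
        System.out.println(ascii(cafe));
    }
}
```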