15.10 Guessing MIME Content Types
If this were the best of all possible
worlds, every protocol and every server would use MIME types to
specify the kind of file being transferred. Unfortunately,
that's not the case. Not only do we have to deal
with older protocols such as FTP that predate MIME, but many HTTP
servers that should use MIME don't provide MIME
headers at all or lie and provide headers that are incorrect (usually
because the server has been misconfigured). The
URLConnection class provides two static methods to
help programs figure out the MIME type of some data; you can use
these if the content type just isn't available or if
you have reason to believe that the content type
you're given isn't correct. The
first of these is URLConnection.guessContentTypeFromName():

public static String guessContentTypeFromName(String name)[1]

[1] This method is protected in Java 1.3 and earlier, public in Java 1.4 and later.

This method tries to guess the content type of an object based upon
the extension in the filename portion of the
object's URL. It returns its best guess about the
content type as a String. This guess is likely to
be correct; people follow some fairly regular conventions when
thinking up filenames.

The guesses are determined by the
content-types.properties file, normally located in the
jre/lib directory. On Unix, Java may also look
at the mailcap file to help it guess. Table 15-1 shows the guesses the JDK 1.5 makes. These
vary a little from one version of the JDK to the next.

Extension | MIME content type |
---|---|
No extension, or unrecognized extension | content/unknown |
.saveme, .dump, .hqx, .arc, .o, .a, .z, .bin, .exe, .zip, .gz | application/octet-stream |
.oda | application/oda |
.pdf | application/pdf |
.eps, .ai, .ps | application/postscript |
.dvi | application/x-dvi |
.hdf | application/x-hdf |
.latex | application/x-latex |
.nc, .cdf | application/x-netcdf |
.tex | application/x-tex |
.texinfo, .texi | application/x-texinfo |
.t, .tr, .roff | application/x-troff |
.man | application/x-troff-man |
.me | application/x-troff-me |
.ms | application/x-troff-ms |
.src, .wsrc | application/x-wais-source |
.zip | application/zip |
.bcpio | application/x-bcpio |
.cpio | application/x-cpio |
.gtar | application/x-gtar |
.sh, .shar | application/x-shar |
.sv4cpio | application/x-sv4cpio |
.sv4crc | application/x-sv4crc |
.tar | application/x-tar |
.ustar | application/x-ustar |
.snd, .au | audio/basic |
.aifc, .aif, .aiff | audio/x-aiff |
.wav | audio/x-wav |
.gif | image/gif |
.ief | image/ief |
.jfif, .jfif-tbnl, .jpe, .jpg, .jpeg | image/jpeg |
.tif, .tiff | image/tiff |
.fpx, .fpix | image/vnd.fpx |
.ras | image/x-cmu-rast |
.pnm | image/x-portable-anymap |
.pbm | image/x-portable-bitmap |
.pgm | image/x-portable-graymap |
.ppm | image/x-portable-pixmap |
.rgb | image/x-rgb |
.xbm, .xpm | image/x-xbitmap |
.xwd | image/x-xwindowdump |
.png | image/png |
.htm, .html | text/html |
.text, .c, .cc, .c++, .h, .pl, .txt, .java, .el | text/plain |
.tsv | text/tab-separated-values |
.etx | text/x-setext |
.mpg, .mpe, .mpeg | video/mpeg |
.mov, .qt | video/quicktime |
.avi | application/x-troff-msvideo |
.movie, .mv | video/x-sgi-movie |
.mime | message/rfc822 |
.xml | application/xml |
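
To see what the running JDK actually guesses for a few names, a minimal sketch along the following lines may help. The class `NameGuessDemo` and the helper `guessFromName()` are hypothetical names introduced for illustration. The helper also assumes that `guessContentTypeFromName()` can return null when it has no mapping, and substitutes `content/unknown` in that case.

```java
import java.net.URLConnection;

final class NameGuessDemo {

    // Hypothetical helper: returns the JDK's guess for the name, or
    // "content/unknown" when guessContentTypeFromName() returns null.
    static String guessFromName(String name) {
        String type = URLConnection.guessContentTypeFromName(name);
        return type != null ? type : "content/unknown";
    }

    public static void main(String[] args) {
        // Illustrative filenames only; the last one has no extension.
        String[] names = {"index.html", "logo.gif", "photo.jpeg", "notes.txt", "README"};
        for (String name : names) {
            System.out.println(name + " -> " + guessFromName(name));
        }
    }
}
```

Because the mappings vary between JDK versions, running this against the JDK you deploy on is a quick way to check an extension you care about.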

This list is not complete. For instance, it omits
various XML applications such as RDF (.rdf), XSL
(.xsl), and so on that should have the MIME type
application/xml. It also doesn't
provide a MIME type for CSS stylesheets (.css).
However, it's a good start.

The second MIME type guesser method is
URLConnection.guessContentTypeFromStream():

public static String guessContentTypeFromStream(InputStream in)

This method tries to guess the content type by looking at the first
few bytes of data in the stream. For this method to work, the
InputStream must support marking so that you can
return to the beginning of the stream after the first bytes have been
read. Java 1.5 inspects the first 11 bytes of the
InputStream, although sometimes fewer bytes are
needed to make an identification. Table 15-2 shows how Java 1.5
guesses. Note that these guesses are often not as reliable as the
guesses made by the previous method. For example, an XML document
that begins with a comment rather than an XML declaration would be
mislabeled as an HTML file. This method should be used only as a last
resort.
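
One way to apply that "last resort" ordering is sketched below: trust a declared type if there is one, then try the name, and only then peek at the stream. The class `ContentTypeFallback`, its `resolve()` method, and the parameter names are hypothetical. The sketch assumes Java 8 or later (for `UncheckedIOException`), a non-null name, and a stream that supports marking.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

final class ContentTypeFallback {

    static final String UNKNOWN = "content/unknown";

    // Treats null, empty, and "content/unknown" as "no usable answer".
    static boolean isKnown(String type) {
        return type != null && !type.isEmpty() && !UNKNOWN.equals(type);
    }

    // declaredType: what the server claimed (may be null).
    // name: the filename or URL path (assumed non-null).
    // in: the data; it must support mark/reset for the stream guess to work.
    static String resolve(String declaredType, String name, InputStream in) {
        if (isKnown(declaredType)) {
            return declaredType;
        }
        String byName = URLConnection.guessContentTypeFromName(name);
        if (isKnown(byName)) {
            return byName;
        }
        try {
            // Last resort: inspect the first few bytes of the stream.
            String byStream = URLConnection.guessContentTypeFromStream(in);
            return isKnown(byStream) ? byStream : UNKNOWN;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] gif = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        // The extension here is made up and should not be in any mapping,
        // so the stream guess is the one that gets used.
        System.out.println(resolve(null, "download.zzqx", new ByteArrayInputStream(gif)));
    }
}
```

If you have reason to distrust the declared type, you would skip the first check; that decision is left to the caller here.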

First bytes in hexadecimal | First bytes in ASCII | MIME content type |
---|---|---|
0xACED | | application/x-java-serialized-object |
0xCAFEBABE | | application/java-vm |
0x47494638 | GIF8 | image/gif |
0x23646566 | #def | image/x-bitmap |
0x2158504D32 | !XPM2 | image/x-pixmap |
0x89504E470D0A1A0A | | image/png |
0x2E736E64 | | audio/basic |
0x646E732E | | audio/basic |
0x3C3F786D6C | <?xml | application/xml |
0xFEFF003C003F0078 | | application/xml |
0xFFFE3C003F007800 | | application/xml |
0x3C21 | <! | text/html |
0x3C68746D6C | <html | text/html |
0x3C626F6479 | <body | text/html |
0x3C68656164 | <head | text/html |
0x3C48544D4C | <HTML | text/html |
0x3C424F4459 | <BODY | text/html |
0x3C48454144 | <HEAD | text/html |
0xFFD8FFE0 | | image/jpeg |
0xFFD8FFEE | | image/jpeg |
0xFFD8FFE1XXXX4578696600[2] | | image/jpeg |
0x52494646 | RIFF | audio/x-wav |
0xD0CF11E0A1B11AE1[3] | | image/vnd.fpx |

[2] The XXXX bytes are not checked. They can be anything.

[3] This actually just checks for a
Microsoft structured storage document. Several other more complicated
checks have to be made before deciding whether this is indeed an
image/vnd.fpx document.

ASCII mappings, where they exist, are case-sensitive. For example,
guessContentTypeFromStream() does not recognize
<Html> as the beginning of a
text/html file.
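
The behavior described above can be tried on in-memory data. The following is a minimal sketch, assuming Java 8 or later. The class `StreamGuessDemo` and its helpers are hypothetical names. Each sample is wrapped in a `BufferedInputStream` so that marking is supported, and a null result means the method could not make a guess.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

final class StreamGuessDemo {

    // Hypothetical helper: wraps the bytes in a BufferedInputStream, which
    // supports mark/reset, and returns the guess (null if there is none).
    static String guessFromBytes(byte[] data) {
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        try {
            return URLConnection.guessContentTypeFromStream(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: guesses, then reads one byte to show that the
    // stream was returned to its beginning after the peek.
    static int firstByteAfterGuess(byte[] data) {
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        try {
            URLConnection.guessContentTypeFromStream(in);
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String[] samples = {
            "GIF89a",
            "<?xml version=\"1.0\"?><doc/>",
            "<!-- a comment first --><doc/>", // XML that starts with a comment
            "<html><body>",
            "<Html><body>"                    // mixed case
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + guessFromBytes(ascii(sample)));
        }
        System.out.println("first byte after guess: "
                + (char) firstByteAfterGuess(ascii("GIF89a")));
    }
}
```

The third sample shows the mislabeling problem: the document is XML, but because it opens with `<!` the guess is text/html.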