Recipe 14.5. Checking Content Type via HTTP
Credit: Bob Stockwell
Problem
You need to determine whether a URL, or an open file obtained from
urllib.urlopen on a URL, is of a particular content type (such as
'text' for HTML or 'image' for GIF).
Solution
The content type of any resource can easily be checked through the
pseudo-file that urllib.urlopen returns for the
resource. Here is a function to show how to perform such checks:
import urllib

def isContentType(URLorFile, contentType='text'):
    """ Tells whether the URL (or pseudofile from urllib.urlopen) is of
        the required content type (default 'text').
    """
    try:
        if isinstance(URLorFile, str):
            thefile = urllib.urlopen(URLorFile)
        else:
            thefile = URLorFile
        result = thefile.info().getmaintype() == contentType.lower()
        if thefile is not URLorFile:
            thefile.close()
    except IOError:
        result = False    # if we couldn't open it, it's of _no_ type!
    return result
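urllib.urlopen and mimetools exist only in Python 2. As a hedged sketch, not the recipe's own code, the same check might look like this in Python 3, assuming urllib.request.urlopen and the email.message.Message object that its info() method returns; get_content_maintype stands in for getmaintype. This version also uses try/finally, so a URL opened inside the function is closed even if reading its headers fails.

```python
import urllib.request


def isContentType(URLorFile, contentType='text'):
    """Tell whether the URL (or a response from urllib.request.urlopen)
    has the required main content type (default 'text').

    Python 3 sketch: info() returns an email.message.Message, whose
    get_content_maintype() replaces mimetools.Message.getmaintype().
    """
    openedHere = isinstance(URLorFile, str)
    try:
        thefile = urllib.request.urlopen(URLorFile) if openedHere else URLorFile
        try:
            mainType = thefile.info().get_content_maintype()
        finally:
            # Close only a file we opened ourselves; a caller's file stays open.
            if openedHere:
                thefile.close()
    except OSError:
        # urllib.error.URLError subclasses OSError (IOError's Python 3 name):
        # a resource we cannot open is treated as having no type at all.
        return False
    return mainType == contentType.lower()


# data: URLs carry their own content type, so these calls need no network.
print(isContentType('data:text/plain,hello'))                    # True
print(isContentType('data:image/gif;base64,R0lGODlh', 'IMAGE'))  # True
print(isContentType('data:image/gif;base64,R0lGODlh'))           # False
print(isContentType('foo://example'))                            # False
```

The last call fails because urlopen has no handler for the made-up foo: scheme and raises URLError, which the function reports as False, just as the original reports an IOError.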
Discussion
For greater flexibility, this recipe accepts either the result of a
previous call to urllib.urlopen, or a URL in
string form. In the latter case, the Solution opens the URL with
urllib and, at the end, closes the resulting
pseudo-file again. If the attempt to open the URL fails, the recipe
catches the IOError and returns a result of
False, considering that a URL that cannot be
opened is of no type at all, and therefore in particular is not of
the type the caller was checking for. (Alternatively, you might
prefer to propagate the exception; if that's what you want, remove
the try and except clause headers, along with the result = False
assignment that is the body of the except clause, and dedent the
remaining statements.)

Whether the pseudo-file was passed in or opened locally from a URL
string, the info method of the pseudo-file returns an instance of
mimetools.Message (which doesn't mean you need to import
mimetools yourself; urllib does all
that's needed). On that object, we can call any of
several methods to get the content type, depending on exactly what we
want: gettype to get both main type and subtype with a slash in
between (as in 'text/plain'), getmaintype to get the main type (as
in 'text'), or getsubtype to get the subtype (as in 'plain'). In
this recipe, we want the main content type.

The string result from all of the type interrogation methods is
always lowercase, so we take the precaution of calling the
lower method on parameter
contentType as well, before comparing for equality.
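mimetools was removed in Python 3. As a sketch assuming Python 3's email package, the hypothetical helper describe_content_type below shows the three queries on email.message.Message that correspond to gettype, getmaintype, and getsubtype, and that each result comes back lowercased even when the header itself is not, which is why the recipe lowercases contentType before comparing.

```python
import email


def describe_content_type(header_value):
    """Return (full type, main type, subtype) for a Content-Type value.

    Hypothetical helper: in Python 3, email.message.Message's
    get_content_type, get_content_maintype, and get_content_subtype play
    the roles of mimetools.Message's gettype, getmaintype, and getsubtype.
    """
    # Build a header-only message; the blank line ends the headers.
    msg = email.message_from_string('Content-Type: %s\n\n' % header_value)
    return (msg.get_content_type(),
            msg.get_content_maintype(),
            msg.get_content_subtype())


# Parameters such as charset are dropped and the type is lowercased.
print(describe_content_type('TEXT/Plain; charset=utf-8'))
# prints ('text/plain', 'text', 'plain')
```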
See Also
Documentation on the urllib and
mimetools standard library modules in the
Library Reference and Python in a
Nutshell; a list of important content types is at
http://www.utoronto.ca/ian/books/html4ed/appb/mimetype.html;
a helpful explanation of the significance of content types is at
http://ppewww.ph.gla.ac.uk/~flavell/www/content-type.html.