Python Cookbook 2Nd Edition Jun 1002005 [Electronic resources] نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

Python Cookbook 2Nd Edition Jun 1002005 [Electronic resources] - نسخه متنی

David Ascher, Alex Martelli, Anna Ravenscroft

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید







Recipe 14.4. Checking for a Web Page's Existence


Credit: James Thiele, Rogier Steehouder


Problem


You want to check whether an HTTP URL corresponds to an existing web
page.


Solution


Using httplib allows you to easily check for a
page's existence without actually downloading the
page itself, just its headers. Here's a module
implementing a function to perform this task:

""
httpExists.py
A quick and dirty way to check whether a web file is there.
Usage:
>>> import httpExists
>>> httpExists.httpExists('http://www.python.org/')
True
>>> httpExists.httpExists('http://www.python.org/PenguinOnTheTelly')
Status 404 Not Found : http://www.python.org/PenguinOnTheTelly
False
""
import httplib, urlparse
def httpExists(url):
host, path = urlparse.urlsplit(url)[1:3]
if ':' in host:
# port specified, try to use it
host, port = host.split(':', 1)
try:
port = int(port)
except ValueError:
print 'invalid port number %r' % (port,)
return False
else:
# no port specified, use default port
port = None
try:
connection = httplib.HTTPConnection(host, port=port)
connection.request("HEAD", path)
resp = connection.getresponse( )
if resp.status == 200: # normal 'found' status
found = True
elif resp.status == 302: # recurse on temporary redirect
found = httpExists(urlparse.urljoin(url,
resp.getheader('location', '')))
else: # everything else -> not found
print "Status %d %s : %s" % (resp.status, resp.reason, url)
found = False
except Exception, e:
print e._ _class_ _, e, url
found = False
return found
def _test( ):
import doctest, httpExists
return doctest.testmod(httpExists)
if _ _name_ _ == "_ _main_ _":
_test( )


Discussion


While this recipe is very simple and runs quite fast (thanks to the
ability to use the HTTP command HEAD to get just
the headers, not the body, of the page), it may be too simplistic for
your specific needs: the HTTP result codes you might need to deal
with may go beyond the simple 200 success code, and 302 temporary
redirect, to include permanent redirects, temporary inaccessibility,
permission problems, and so on.

In my case, I needed to check the correctness of a huge number of
mutual links among pages of a site generated by a complex web
application on an intranet, so I knew I had the privilege of relying
on a simple check for "200 or
bust." At any rate, you can use this simple recipe
as a starting point to which to add any refinements you determine you
actually need.


See Also


Documentation on the urlparse and
httplib standard library modules in the
Library Reference and Python in a
Nutshell
.


/ 394