Python Cookbook 2Nd Edition Jun 1002005 [Electronic resources] (text version)

Recipe 14.4. Checking for a Web Page's Existence

Credit: James Thiele, Rogier Steehouder

Problem

You want to check whether an HTTP URL corresponds to an existing web
page.

Solution

Using httplib, you can easily check for a page's existence without
downloading the page itself, just its headers. Here's a module
implementing a function to perform this task:

"""
httpExists.py

A quick and dirty way to check whether a web file is there.

Usage:
>>> import httpExists
>>> httpExists.httpExists('http://www.python.org/')
True
>>> httpExists.httpExists('http://www.python.org/PenguinOnTheTelly')
Status 404 Not Found : http://www.python.org/PenguinOnTheTelly
False
"""
import httplib, urlparse

def httpExists(url):
    host, path = urlparse.urlsplit(url)[1:3]
    if ':' in host:
        # port specified, try to use it
        host, port = host.split(':', 1)
        try:
            port = int(port)
        except ValueError:
            print 'invalid port number %r' % (port,)
            return False
    else:
        # no port specified, use default port
        port = None
    try:
        connection = httplib.HTTPConnection(host, port=port)
        # HEAD asks the server for the headers only, not the body
        connection.request("HEAD", path)
        resp = connection.getresponse()
        if resp.status == 200:       # normal 'found' status
            found = True
        elif resp.status == 302:     # recurse on temporary redirect
            found = httpExists(urlparse.urljoin(url,
                               resp.getheader('location', '')))
        else:                        # everything else -> not found
            print "Status %d %s : %s" % (resp.status, resp.reason, url)
            found = False
    except Exception, e:
        print e.__class__, e, url
        found = False
    return found

def _test():
    import doctest, httpExists
    return doctest.testmod(httpExists)

if __name__ == "__main__":
    _test()
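The module above is Python 2 code: in Python 3, httplib became http.client and urlparse became urllib.parse. The following is a minimal sketch of the same check ported to Python 3, assuming the same "200 found, follow 302, anything else is missing" policy. The name http_exists and the timeout parameter are illustrative choices, not part of the recipe. Two small deviations are assumptions worth flagging: it keeps the query string (the original drops it) and it picks HTTPSConnection for https:// URLs, so a redirect from http to https can still be followed.

```python
import http.client
import urllib.parse


def http_exists(url, timeout=10.0):
    """Return True if a HEAD request for url gets a 200 response.

    Same policy as the Python 2 recipe: 200 means found, a 302 is
    followed recursively, and any other status or error means not found.
    """
    parts = urllib.parse.urlsplit(url)
    try:
        port = parts.port  # None when the URL names no port
    except ValueError:
        print('invalid port number in %r' % (url,))
        return False
    if not parts.hostname:
        print('no host in %r' % (url,))
        return False
    # Assumption: choose HTTPS by scheme so redirects to https:// work.
    if parts.scheme == 'https':
        connection = http.client.HTTPSConnection(parts.hostname, port=port,
                                                 timeout=timeout)
    else:
        connection = http.client.HTTPConnection(parts.hostname, port=port,
                                                timeout=timeout)
    # Assumption: keep the query string, which the original recipe drops.
    path = parts.path or '/'
    if parts.query:
        path += '?' + parts.query
    try:
        connection.request('HEAD', path)  # headers only, no body
        resp = connection.getresponse()
        if resp.status == 200:            # normal 'found' status
            return True
        if resp.status == 302:            # recurse on temporary redirect
            location = resp.getheader('Location', '')
            return http_exists(urllib.parse.urljoin(url, location), timeout)
        print('Status %d %s : %s' % (resp.status, resp.reason, url))
        return False
    except Exception as e:                # broad, as in the original
        print(type(e).__name__, e, url)
        return False
    finally:
        connection.close()
```

Like the original, a 302 that points back at itself recurses until Python's recursion limit is hit, so this port is still only suitable for well-behaved sites.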

Discussion

While this recipe is very simple and runs quite fast (because the
HTTP HEAD method retrieves just the headers of the page, not its
body), it may be too simplistic for your specific needs: the HTTP
status codes you have to deal with may go beyond the simple 200
success code and the 302 temporary redirect, to include permanent
redirects (301), temporary unavailability (503), permission problems
(401, 403), and so on.
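One way to handle a wider range of status codes is to follow every redirect code up to a fixed hop limit and then report the final status, leaving the verdict to the caller. The sketch below is written for Python 3 and is an assumption-laden illustration, not the recipe's method: the names check_url and describe, the max_redirects limit, the REDIRECT_CODES set, and the category labels are all hypothetical choices.

```python
import http.client
import urllib.parse

# Hypothetical set of status codes treated as redirects to follow.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def check_url(url, max_redirects=5, timeout=10.0):
    """Follow up to max_redirects redirects with HEAD requests.

    Returns (status, final_url); status is None if the host could not
    be reached or the redirect limit was exceeded.
    """
    for _ in range(max_redirects + 1):
        parts = urllib.parse.urlsplit(url)
        if not parts.hostname:
            return None, url
        conn_class = (http.client.HTTPSConnection if parts.scheme == 'https'
                      else http.client.HTTPConnection)
        path = parts.path or '/'
        if parts.query:
            path += '?' + parts.query
        try:
            # parts.port raises ValueError for a non-numeric port
            conn = conn_class(parts.hostname, port=parts.port, timeout=timeout)
            try:
                conn.request('HEAD', path)
                resp = conn.getresponse()
                status = resp.status
                location = resp.getheader('Location')
            finally:
                conn.close()
        except (OSError, ValueError, http.client.HTTPException):
            return None, url
        if status in REDIRECT_CODES and location:
            url = urllib.parse.urljoin(url, location)  # resolve relative targets
            continue
        return status, url
    return None, url  # too many redirects (e.g. a redirect loop)


def describe(status):
    """Map a final status to a coarse, hypothetical verdict."""
    if status is None:
        return 'unreachable'
    if 200 <= status < 300:
        return 'exists'
    if status in (401, 403):
        return 'permission problem'
    if status in (404, 410):
        return 'missing'
    if status in (429, 503):
        return 'temporarily unavailable'
    return 'other (%d)' % status
```

Returning the final status instead of a boolean lets a link checker treat, say, a 503 as "retry later" rather than "broken", while the hop limit turns a redirect loop into a clean failure instead of unbounded recursion.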

In my case, I needed to check the correctness of a huge number of
mutual links among pages of a site generated by a complex web
application on an intranet, so I knew I had the privilege of relying
on a simple "200 or bust" check. At any rate, you can use this simple
recipe as a starting point and add whatever refinements you
determine you actually need.
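When checking a huge number of mutual links, many URLs repeat and most of the time is spent waiting on the network, so checking each distinct URL once and running the checks in a thread pool can help. The Python 3 sketch below assumes the check is any one-argument predicate, such as the recipe's httpExists; the name find_broken and the max_workers default are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def find_broken(urls, check, max_workers=8):
    """Return the distinct URLs for which check(url) is false.

    check is any predicate, e.g. the recipe's httpExists; each distinct
    URL is checked once, and results keep first-seen order.
    """
    unique = list(dict.fromkeys(urls))           # drop duplicate links, keep order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(check, unique))  # map preserves input order
    return [url for url, ok in zip(unique, results) if not ok]
```

For example, `find_broken(all_links, httpExists)` would return the links that failed the "200 or bust" test, where all_links is whatever list of URLs your crawler collected.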
