Python Cookbook, 2nd Edition, Jun 1002005 [Electronic resources] (text version)

Recipe 13.2. Grabbing a Document from the Web

Credit: Gisle Aas, Magnus Bodin

Problem

You need to grab a document from a URL on the Web.

Solution

urllib.urlopen returns a file-like object, and you
can call the read method on that object to get all
of its contents:

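# Python 2: urlopen lives in the urllib module
from urllib import urlopen
# Fetch the page and read the whole response body into one string
doc = urlopen("http://www.python.org").read()
print doc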
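
The solution above is written for Python 2. As a minimal sketch for readers on Python 3, where urlopen lives in urllib.request and read returns bytes rather than a string, the same idea might look like the following. The helper name grab_document and the UTF-8 default are assumptions for illustration, not part of the recipe.

```python
from urllib.request import urlopen


def grab_document(url, encoding="utf-8"):
    """Fetch url and return its whole body as text.

    grab_document is a hypothetical helper; the encoding default is an
    assumption, since the recipe does not discuss character sets.
    """
    # The response is a file-like object; leaving the with block closes it.
    with urlopen(url) as response:
        raw = response.read()  # bytes in Python 3
    return raw.decode(encoding)


# Example call (needs network access):
# print(grab_document("http://www.python.org"))
```

Decoding explicitly is a design choice here: a page served in another character set would need a different encoding argument.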

Discussion

Once you obtain a file-like object from urlopen,
you can read it all at once into one big string by calling its
read method, as I do in this recipe.
Alternatively, you can read the object as a list of lines by calling
its readlines method, or, for special purposes,
just get one line at a time by looping over the object in a
for loop. In addition to these file-like
operations, the object that urlopen returns offers
a few other useful features. For example, the following snippet gives
you the headers of the document:

doc = urlopen("http://www.python.org")
# info() returns the response headers
print doc.info()

The headers include fields such as Content-Type
(text/html in this case), which defines the MIME
type of the document. doc.info returns a
mimetools.Message instance, so you can access it
in various ways besides printing it or otherwise transforming it into
a string. For example, doc.info().getheader('Content-Type')
returns the 'text/html' string. The
maintype attribute of the
mimetools.Message object is the
'text' string, subtype is the
'html' string, and type is also
the 'text/html' string. If you need to perform
sophisticated analysis and processing, all the tools you need are
right there. At the same time, if your needs are simpler, you can
meet them in very simple ways, as this recipe shows.
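
To make the file-like operations and header access described above concrete, here is a small sketch in Python 3 syntax. It assumes the Python 3 behavior in which info() returns an email.message.Message rather than a mimetools.Message, so the maintype and subtype attributes become the get_content_maintype and get_content_subtype methods. The function name summarize is hypothetical.

```python
from urllib.request import urlopen


def summarize(url):
    """Return (content_type, maintype, subtype, line_count) for url."""
    with urlopen(url) as doc:
        headers = doc.info()  # an email.message.Message in Python 3
        content_type = headers.get_content_type()  # e.g. 'text/html'
        maintype = headers.get_content_maintype()  # e.g. 'text'
        subtype = headers.get_content_subtype()    # e.g. 'html'
        # Looping over the response yields one line (as bytes) at a time.
        line_count = sum(1 for _line in doc)
    return content_type, maintype, subtype, line_count


# Example call (needs network access):
# print(summarize("http://www.python.org"))
```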

If what you need to do with the document you grab from the Web is
specifically to save it to a local file,
urllib.urlretrieve is just what you need, as the
"Introduction" to this chapter
describes.
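
As a rough illustration of that save-to-file case, the sketch below uses the Python 3 spelling, urllib.request.urlretrieve, which the Python 3 documentation classifies as a legacy interface. The wrapper name save_document and the file name in the usage comment are assumptions for illustration.

```python
from urllib.request import urlretrieve


def save_document(url, filename=None):
    """Download url to a local file and return (local_path, content_type).

    If filename is None, urlretrieve chooses a temporary file itself.
    """
    # urlretrieve returns a (local_filename, headers) pair.
    local_path, headers = urlretrieve(url, filename)
    return local_path, headers.get_content_type()


# Example call (needs network access):
# save_document("http://www.python.org", "python_home.html")
```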

urllib implicitly supports the use of proxies (as
long as the proxies do not require authentication: the current
implementation of urllib does not support
authentication-requiring proxies). Just set the environment variable
HTTP_PROXY to a URL, such as
'http://proxy.domain.com:8080', to use the proxy
at that URL. If the environment variable
HTTP_PROXY is not set, urllib
may also look for the information in other platform-specific
locations, such as the Windows registry if you're
running under Windows.
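
A quick way to see which proxy would be picked up is to ask the library itself. The sketch below uses Python 3's urllib.request.getproxies, which consults the environment first and then, on some platforms, system settings. The helper names set_http_proxy and configured_proxy are hypothetical, the lowercase variable name http_proxy is an assumption (the text uses HTTP_PROXY), and the proxy URL is the example value from the text.

```python
import os
from urllib.request import getproxies


def set_http_proxy(proxy_url):
    """Set http_proxy for this process, like exporting it in the shell."""
    os.environ["http_proxy"] = proxy_url


def configured_proxy(scheme="http"):
    """Return the proxy URL urllib would use for scheme, or None."""
    # getproxies() returns a dict mapping scheme names to proxy URLs.
    return getproxies().get(scheme)


# Example usage:
# set_http_proxy("http://proxy.domain.com:8080")
# print(configured_proxy("http"))
```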

If you have more advanced needs, such as using proxies that require
authentication, you may use the more sophisticated
urllib2 module of the Python Standard Library,
rather than the simpler urllib module. At http://pydoc.org/2.3/urllib2l, you can
find an example of how to use urllib2 for the
specific task of accessing the Internet through a proxy that does
require authentication.
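
For orientation, an opener for an authenticating proxy can be assembled from handler classes that urllib2 provides and that Python 3 keeps under urllib.request. The following is only a sketch in Python 3 syntax: the function name, proxy URL, user name, and password are placeholders, and it is not the example from the page linked above.

```python
from urllib.request import (
    HTTPPasswordMgrWithDefaultRealm,
    ProxyBasicAuthHandler,
    ProxyHandler,
    build_opener,
)


def make_proxy_opener(proxy_url, user, password):
    """Build an opener that sends HTTP requests through an authenticating proxy."""
    # Route http:// requests through the given proxy.
    proxy_handler = ProxyHandler({"http": proxy_url})
    # realm=None makes the credentials apply to whatever realm the proxy names.
    password_mgr = HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, proxy_url, user, password)
    auth_handler = ProxyBasicAuthHandler(password_mgr)
    return build_opener(proxy_handler, auth_handler)


# Example call (placeholder values; needs a real proxy):
# opener = make_proxy_opener("http://proxy.domain.com:8080", "user", "secret")
# print(opener.open("http://www.python.org").read())
```

Building a dedicated opener, rather than installing one globally, keeps the proxy credentials scoped to the requests that actually need them.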
