Credit: Guido van Rossum, creator of Python
Network programming is one of my favorite Python applications. I wrote or started most of the network modules in the Python Standard Library, including the socket and select extension modules and most of the protocol client modules (such as ftplib). I also wrote a popular server framework module, SocketServer, and two web browsers in Python, the first predating Mosaic. Need I say more?
Python's roots lie in a distributed operating system, Amoeba, which I helped design and implement in the late 1980s. Python was originally intended to be the scripting language for Amoeba, since it turned out that the Unix shell, while ported to Amoeba, wasn't very useful for writing Amoeba system administration scripts. Of course, I designed Python to be platform independent from the start. Once Python was ported from Amoeba to Unix, I taught myself BSD socket programming by wrapping the socket primitives in a Python extension module and then experimenting with them using Python; this was one of the first extension modules.
This approach proved to be a great early testimony to Python's strengths. Writing socket code in C is tedious: the code necessary to do error checking on every call quickly overtakes the logic of the program. Quick: in which order should a server call accept, bind, connect, and listen? This is remarkably difficult to find out if all you have is a set of Unix manpages. In Python, you don't have to write separate error-handling code for each call, making the logic of the code stand out much more clearly. You can also learn about sockets by experimenting in an interactive Python shell, where misconceptions about the proper order of calls and the argument values that each call requires are cleared up quickly through Python's immediate error messages.
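As a hedged illustration of the call order asked about above, here is a minimal sketch written in Python 3 syntax, which is newer than the code in this chapter. The helper names serve_once and echo_roundtrip are hypothetical, and the server runs in a background thread only so that the whole example fits in one process. It shows the server calling bind, listen, and accept in that order, with connect left to the client, and a single except clause standing in for the per-call error checks that C would need.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept one connection and echo back the first chunk it sends."""
    conn, _peer = server_sock.accept()            # 3. accept
    with conn:
        conn.sendall(conn.recv(1024))


def echo_roundtrip(message):
    """Send non-empty bytes to a throwaway local server; return its reply."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
            server.bind(("127.0.0.1", 0))         # 1. bind (port 0: any free port)
            server.listen(1)                      # 2. listen
            server.settimeout(5)                  # do not wait forever in accept
            worker = threading.Thread(
                target=serve_once, args=(server,), daemon=True)
            worker.start()
            # connect belongs to the client side, not to the server.
            with socket.create_connection(server.getsockname(),
                                          timeout=5) as client:
                client.sendall(message)
                reply = client.recv(1024)         # enough for a short message
            worker.join()
            return reply
    except OSError as exc:
        # One handler covers every socket call made in this function.
        raise RuntimeError("echo round trip failed: %s" % exc) from exc
```

Calling echo_roundtrip(b"hello") should hand back the same bytes. A real server would loop over accept and over recv rather than handling one short message.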
Python has come a long way since those first days, and now few applications use the socket module directly; most use much higher-level modules such as urllib or smtplib, and third-party extensions such as the Twisted framework, whose popularity keeps growing. The examples in this chapter are a varied bunch: some construct and send complex email messages, while others dwell on lower-level issues such as tunneling. My favorite is Recipe 13.11, which implements PyHeartBeat: it's useful, it uses the socket module, and it's simple enough to be an educational example. I do note, with that mixture of pride and sadness that always accompanies a parent's observation of children growing up, that, since the Python Cookbook's first edition, even PyHeartBeat has acquired an alternative server implementation based on Twisted!
Nevertheless, my own baby, the socket module itself, is still the foundation of all network operations in Python. It's a plain transliteration of the socket APIs (first introduced in BSD Unix and now widespread on all platforms) into the object-oriented paradigm. You create socket objects by calling the socket.socket factory function, then you call methods on these objects to perform typical low-level network operations. You don't have to worry about allocating and freeing memory for buffers and the like; Python handles that for you automatically. You express IP addresses as (host, port) pairs, in which host is a string in either dotted-quad ('1.2.3.4') or domain-name notation. As you can see, even low-level modules in Python aren't as low level as all that.
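To make the (host, port) convention concrete, the following is a small sketch in Python 3 syntax. The function name send_datagram is hypothetical, and UDP over the loopback interface is chosen only because it keeps the example short. Both sockets come from socket.socket, the address is an ordinary tuple, and no buffer is allocated or freed by hand.

```python
import socket


def send_datagram(payload):
    """Pass payload (bytes) between two UDP sockets on the local machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        # The host is given in dotted-quad form; a host name string would
        # also be accepted here. Port 0 asks the system for any free port.
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(5)                 # fail rather than hang
        address = receiver.getsockname()       # a (host, port) tuple
        sender.sendto(payload, address)
        data, source = receiver.recvfrom(1024)
        return data, address, source
```

The call send_datagram(b"ping") returns the received bytes together with the receiver's address and the sender's address, both as (host, port) pairs.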
Despite the various conveniences, the socket module still exposes the actual underlying functionality of your operating system's network sockets. If you're at all familiar with sockets, you'll quickly get the hang of Python's socket module, using Python's own Library Reference. You'll then be able to play with sockets interactively in Python to become a socket expert, if that is what you want. The classic, highly recommended work on this subject is W. Richard Stevens, UNIX Network Programming, Volume 1: Networking APIs - Sockets and XTI, 2d ed. (Prentice-Hall). For many practical uses, however, higher-level modules will serve you better.
The Internet uses a sometimes dazzling variety of protocols and formats, and the Python Standard Library supports many of them. In the Python Standard Library, you will find dozens of modules dedicated to supporting specific Internet protocols (such as smtplib to support the SMTP protocol to send mail and nntplib to support the Network News Transfer Protocol (NNTP) to send and receive Network News). In addition, you'll find about as many modules that support specific Internet formats.
I cannot even come close to doing justice to the powerful array of tools mentioned in this introduction, nor will you find all of these modules and packages used in this chapter, nor in this book, nor in most programming shops. You may never need to write any program that deals with Network News, for example; if that is the case, you don't need to study nntplib. But it is still reassuring to know it's there (part of the "batteries included" approach of the Python Standard Library).
Two higher-level modules that stand out from the crowd, however, are urllib and urllib2. Each of these two modules can deal with several protocols through the magic of URLs: those now-familiar strings that identify a protocol, a host and port (port 80 being the default for the HTTP protocol), and a specific resource at that address. urllib is very simple to use, but urllib2 is more powerful and extensible. HTTP is the most popular protocol for URLs, but these modules also support several others, such as FTP. In many cases, you'll be able to use these modules to write typical client-side scripts that interact with any of the supported protocols much more quickly and with less effort than it might take with the various protocol-specific modules.
To illustrate, I'd like to conclude with a cookbook example of my own. It's similar to Recipe 13.2, but, rather than a program fragment, it's a little script. I call it wget.py because it does everything for which I've ever needed wget. (In fact, I originally wrote this script on a system where wget wasn't installed but Python was; writing wget.py was a more effective use of my time than downloading and installing the real thing.)
import sys, urllib
def reporthook(*a): print a
for url in sys.argv[1:]:
    i = url.rfind('/')
    file = url[i+1:]
    print url, "->", file
    urllib.urlretrieve(url, file, reporthook)
Pass this script one or more URLs as command-line arguments; the script retrieves them into local files whose names match the last components of the URLs. The script also prints progress information of the form:
(block number, block size, total size)
Obviously, it's easy to improve on this script; but it's only seven lines, it's readable, and it works, and that's what's so cool about Python.
Another cool thing about Python is that you can incrementally improve a program like this, and after it's grown by two or three orders of magnitude, it's still readable, and it still works! To see what this particular example might evolve into, check out Tools/webchecker/websucker.py in the Python source distribution. Enjoy!
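For readers on Python 3, where urlretrieve lives in urllib.request and print is a function, the same idea might be sketched as follows. This is an illustration under stated assumptions, not the original script: the function names local_name, retrieve, and main are hypothetical, and the 'index.html' fallback for URLs that end in a slash is an assumption added here. The sketch also uses urllib.parse.urlsplit to pick out the resource part of the URL instead of searching the raw string for the last slash.

```python
import posixpath
import urllib.parse
import urllib.request


def local_name(url):
    """Return the last component of the URL's path as a local file name."""
    path = urllib.parse.urlsplit(url).path
    # Assumption: fall back to 'index.html' when the path ends in '/'.
    return posixpath.basename(path) or "index.html"


def retrieve(url, filename=None, reporthook=None):
    """Copy the resource at url into a local file and return its name."""
    target = filename or local_name(url)
    print(url, "->", target)
    # reporthook is called with (block number, block size, total size).
    urllib.request.urlretrieve(url, target, reporthook)
    return target


def main(urls):
    """Retrieve each URL in turn, printing progress tuples as they arrive."""
    for url in urls:
        retrieve(url, reporthook=lambda *a: print(a))
```

To use this as a command-line script, call main(sys.argv[1:]) after importing sys. The Python 3 documentation lists urlretrieve under its legacy interfaces, so longer-lived code may prefer urllib.request.urlopen.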