Python Cookbook, 2nd Edition [Electronic resources]

David Ascher, Alex Martelli, Anna Ravenscroft

Introduction


Credit: Greg Wilson, Third Bit

Thirty years ago, in his classic The Mythical Man-Month: Essays on
Software Engineering (Addison-Wesley), Fred Brooks drew a
distinction between accidental and intrinsic complexity. Languages
such as English and C++, with their inconsistent rules, exceptions,
and special cases, are examples of the former: they make
communication and programming harder than they need to be.
Concurrency, on the other hand, is a prime example of the latter.
Most people have to struggle to keep one chain of events straight in
their minds; keeping track of two, three, or a dozen, plus all of
their possible interactions, is just plain hard.

Computer scientists began studying ways of running multiple processes
safely and efficiently in a single physical address space in the
mid-1960s. Since then, a rich theory has been developed in which
assertions about the behavior of interacting processes can be
formalized and proved, and entire languages devoted to concurrent and
parallel programming have been created. Foundations of Multithreaded,
Parallel, and Distributed Programming, by Gregory R. Andrews
(Addison-Wesley), is not only an excellent
introduction to this theory, but also contains a great deal of
historical information tracing the development of major ideas.

Over
the past 20 years, opportunity and necessity have conspired to make
concurrency a part of programmers' everyday lives.
The opportunity is for greater speed, which comes from the growing
availability of multiprocessor machines. In the early 1980s, these
were expensive curiosities; today, many programmers have
dual-processor workstations on their desks and four-way or eight-way
servers in the back room. If a calculation can be broken down into
independent (or nearly independent) pieces, such machines can
potentially solve them two, four, or eight times faster than their
uniprocessor equivalents. While the potential gains from this
approach are limited, it works well for problems as diverse as image
processing, serving HTTP requests, and recompiling multiple source
files.





The necessity for concurrent
programming comes from GUIs and network applications. Graphical
interfaces often need to appear to be doing several things at once,
such as displaying images while scrolling ads across the bottom of
the screen. While it is possible to do the necessary interleaving
manually, it is much simpler to code each operation on its own and
let the underlying operating system decide on a concrete order of
operations. Similarly, network applications often have to listen on
several sockets at once or send data on one channel while receiving
data on another.


Broadly speaking,
operating systems give programmers two kinds of concurrency.
Processes run in separate logical address
spaces that are protected from each other. For performance purposes,
particularly on multiprocessor machines, concurrent processing is more
attractive with threads, which execute simultaneously within the same
program, in the same address space, without being protected from each
other. The lack of mutual protection allows lower overhead and easier,
faster communication, particularly because of the shared address
space. Since all threads run code from the same program, the lack of
mutual protection introduces no security risks beyond those already
present in a single-threaded program. Thus, concurrency used for
performance purposes most often means adding threads to a single
program.




However, adding threads to a Python
program to speed it up is often not a successful strategy. The reason
is the Global Interpreter Lock (GIL), which protects
Python's internal data structures. This lock
must be held by a thread before the thread can
safely access Python objects. Without the lock, even simple
operations (such as incrementing an integer) could fail. Therefore,
only the thread with the GIL can manipulate Python objects or call
Python/C API
functions.


To
make life easier for programmers, the interpreter releases and
reacquires the lock every 100 bytecode instructions (a value that can
be changed using sys.setcheckinterval). The lock
is also released and reacquired around I/O operations, such as
reading or writing a file, so that other threads can run while the
thread that requests the I/O is waiting for the I/O operation to
complete. However, effective performance-boosting exploitation of
multiple processors from multiple pure-Python threads of the same
process is just not in the cards. Unless the CPU performance
bottlenecks in your Python application are in C-coded extensions that
release the GIL, you will not observe substantial performance
increases by moving your multithreaded application to a
multiprocessor machine.
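
By way of illustration, here is a minimal sketch (the function name
burn and the exact timings are purely illustrative) that times a
CPU-bound pure-Python function run sequentially and then from two
threads; no matter how many processors you have, the threaded version
is no faster, because the GIL serializes the bytecode execution:

import sys, time, threading

sys.setcheckinterval(100)   # the default: how often the interpreter
                            # offers other threads a chance to run

def burn(n):
    # a pure-Python, CPU-bound loop; the GIL serializes it across threads
    while n:
        n -= 1

N = 5000000

start = time.time()
burn(N); burn(N)
print 'sequential: %.2f seconds' % (time.time() - start)

start = time.time()
threads = [threading.Thread(target=burn, args=(N,)) for i in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print 'threaded:   %.2f seconds' % (time.time() - start)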



However, threading
is not just about performance on multiprocessor machines. A GUI
can't know when the user will press a key or move
the mouse, and an HTTP server can't know which
request will arrive next. Handling each stream of events with a
separate control thread is therefore often the simplest way to cope
with this unpredictability, even on single-processor machines, and
when high throughput is not an overriding concern. Of course,
event-driven programming can often be used in these kinds of
applications as well, and Python frameworks such as
asyncore and Twisted are proof that this approach
can often deliver excellent performance with complexity that, while
different from that inherent in multithreading, is not necessarily
any more difficult to deal
with.



The standard Python library allows
programmers to approach multithreaded programming at two different
levels. The core module, thread, is a thin wrapper
around the basic primitives that any threading library must provide.
Three of these primitives are used to create, identify, and end
threads; others are used to create, test, acquire, and release simple
mutual-exclusion locks (or binary semaphores). As the recipes in this
chapter demonstrate, programmers should avoid using these primitives
directly, and should instead use the tools included in the
higher-level threading module, which is
substantially more programmer-friendly and has similar performance
characteristics.
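
Purely for illustration, here is a minimal sketch of those raw
primitives at work (the names lock and task are invented for this
example); note the crude time.sleep at the end, needed because the raw
thread module offers no join, which is one more reason to prefer
threading:

import thread, time

lock = thread.allocate_lock()   # a raw mutual-exclusion lock

def task():
    lock.acquire()
    try:
        print 'running in thread', thread.get_ident()
    finally:
        lock.release()

thread.start_new_thread(task, ())
time.sleep(1.0)   # crude wait: the raw thread module has no join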

Whether you use thread or
threading, some underlying aspects of
Python's threading model stay the same. The GIL, in
particular, works just the same either way. The crucial advantage of
the GIL is that it makes it much easier to code Python extensions in
C: unless your C extension explicitly releases the GIL, you know
thread switches won't happen until your C code calls
back into Python code. This advantage can be really important when
your extension makes available to Python some underlying C library
that isn't thread-safe. If your C code
is thread-safe, though, you can and should
release the GIL around stretches of computational or I/O operations
that can last for a substantial time without needing to make Python C
API calls; when you do this, you make it possible for Python programs
using your C extension to take advantage of more than one processor
from multiple threads within the same process. Make sure you acquire
the GIL again before calling any Python C API entry point,
though!


Any time
your code wants to access a data structure that is shared among
threads, you may have to wonder whether a given operation is
atomic, meaning that no thread switch can
happen during the operation. In general,
anything that takes multiple bytecodes is not atomic, since a thread
switch can happen between any one bytecode and the
next (you can use the standard library function
dis.dis to disassemble Python code into
bytecodes). Moreover, even a single bytecode is not atomic, if it can
call back to arbitrary Python code (e.g., because that bytecode can
end up executing a Python-coded special method). When in doubt, it is
most prudent to assume that whatever is giving you doubts is
not atomic: so, reduce to the bare minimum the
data structures accessed by more than one thread (except for
instances of Queue.Queue, a class that is
specifically designed to be thread-safe!), and make sure you protect
with locks any access to any such structures that
remain.
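
For instance, a quick session with dis.dis is enough to convince
yourself that even a simple augmented assignment is not atomic (the
function increment is just something made up to disassemble):

import dis

def increment(x):
    x += 1
    return x

dis.dis(increment)
# the x += 1 alone compiles to several bytecodes (LOAD_FAST,
# LOAD_CONST, INPLACE_ADD, STORE_FAST), and a thread switch can
# happen between any two of them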

Almost invariably, the proper idiom to use some lock is:

import threading
somelock = threading.Lock()

somelock.acquire()
try:
    pass    # operations needing the lock (keep to a minimum!)
finally:
    somelock.release()

The try/finally construct
ensures the lock will be released even if some exception happens in
the code in the try clause. Accidentally failing
to release a lock, due to some unforeseen exception, could soon make
all of your application come to a grinding halt. Also, be careful
acquiring more than one lock in sequence; if you really truly need to
do such multiple acquisitions, make sure all possible paths through
the code acquire the various locks in the same
sequence. Otherwise, you're likely sooner or later
to enter the disaster case in which two threads are each trying to
acquire a lock held by the other, a situation known as
deadlock, which does mean that your program is
as good as dead.
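
As a minimal sketch of the same-sequence rule (the lock names are
purely illustrative), every path that needs both locks acquires them
in one fixed, program-wide order:

import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def with_both_locks():
    # every code path acquires lock_a first, then lock_b: with one
    # program-wide ordering, two threads can never each hold the
    # lock the other is waiting for
    lock_a.acquire()
    try:
        lock_b.acquire()
        try:
            pass   # operations needing both locks
        finally:
            lock_b.release()
    finally:
        lock_a.release()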





The most
important elements of the threading module are
classes that represent threads and various high-level synchronization
constructs. The Thread class represents a separate
control thread; it can be told what to do by passing a callable
object to its constructor, or, alternatively, by overriding its
run method. One thread can start another by
calling its start method, and wait for it to
complete by calling join. Python also supports
daemon threads, which do background processing until all of the
nondaemon threads in the program exit and then shut themselves down
automatically.
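
Here is a minimal sketch of both styles (the callable work and the
subclass Greeter are invented names for this example):

import threading

def work(name):
    print 'hello from', name

# pass a callable (and its arguments) to the constructor...
t = threading.Thread(target=work, args=('worker',))
t.start()    # begin executing work in a separate thread
t.join()     # wait for the thread to complete

# ...or, alternatively, override the run method
class Greeter(threading.Thread):
    def run(self):
        print 'hello from', self.getName()

g = Greeter()
g.setDaemon(True)   # a daemon thread won't keep the program alive
g.start()
g.join()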

The synchronization constructs in the
threading module include locks, reentrant locks
(which a single thread can safely relock many times without
deadlocking), counting semaphores, conditions,
and events. Events can be used by one thread to signal others that
something interesting has happened (e.g., that a new item has been
added to a queue, or that it is now safe for the next thread to
modify a shared data structure). The documentation that comes with
Python, specifically the Library Reference
manual, describes each of these classes in detail.
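
For instance, here is a minimal sketch of an Event used for exactly
the kind of signaling just described (all the names are illustrative):

import threading

item_ready = threading.Event()

def consumer():
    item_ready.wait()    # block until some other thread calls set()
    print 'new item ready, proceeding'

t = threading.Thread(target=consumer)
t.start()
# ... the producing thread adds the new item to the queue ...
item_ready.set()         # wake every thread waiting on the event
t.join()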

The relatively low number of recipes in this chapter, compared to
some other chapters in this cookbook, reflects both
Python's focus on programmer productivity (rather
than absolute performance) and the degree to which other packages
(such as httplib and wxPython)
hide the unpleasant details of concurrency in important application
areas. This relative scarcity also reflects many Python
programmers' tendencies to look for the simplest way
to solve any particular problem, which complex threading rarely is.

However, this chapter's brevity may also reflect the
Python community's underappreciation of the
potential of simple threading, when used appropriately, to simplify a
programmer's life. The Queue
module in particular supplies a delightfully self-contained (and yet
extensible and customizable!) synchronization and cooperation
structure that can provide all the interthread supervision services
you need. Consider a typical program, which accepts requests from a
GUI (or from the network). As a
"result" of such requests, the
program will often find itself faced with the prospect of having to
perform a substantial chunk of work. That chunk might take so long to
perform all at once that, unless some precautions are taken, the
program would appear unresponsive to the GUI (or network).

In a purely event-driven architecture, it
may take considerable effort on the programmer's
part to slice up such a hefty work-chunk into slices of work thin
enough that each slice can be performed in idle time, without ever
giving the appearance of unresponsiveness. In cases such as this one,
just a dash of multithreading can help considerably. The main thread
pushes a work request describing the substantial chunk of background
work onto a dedicated Queue instance, then goes
back to its task of making the program's interface
responsive at all times.

At the other end of the Queue waits a pool of daemonic
worker threads, each ready to peel a work request off the
Queue and run it straight through. This kind of
overall architecture combines event-driven and multithreaded
approaches in the overarching ideal of simplicity and is thus
maximally Pythonic. You may need just a little bit more work if the
result of a worker thread's efforts must be
presented again to the main thread (via another
Queue, of course), which is normally the case with
GUIs. If you're willing to cheat just a little and use polling, the
mostly event-driven main thread can access the result Queue
coming back from the daemonic worker threads. See Recipe 11.9
to get an idea of how simple that little bit of work can be.
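
Here is a minimal sketch of that overall architecture, under the
simplifying assumption that each work request is a callable plus its
arguments (all the names are illustrative; a real GUI would poll the
result Queue from its event loop with results.get(False) rather than
block on it):

import threading, Queue

requests = Queue.Queue()    # work requests flow in here
results = Queue.Queue()     # finished results flow back out

def worker():
    while True:
        func, args = requests.get()    # blocks until work arrives
        results.put(func(*args))

# a small pool of daemonic workers: they go away when the main thread does
for i in range(4):
    t = threading.Thread(target=worker)
    t.setDaemon(True)
    t.start()

# the main (GUI or network) thread just enqueues the hefty chunk of work...
requests.put((pow, (2, 10)))

# ...and later picks up the result from the other Queue
print results.get()    # emits 1024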

