Python Cookbook 2nd Edition Jun 1002005 [Electronic resources] (text version)


David Ascher, Alex Martelli, Anna Ravenscroft

Recipe 19.8. Looping Through Multiple Iterables in Parallel


Credit: Andy McKay, Hamish Lawson, Corey Coughlin


Problem


You need to loop through multiple iterables in parallel: first you
want a tuple holding the first item of each iterable, next a tuple
holding each "second item", and so forth.


Solution


Say you have two iterables (lists, in this case) such as:

a = ['a1', 'a2', 'a3']
b = ['b1', 'b2']

If you want to loop "in parallel"
over them, the most general and effective approach is:

import itertools
for x, y in itertools.izip(a, b):
    print x, y

This snippet outputs two lines:

a1 b1
a2 b2


Discussion


The most general and effective way to loop "in
parallel" over multiple iterables is to use function
izip of standard library module
itertools, as shown in the
"Solution". The built-in function
zip is an alternative that is almost as good:

for x, y in zip(a, b):
    print x, y

However, zip has one downside that can hurt your
performance if you're dealing with long sequences:
it builds the list of tuples in memory all at once (preparing and
returning a list), while you need only one tuple at a time for pure
looping purposes.

Both zip and itertools.izip,
when you iterate in parallel over iterables of different lengths,
stop as soon as the "shortest" such
iterable is exhausted. This approach to termination is normally what
you want. For example, it lets you have one or more nonterminating
iterables in the zipping, as long as at least one of the iterables
does terminate, or (in the case of izip only) as long as you use
some control structure, such as a
conditional break within a for
statement, to ensure you always require a finite number of items and
do not loop endlessly.
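
As a minimal sketch of that termination rule, the snippet below pairs an endless counter with a short list, then stops a zip of two endless iterables with a conditional break. The izip fallback is an addition for this sketch, so that it also runs on Python 3, where the built-in zip is already lazy and itertools.izip no longer exists; the sample data is made up.

```python
import itertools

try:
    izip = itertools.izip      # Python 2: lazy zip
except AttributeError:
    izip = zip                 # Python 3: built-in zip is already lazy

names = ['a1', 'a2', 'a3']

# count() never terminates, but izip stops as soon as names runs out
numbered = list(izip(itertools.count(), names))

# both inputs are endless here, so the loop itself must decide when to stop
evens = []
for i, keep in izip(itertools.count(), itertools.cycle([True, False])):
    if i >= 6:
        break
    if keep:
        evens.append(i)
```

Passing two endless iterables to an eager, list-building zip would never return, which is why the break-based form needs a lazy zip.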

In some cases, when iterating in parallel over iterables of different
lengths, you may want shorter iterables to be conceptually
"padded" with
None up to the length of the longest iterable in
the zipping. For this special need, you can use the built-in function
map with a first argument of
None:

for x, y in map(None, a, b):
    print x, y

map, like zip, builds and
returns a whole list. If that is a problem, you can reproduce
map's pad-with-None behavior by coding your
own generator. Coding your own generator is also a good approach when
you need to pad shorter iterables with some value that is different
from None.

If you need to deal with exactly two sequences, your
generator's code can be quite straightforward:

def par_two(a, b, padding_item=None):
    a, b = iter(a), iter(b)
    # advance both iterators by hand: izip would fetch an item from a and
    # then discard it whenever b turns out to be exhausted first
    while True:
        try:
            x = a.next()
        except StopIteration:
            # a ran out first (or both did): pad a's side for the rest of b
            for y in b:
                yield padding_item, y
            return
        try:
            y = b.next()
        except StopIteration:
            # b ran out first: x is already taken from a, so emit it padded,
            # then pad b's side for the rest of a
            yield x, padding_item
            for x in a:
                yield x, padding_item
            return
        yield x, y
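
One detail worth checking in any "zip first, then drain the leftovers" design is what izip does at the moment it stops. It asks its arguments for items from left to right, so when a later argument runs out first, the item already fetched from an earlier argument is discarded. A minimal sketch of that behavior, using made-up data (the izip fallback is an addition so the snippet also runs on Python 3):

```python
import itertools

try:
    izip = itertools.izip      # Python 2
except AttributeError:
    izip = zip                 # Python 3: built-in zip is lazy

a = iter(['a1', 'a2', 'a3'])
b = iter(['b1'])

pairs = list(izip(a, b))       # stops once b is exhausted
leftover = list(a)             # whatever a can still produce afterward
```

Here 'a2' is fetched from a and then dropped, so only 'a3' is left over. A padding generator whose first argument may be the longer one therefore has to advance the iterators itself rather than rely on izip followed by drain loops.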

Alternatively, you can code a more general function, one that is able
to deal with any number of sequences:

import itertools
def par_loop(padding_item, *sequences):
    iterators = map(iter, sequences)
    num_remaining = len(iterators)
    result = [padding_item] * num_remaining
    while num_remaining:
        for i, it in enumerate(iterators):
            try:
                result[i] = it.next()
            except StopIteration:
                # swap the exhausted iterator for an endless source of padding
                iterators[i] = itertools.repeat(padding_item)
                num_remaining -= 1
                result[i] = padding_item
        if num_remaining:
            yield tuple(result)

Here's an example of use for generator
par_loop:

print map(''.join, par_loop('x', 'foo', 'zapper', 'ui'))
# emits: ['fzu', 'oai', 'opx', 'xpx', 'xex', 'xrx']
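
For comparison, Python versions newer than the ones this recipe targets ship a ready-made padding zip: itertools.izip_longest (Python 2.6 and later), renamed itertools.zip_longest in Python 3. The sketch below assumes one of those versions is available; the local name zip_longest is just an alias chosen here. Note that the padding value is passed as the keyword argument fillvalue rather than as the first positional argument.

```python
import itertools

# izip_longest exists from Python 2.6 on; Python 3 renamed it zip_longest
zip_longest = getattr(itertools, 'zip_longest', None) or itertools.izip_longest

padded = zip_longest('foo', 'zapper', 'ui', fillvalue='x')
words = [''.join(group) for group in padded]
```

With the same three strings and padding character as the par_loop example, words comes out as ['fzu', 'oai', 'opx', 'xpx', 'xex', 'xrx'].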

Both par_two and par_loop start by
calling the built-in function iter on all of their
arguments and thereafter use the resulting iterators. This is
important, because the functions rely on the
state that these iterators maintain. The key
idea in par_loop is to keep count of the number of
iterators as yet unexhausted, and replace each exhausted iterator
with a nonterminating iterator that yields the
padding_item ceaselessly;
num_remaining counts unexhausted iterators, and both
the yield statement and the continuation of the
while loop are conditional on
some iterators being as yet unexhausted.
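
To see why the explicit iter calls matter, the sketch below (with made-up sample data) shows that an iterator remembers how far it has been consumed, while the list it came from does not. The built-in next function used here assumes Python 2.6 or later; on older versions the equivalent spelling is it.next().

```python
letters = ['a', 'b', 'c', 'd']

it = iter(letters)
first_two = [next(it), next(it)]   # advances the iterator's internal state
rest_from_iterator = list(it)      # resumes where the iterator stopped
rest_from_list = list(letters)     # a fresh pass over the list starts over
```

A function that loops over its arguments more than once, as these padding generators do, gets "resume where I left off" behavior only from the iterator, not from the underlying sequence.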

Alternatively, if you know in advance which iterable is the longest
one, you can wrap every other iterable x as
itertools.chain(iter(x), itertools.repeat(padding))
and then call itertools.izip. You can't do this
wrapping on all iterables because the resulting
iterators are nonterminating: if you izip
iterators that are all nonterminating, izip itself
cannot terminate! Here, for example, is a version that works as
intended only when the longest (but terminating!) iterable is the
very first one:

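import itertools
def par_longest_first(padding_item, *sequences):
    iterators = map(iter, sequences)
    for i, it in enumerate(iterators):
        # leave the first (longest) iterator alone; pad all the others
        if not i: continue
        iterators[i] = itertools.chain(it, itertools.repeat(padding_item))
    # unpack the list so that izip gets one argument per iterator
    return itertools.izip(*iterators)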
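
The chain-and-repeat wrapping can be tried in isolation. The following sketch uses made-up data and a '-' padding value, plus an izip fallback added so that it also runs on Python 3. It shows both the intended case and what happens when the unpadded first iterable is not actually the longest.

```python
import itertools

try:
    izip = itertools.izip      # Python 2
except AttributeError:
    izip = zip                 # Python 3: built-in zip is lazy

longest = ['a1', 'a2', 'a3']
shorter = ['b1']

# pad only the shorter iterable; the longest one decides when to stop
padded = itertools.chain(shorter, itertools.repeat('-'))
rows = list(izip(longest, padded))

# if the unpadded first iterable is the shorter one, the result is truncated
padded_long = itertools.chain(longest, itertools.repeat('-'))
truncated = list(izip(shorter, padded_long))
```

The second result stops after a single pair, which is the failure mode to expect when the "longest first" assumption does not hold.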


See Also


The itertools module is part of the Python
Standard Library and is documented in the Library
Reference
portion of Python's online
documentation; the Library Reference and
Python in a Nutshell docs about built-ins
zip, iter, and
map.

