Python Cookbook 2Nd Edition Jun 1002005 [Electronic resources] نسخه متنی

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

Recipe 19.17. Duplicating an Iterator

Credit: Heiko Wundram, Raymond Hettinger

Problem

You have an iterator (or other iterable) object
x, and need to iterate twice over
x's sequence of values.

Solution

In Python 2.4, solving this problem is the job of function
tee in the standard library module
itertools:

import itertools
x1, x2 = itertools.tee(x)
# you can now iterate on x1 and x2 separately

In Python 2.3, you can code tee yourself:

import itertools
def tee(iterable):
def yield_with_cache(next, cache={  }):
pop = cache.pop
for i in itertools.count( ):
try:
yield pop(i)
except KeyError:
cache[i] = next( )
yield cache[i]
it = iter(iterable)
return yield_with_cache(it.next), yield_with_cache(it.next)

Discussion

The need to iterate repeatedly over the same sequence of values is a
reasonably common one. If you know that the sequence comes from a
list, or some other container that intrinsically lets you iterate
over its items repeatedly, then you simply perform the iteration
twice. However, sometimes your sequence may come from a generator, a
sequential file (which might, e.g., wrap a stream of data coming from
a network socketdata that you can read only once), or some
other iterator that is not intrinsically re-iterable. Even then, in
some cases, the best approach is the simplest onefirst save
the data into a list in memory, then repeatedly iterate over that
list:

saved_x = list(x)
for item in saved_x: do_something(item)
for item in saved_x: do_something_else(item)

The simple approach of first saving all data from the iterator into a
list is not feasible for an infinite sequence
x, and may not be optimal if
x is very large and your separate
iterations over it never get far out-of-step from each other. In
these cases, the tee function shown in this recipe
can help. For example, say that the items of
x are either numbers or operators (the
latter being represented as strings such as '+',
'*', etc.). Whenever you encounter an operator,
you must output the result of applying that operator to all numbers
immediately preceding it (since the last operator). Using
tee, you could code:

def is_operator(item):
return isinstance(item, str)
def operate(x):
x1, x2 = tee(iter(x))
while True:
for item in x1:
if is_operator(item): break
else:
# we get here when there are no more operators in the input
# stream, thus the operate function is entirely done
return
if item == '+':
total = 0
for item in x2:
if is_operator(item): break
total += item
yield total
elif item == '*':
total = 1
for item in x2:
if is_operator(item): break
total *= item
yield total

This kind of "look-ahead" usage is
pretty typical of many of the common use cases of
tee. Even in this case, you might choose the
alternative approach of accumulating items in a list:

def operate_with_auxiliary_list(x):
aux = [  ]
for item in x:
if is_operator(item):
if item == '+':
yield sum(aux)
elif item == '*':
total = 1
for item in aux:
total *= item
yield total
aux = [  ]
else:
aux.append(item)

Having tee available lets you freely choose
between these different styles of look-ahead processing.

Function itertools.tee as implemented in Python
2.4 is faster and more general than the pure Python version given in
this recipe for version 2.3 usage. However, the pure Python version
is quite instructive and deserves study for the sake of the
techniques it demonstrates, even if you're lucky
enough to be using Python 2.4 and therefore don't
need to use this pure Python version of tee.

In the pure Python version of tee, the nested
generator yield_with_cache makes use of the fact
(which some consider a "wart" in
Python but is actually quite useful) that the default values of
arguments get computed just once, at the time the
def statement executes. Thus, both calls to the
nested generator in the return statement of
tee implicitly share the same initially empty
dict as the value of the cache
argument.

itertools.count returns non-negative integers,
0 and up, one at a time.
yield_with_cache uses each of these integers as a
key into the cache dictionary. The call to
pop(i) (the argument of the
yield statement in the try
clause) simultaneously returns and removes the entry corresponding to
key i, if that entry
was presentthat is, in this case, if the
"other" instance of the generator
had already reached that point in the iteration (and cached the item
for our benefit). Otherwise, the except clause
executes, computes the item (by calling the object bound to name
next, which in this case is the
next bound method of an iterator over the iterable
object, which tee is duplicating), and caches the
item (for the "other"
instance's future benefit) before
yielding it.

So, in practice, cache is being used as a FIFO
queue. Indeed, were it not for the fact that we
don't need a pure-Python tee in
Python 2.4, we could code an equivalent implementation of it in
Python 2.4 using the new type deque in standard
library module collections:

import collections
def tee_just_an_example(iterable):
def yield_with_cache(it, cache=collections.deque):
while True:
if cache:
yield cache.popleft( )
else:
result = it.next( )
cache.append(result)
yield result
it = iter(iterable)
return yield_with_cache(it), yield_with_cache(it)

This latest version is meant purely as an illustrative example, and
therefore, it's simplified by not using any of the
bound-method extraction idioms shown in the version in the
"Solution" (which
is intended for
"production" use in Python 2.3).

Once you've called tee on an
iterator, you should no longer use the original iterator anywhere
else; otherwise, the iterator could advance without the knowledge of
the tee-generated objects, and those objects would
then "get out of sync" with the
original. Be warned that tee requires auxiliary
storage that is proportional to how much the two
tee-generated objects get
"apart" from each other in their
separate iterations. In general, if one iterator is going to walk
over most or all of the data from the original before the
"other" one starts advancing, you
should consider using list instead of
tee. Both of these caveats apply to the
itertools.tee function of Python 2.4 just as well
as they apply to the pure Python versions of tee
presented in this recipe. One more caveat: again both for the
versions in this recipe, and the itertools.tee
function in Python 2.4, there is no guarantee of thread safety: to
access the tee'd iterators from different threads,
you need to guard those iterators with a single lock!

Python Cookbook 2Nd Edition Jun 1002005 [Electronic resources] نسخه متنی

فارسی

کردی

العربیه

اردو

Türkçe

Русский

English

Français

کانال فیلم من

تبیان من

فایلهای من

کتابخانه من

پنل پیامکی

وبلاگ من

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی