Recipe 19.18. Looking Ahead into an Iterator
Credit: Steven Bethard, Peter Otten
Problem
You are using an iterator for some task such as parsing, which
requires you to be able to "look
ahead" at the next item the iterator is going to
yield, without disturbing the iterator state.
Solution
The best solution is to wrap your original iterator into a suitable
class, such as the following one (Python 2.4-only):
import collections
class peekable(object):
"" An iterator that supports a peek operation. Example usage:
>>> p = peekable(range(4))
>>> p.peek( )
0
>>> p.next(1)
[0]
>>> p.peek(3)
[1, 2, 3]
>>> p.next(2)
[1, 2]
>>> p.peek(2)
Traceback (most recent call last):
...
StopIteration
>>> p.peek(1)
[3]
>>> p.next(2)
Traceback (most recent call last):
...
StopIteration
>>> p.next( )
3
""
def _ _init_ _(self, iterable):
self._iterable = iter(iterable)
self._cache = collections.deque( )
def _ _iter_ _(self):
return self
def _fillcache(self, n):
if n is None:
n = 1
while len(self._cache) < n:
self._cache.append(self._iterable.next( ))
def next(self, n=None):
self._fillcache(n)
if n is None:
result = self._cache.popleft( )
else:
result = [self._cache.popleft( ) for i in range(n)]
return result
def peek(self, n=None):
self._fillcache(n)
if n is None:
result = self._cache[0]
else:
result = [self._cache[i] for i in range(n)]
return result
Discussion
Many iterator-related tasks, such as parsing, require the ability to
"peek ahead" (once or a few times)
into the sequence of items that an iterator is yielding, in a way
that does not alter the iterator's observable state.
One approach is to use the new Python 2.4 function
iterator.tee to get two independent copies of the
iterator, one to be advanced for peeking purposes and the
"other" one to be used as the
"main" iterator.
It's actually handier to wrap the incoming iterator
once for all, at the start, with the class peekable
presented in this recipe; afterwards, a peek method,
which is safe and effective, can be counted on. A little added
sweetener is the ability to call peek (and, as long
as we're at it, the standard next
method too) with a specific number argument
n, to request a list of the next
n items of the iterator (without
disturbing the iterator's state when you call
peek(n), with iterator state advancement when you
call
next(n)just
like for normal calls without arguments to the same methods).The obvious idea used in this recipe for implementing
peekable is to have it keep a cache of peeked-ahead
arguments. Since the cache must grow at the tail and get consumed
from the end, a natural choice is to make the cache a
collections.deque, a new type introduced in Python
2.4. However, if you need this code to run under version 2.3 as well,
make self._cache a list
insteadyou only need to change method
next's body a little bit, making
it:
if n is None:As long as you're caching only one or just a few
result = self._cache.pop(0)
else:
result, self_cache = self._cache[:n], self._cache[n:]
items of lookahead at a time, performance won't
suffer much by making self._cache a
list rather than a deque.An interesting characteristic of the peekable class
presented in this recipe is that, if you request too many items from
the iterator, you get a StopIteration exception
but that does not throw away the last few values of the iterator. For
example, if p is an instance of
peekable with just three items left, when you call
p.next(5), you get a
StopIteration exception. You can later call
p.next(3) and get the list of the last three
items.A subtle point is that the n argument to
methods peek and next defaults to
None, not to 1. This gives you
two distinct ways to peek at a single item: the default way, calling
p.peek( ), just gives you that item, while calling
p.peek(1) gives you a list with that single item
in it. This behavior is quite consistent with the way
p.peek behaves when called with different
arguments: any call p.peek(n) with any
non-negative integer n returns a list with
n items (or raises
StopIteration if p has
fewer than n items left). This approach
even supports calls such as p.next(0), which in
practice always returns an empty list [ ] without
advancing the iterator's state. Typically, you just
call p.peek( ), without arguments, and get one
look-ahead item without problems.As an implementation detail, note that the docstring of the class
peekable presented in this recipe is essentially
made up of examples of use with expected results. Besides being
faster to write, and arguably to read for an experienced Pythonista,
this style of docstring is perfect for use with the Python Standard
Library module doctest.
See Also
collections.deque and doctest
in the Python Library
Reference (for Python 2.4).