Recipe 19.7. Looping on a Sequence by Overlapping Windows
Credit: Peter Cogolo, Steven Bethard, Ian Bicking
Problem
You have an iterable s and need to make
another iterable whose items are sublists (i.e., sliding
windows), each of the same given length, over
s' items, with successive
windows overlapping by a specified amount.
Solution
We can combine built-in function iter and function
islice from the standard library module
itertools to code a generator to solve our
problem:
import itertoolsThis module, when run as a main script, emits:
def windows(iterable, length=2, overlap=0):
it = iter(iterable)
results = list(itertools.islice(it, length))
while len(results) == length:
yield results
results = results[length-overlap:]
results.extend(itertools.islice(it, length-overlap))
if results:
yield results
if _ _name_ _ == '_ _main_ _':
seq = 'foobarbazer'
for length in (3, 4):
for overlap in (0, 1):
print '%d %d: %s' % (length, overlap,
map(''.join, windows(seq, length, overlap)))
3 0: ['foo', 'bar', 'baz', 'er']When you know you don't need any overlap, a fast and
3 1: ['foo', 'oba', 'arb', 'baz', 'zer', 'r']
4 0: ['foob', 'arba', 'zer']
4 1: ['foob', 'barb', 'baze', 'er']
concise alternative is available:
def chop(iterable, length=2):The body of this concise alternative may be a bit confusing until you
return itertools.izip(*(iter(iterable),)*length)
realize that the two occurrences of the asterisk ( *
) there play different roles: the first one is part of a
*args syntax form (passing the elements of a
sequence as separate positional arguments), the second one indicates
that a sequence (the Singleton tuple
(iter(iterable),) must be repeated
length times.
Discussion
In many cases, we need a sequence of sub-sequences of a given length,
and we have to start with a "flat"
iterable. For example, we can build a dictionary with given keys and
values by calling dict with a sequence of two-item
sequencesbut what if we start with a
"flat" sequence where keys and
values just alternate? The function windows in this
recipe meets this need:
the_dict = dict(windows(flat_alternating_keys_and_values))Or, say we have an iterable whose items are the amounts of sales on
each day. To turn it into an iterable whose items are the amounts of
sales in each week (seven days):
weekly_sales = itertools.imap(sum, windows(daily_sales, 7))The two use cases just presented are examples of how
windows can be useful when called without overlap
(in other words, with an overlap argument of
0, its default value), so the alternative
chop function also presented in the recipe would be
just as good (and faster). However, overlap is often useful when you
deal with iterables that are signals, or time series. For example, if
you have a function average such as:
def average(sequence):then you can apply a simple low-pass filter to a signal:
return sum(sequence)/float(len(sequence))
filtered = itertools.imap(average, windows(raw_signal, 5, 2))or get the moving average daily sales from the iterable of daily
sales:
mvavg_daily_sales = itertools.imap(average, windows(daily_sales, 7, 6))The implementation of the windows generator in this
recipe is quite straightforward, if you're familiar
with itertools.islice (and you should be, if you
deal with iterables!). For the first
"window", we must clearly fill list
results with the appropriate number of items
(islice does that for us). At each subsequent
step, we must throw away a certain number of items from the
"front" of results
(we do that conveniently by list slicing, since
results is, indeed, a list) and
replenish the same number at the back (we do that by calling the
extend method of results, with
islice providing the needed
"new" items). That number, as a
little reasoning shows, is exactly that given by the expression
length-overlap. The loop terminates, if ever, only
when results cannot be entirely replenished. (The
loop never executes if results cannot even be filled
entirely in the first place.)When the loop terminates, we may be left with a
"tail" in results,
a "last window" that is shorter
than length. What to do in that case depends on your
application's exact needs. The recipe, as given
above, just yields the shorter window as the last item of the
generator, which is what we'd probably want in all
of the previously mentioned use cases. In other cases, we might want
to drop the last, too-short window (just omit the last two statements
in function windows as given in the recipe), raise
an exception (when we know that such a situation should never occur),
or pad the last window to the required length with a pad value such
as None, by changing the last two statements in
function windows to something like:
if result:One important implementation detail: function
result.extend(itertools.repeat(None, length-len(result)))
yield result
windows, as coded in the recipe, yields a new list
object at each step. It takes some time to generate all of these
objects. In some cases, it may be convenient to the caller to know
that each object it gets is a separate and independent list. Such
knowledge enables the caller to store or modify the objects it gets,
without having to make explicit copies. However, none of the use
cases we discussed gets any benefit from this feature. So, you could
optimize, by yielding the same list object every time. If you want
that optimization, just change the statement:
results = results[length-overlap:]into:
del results[:length-overlap]If you're applying this optimization, and
you're using Python 2.4, you should also consider
using the new type collections.deque instead of
list. In order to do so, you need to add the
statement:
import collectionsat the start of your code, change the only occurrence of
list in the recipe into
collections.queue, and further change the updating
of results to avoid slicing, using, instead:
for i in xrange(length-overlap): results.popleft( )If your windows are long, and if they overlap substantially, using
deque instead of list might
give you better performance, since deque is
optimized to support adding and removing items at
either end, while lists,
being compact arrays in memory, are inherently slow (specifically,
O(n)
for a list of length n) when you add or
remove items at the beginning.When you want windows of some length n
that overlap specifically by n-1 items,
function itertools.tee, new in Python 2.4, offers
an elegant alternative approach. Say that you want to look at each
item of the iterable, with access to a few neighboring items and some
padding at both ends, so that you get just as many windows as there
are items in the iterable. In Python 2.4, you could then code:
import itertools as ITFor example:
def windowed(iterable, pre=1, post=1, padding=None):
# tee-off one iterator for each index in the window
copies = IT.tee(iterable, pre + 1 + post)
pre_copies, copy, post_copies = copies[:pre], copies[pre], copies[pre+1:]
# iterators before the element have their start filled in with the
# padding value. no need to slice off the ends, izip will do that.
pre_copies = [IT.chain(IT.repeat(padding, pre - i), itr)
for i, itr in enumerate(pre_copies)]
# iterators after the element have their starts sliced off and their
# end filled in with the padding value, endlessly repeated.
post_copies = [IT.chain(IT.islice(itr, i + 1, None), IT.repeat(padding))
for i, itr in enumerate(post_copies)]
# zip the elements with their preceding and following elements
return IT.izip(*(pre_copies + [copy] + post_copies))
>>> print list(windowed(xrange(4), 1, 2, 'x'))If you use Python 2.4 and want this flavor of
[('x', 0, 1, 2), (0, 1, 2, 3), (1, 2, 3, 'x'), (2, 3, 'x', 'x')]
"sliding windows" over the
iterable, with specified "padding"
at both ends, you might prefer this windowed
function to the recipe's windows
generator.
See Also
Library Reference documentation on built-in
iter and module itertools.