Recipe 19.9. Looping Through the Cross-Product of Multiple Iterables
Credit: Attila Vàsàrhelyi, Raymond
Hettinger, Steven Taschuk
Problem
You need to loop through every item of multiple iterables
cross-productwise, meaning that you first want to get the first item
of the first iterable paired with all the others, next, the second
item of the first iterable paired with all the others, and so forth.
Solution
Say you have two iterables (lists, in this case) such as:
a = ['a1', 'a2', 'a3']If you want to loop over their cross-product, the simplest approach
b = ['b1', 'b2']
is often just a couple of nested for loops:
for x in a:This snippet's output is six lines:
for y in b:
print x, y
a1 b1However, in many cases, you'd rather get all items
a1 b2
a2 b1
a2 b2
a3 b1
a3 b2
in the "cross-product" of multiple
iterables as a single, linear sequence, suitable for using in a
single for or for passing onwards to other
sequence manipulation functions, such as those supplied by
itertools. For such needs, you may put the nested
fors in a list comprehension:
for x, y in [(x,y) for x in a for y in b]:
print x, y
Discussion
A list comprehension lets you easily generate (as a single, linear
sequence) all the pairings of several iterables (also known as the
cross-product, product
set, or Cartesian product of these
iterables). However, the number of items in such a cross-product is
the arithmetic product (multiplication) of the lengths of all the
iterables involved, a number that may easily get quite large. A list
comprehension, by definition, builds the entire list at once, which
means that it may consume substantial amounts of memory. Also, you
get to start iterating only when the whole cross-product list is
entirely built.Python 2.4 offers one obvious way to solve this problem: the newly
introduced construct of generator expressions:
for x, y in ((x,y) for x in a for y in b): print x, yA generator expression looks just like a list comprehension, except
that it uses parentheses rather than brackets: it returns an
iterator, suitable for looping on, rather than building and returning
a list. Thus, a generator expression can save substantial amounts of
memory, if you are iterating over a very long sequence. Also, you
start executing the loop's body very soon, since
each successive element gets generated iteratively, before each
iteration of the loop's body. If your
loop's body contains conditional
breaks, so that execution terminates as soon as
some conditions are met, using a generator expression rather than a
list comprehension can mean a potentially substantial improvement in
performance.If you need to support Python 2.3, and yet you want to achieve the
kind of advantages that generator expressions can afford over list
comprehensions, the best approach may be to code your own generator.
This is quite simple if you only need to deal with a known number of
sequences, such as two:
def cross_two(a, b):Dealing with an arbitrary number of sequences is a bit more
for x in a:
for y in b:
yield a, b
complicated, but not terribly so, particularly if we use recursion to
help:
def cross_loop(*sequences):We can also do it without recursion. It's not hard
if sequences:
for x in sequences[0]:
for y in cross_loop(sequences[1:]):
yield (x,) + y
else:
yield ( )
if we're willing to build the entire result list in
memory at once before returning it, just as a list comprehension
would:
def cross_list(*sequences):Alternatively, you can return map(tuple, result)
result = [[ ]]
for seq in sequences:
result = [sublist+[item] for sublist in result for item in seq]
return result
if you need to ensure that each item of the sequence you return is a
tuple, not a list.Recursion-free iterative (incremental) generation of the
"cross-product" sequence is also
feasible, even though it's nowhere as simple as
either the recursive or the nonincremental versions:
def cross(*sequences):In Python 2.4, you might express the for statement
# visualize an odometer, with "wheels" displaying "digits"...:
wheels = map(iter, sequences)
digits = [it.next( ) for it in wheels]
while True:
yield tuple(digits)
for i in range(len(digits)-1, -1, -1):
try:
digits[i] = wheels[i].next( )
break
except StopIteration:
wheels[i] = iter(sequences[i])
digits[i] = wheels[i].next( )
else:
break
more clearly as for i in
reversed(range(len(digits))).To repeat, it is important to remember that all of these solutions
should be considered only if you do have the
problemthat is, if and only if you do need to view all items
in the "cross-product" of multiple
iterables as a single, linear sequence. Many cases have no such
requirement, and simply coding multiple nested for
loops inline is quite acceptable, simpler, and more readable. In many
cases, getting all items in the
"cross-product" as a single
sequence is preferable, so it's worth knowing how to
do that. However, do keep in mind that simplicity is an important
virtue, and do not lose sight of it in pursuit of a cool (but
complicated) solution. All the cool tools, constructs, and library
modules that Python offers exist strictly to serve
you, to let you build and maintain your
applications with minimal effort. Don't go out of
your way to use the new shiny tools if you can solve your
application's problems with less effort in simpler
ways!
See Also
The Library Reference and Python in
a Nutshell docs about built-ins iter,
enumerate, map, and (Python 2.4
only) reversed; the Language
Reference and Python in a Nutshell
docs about list comprehensions and (Python 2.4 only) generator
expressions.