Recipe 19.2. Building a List from Any Iterable
Credit: Tom Good, Steve Alexander
Problem
You have an iterable object x (it might be
a sequence or any other kind of object on which you can iterate, such
as an iterator, a file, a dict)
and need a list object
y, with the same items as
x and in the same order.
Solution
When you know that iterable object x is
bounded (so that, e.g., a loop for item
in x would surely terminate),
building the list object you require is trivial:
y = list(x)
However, when you know that x is
unbounded, or when you are not sure, then you must ensure termination
before you call list. In particular, if you want
to make a list with no more than n items
from x, then standard library module
itertools' function
islice does exactly what you need:
import itertools
y = list(itertools.islice(x, n))
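As a minimal sketch of both cases, written for Python 3: the generator naturals and the helper take are hypothetical names introduced only for illustration and are not part of the recipe.

```python
import itertools

def naturals():
    """Hypothetical unbounded iterable: yields 0, 1, 2, ... forever."""
    n = 0
    while True:
        yield n
        n += 1

def take(n, iterable):
    """Return a list holding at most the first n items of iterable."""
    return list(itertools.islice(iterable, n))

# Bounded iterable: calling list directly is enough.
bounded = list(iter("abc"))

# Unbounded iterable: truncate with islice before building the list.
first_five = take(5, naturals())

# islice simply stops early if the iterable runs out before n items.
short = take(10, range(3))
```

Calling list(naturals()) directly would never return, which is why the truncation has to happen before list is called.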
Discussion
Python's generators, iterators, and sundry other
iterables are a wondrous thing, as this entire chapter strives to
point out. The powerful and generic concept of
iterable is a great way to represent all sorts
of sequences, including unbounded ones, in ways that can potentially
save you huge (and even infinite!) amounts of memory. With the
standard library module itertools, generators you
can code yourself, and, in Python 2.4, generator expressions, you can
perform many manipulations on completely general iterables.

However, once in a while, you need to build a good old-fashioned
full-fledged list object from such a generic
iterable. For example, building a list is the simplest way to sort or
reverse the items in the iterable, and lists have many other useful
methods you may want to apply. As long as you know for sure that the
iterable is bounded (i.e., has a finite number
of items), just call list with the iterable as the
argument, as the "Solution" points
out. In particular, avoid the goofiness of misusing a list
comprehension such as
[i for
i in
x], when
list(x)
is faster, cleaner, and more readable!

Calling list won't help if
you're dealing with an
unbounded iterable. The need to ensure that
some iterable x is bounded also arises in
many other contexts, besides that of calling
list(x):
all "accumulator" functions
(sum(x),
max(x),
etc.) intrinsically need a bounded-iterable argument, and so does a
statement such as for
i in
x (unless you have appropriate
conditional breaks within the
loop's body), a test such as if
i in
x, and so on.

If, as is frequently the case, all you want is to ensure that no more
than n items of iterable
x are taken, then
itertools.islice, as shown in the
"Solution", does just what you
need. The islice function of the standard library
itertools module offers many other possibilities
(essentially equivalent to the various possibilities that slicing
offers on sequences), but out of all of them, the simple
"truncation" functionality (i.e.,
take no more than n items) is by far the
most frequently used. The programming language Haskell, from which
Python took many of the ideas underlying its list comprehensions and
generator expression functionalities, has a built-in
take function to cater to this rather frequent
need, and itertools.islice is most often used as
an equivalent to Haskell's built-in
take.

In some cases, you cannot specify a maximum number of items, but you
are able to specify a generic condition that you
know will eventually be satisfied by the items of iterable
x and can terminate the proceedings.
itertools.takewhile lets you deal with such cases
in a very general way, since it accepts the controlling predicate as
a callable argument. For example:
y = list(itertools.takewhile((11).__cmp__, x))
binds name y to a new list made up of the
sequence of items in iterable x up to, but
not including, the first one that equals 11. (The
reason we need to code (11).__cmp__ with
parentheses is a somewhat subtle one: if we wrote 11.__cmp__
without parentheses, Python would
parse 11. as a floating-point literal, and the
entire construct would be syntactically invalid. The parentheses are
included to force the tokenization we mean, with
11 as an integer literal and the period
indicating an access to its attribute, in this case, bound
method __cmp__.)

For the special and frequent case in which the terminating condition
is the equality of an item to some given value, a useful alternative
is to use the two-argument variant of the built-in function
iter:
y = list(iter(iter(x).next, 11))
Here, the iter(x) call (which is innocuous if
x is already an iterator) gives us an
object on which we can surely access callable (bound method)
next, which is necessary, because
iter in its two-argument form requires a callable
as its first argument. The second argument is the
sentinel value, meaning the value that
terminates the iteration as soon as an item equal to it appears. For
example, if x were a sequence with items
1, 6, 3, 5, 7, 11, 2, 9, ..., y would now
be the list [1, 6, 3, 5, 7]. (The sentinel
value itself is excluded: from the beginning, included, to the end,
excluded, is the normal Python convention for just about all loops,
implicit or explicit.)
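The two condition-based approaches can be sketched side by side as follows. This sketch assumes Python 3, where integers no longer have a __cmp__ method and an iterator's bound method is spelled __next__ rather than next, so the predicate is written as a lambda here. The sample data is the sequence used in the text, cut off after 9.

```python
import itertools

# Sample data from the text (truncated after 9 for this sketch).
x = [1, 6, 3, 5, 7, 11, 2, 9]

# takewhile with an explicit predicate: keep items while they differ from 11.
y1 = list(itertools.takewhile(lambda item: item != 11, x))

# Two-argument iter: keep calling the iterator's __next__ method
# until it returns the sentinel value 11, which is itself excluded.
y2 = list(iter(iter(x).__next__, 11))

# takewhile also handles conditions other than equality, which the
# sentinel form cannot express: stop at the first item not below 7.
y3 = list(itertools.takewhile(lambda item: item < 7, x))
```

Both y1 and y2 end up as [1, 6, 3, 5, 7], while y3 is [1, 6, 3, 5]. The sentinel form is the more compact when the stopping condition is plain equality; takewhile is the more general tool.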
See Also
Library Reference documentation on built-ins
list and iter, and module
itertools.