Recipe 4.17. Finding Unions and Intersections of Dictionaries
Credit: Tom Good, Andy McKay, Sami Hangaslammi, Robin
Siebler
Problem
Given two dictionaries, you
need to find the set of keys that are in both
dictionaries (the intersection) or the set of keys that are in
either dictionary (the union).
Solution
Sometimes, particularly in Python 2.3, you find yourself using
dictionaries as concrete representations of sets. In such cases, you
only care about the keys, not the corresponding values, and often you
build the dictionaries by calls to dict.fromkeys,
such as
a = dict.fromkeys(xrange(1000))The fastest way to compute the dict that is the
b = dict.fromkeys(xrange(500, 1500))
set-union is:
union = dict(a, **b)The fastest concise way to compute the dict that
is the set-intersection is:
inter = dict.fromkeys([x for x in a if x in b])If the number of items in
dictionaries a and
b can be very different, then it can be
important for speed considerations to have the shorter one in the
for clause, and the longer one in the
if clause, of this list comprehension. In such
cases, it may be worth sacrificing some conciseness in favor of
speed, by coding the intersection computation as follows:
if len(a) < len(b):Python also gives
inter = dict.fromkeys([x for x in a if x not in b])
else:
inter = dict.fromkeys([x for x in b if x not in a])
you types to represent sets directly (in standard library module
sets, and, in Python 2.4, also as built-ins). Here
is a snippet that you can use at the start of a module: the snippet
ensures that name set is bound to the best
available set type, so that throughout the module, you can then use
the same code whether you're using Python 2.3 or
2.4:
try:Having done this, you can now use type set to best
set
except NameError:
from sets import Set as set
effect, gaining clarity and conciseness, and (in Python 2.4) gaining
a little speed, too:
a = set(xrange(1000))
b = set(xrange(500, 1500))
union = a | b
inter = a & b
Discussion
In Python 2.3,
even though the Python Standard Library module
sets offers an elegant data type
Set that directly represents a set (with hashable
elements), it is still common to use a dict to
represent a set, partly for historical reasons. Just in case you want
to keep doing it, this recipe shows you how to compute unions and
intersections of such sets in the fastest ways, which are not
obvious. The code in this recipe, on my machine, takes about 260
microseconds for the union, about 690 for the intersection (with
Python 2.3; with Python 2.4, 260 and 600,respectively), while
alternatives based on loops or generator expressions are
substantially slower.However, it's best to use type
set instead of representing sets by dictionaries.
As the recipe shows, using set makes your code
more direct and readable. If you dislike the or-operator
(|) and the
"and-operator"
(&), you can equivalently use
a.union(b) and
a.intersection(b), respectively. Besides clarity,
you also gain speed, particularly in Python 2.4: computing the union
still takes about 260 microseconds, but computing the intersection
takes only about 210. Even in Python 2.3, this approach is acceptably
fast: computing the union takes about 270 microseconds, computing the
intersection takes about 650not quite as fast as Python 2.4
but still quite comparable to what you can get if you represent sets
by dictionaries. Last but not least, once you use type
set (whether it is the Python 2.4 built-in, or
class Set from the Python Standard Library module
sets, the interface is the same), you gain a
wealth of useful set operations. For example, the set of elements
that are in either a or
b but not both is a^b
or, equivalently, a.symmetric_difference(b).Even if you start with dicts for other reasons,
consider using sets anyway if you need to perform
set operations. Say, for example, that you have in
phones a dictionary that maps names to
phone numbers and in addresses one that
maps names to addresses. The clearest and simplest way to print all
names for which you know both address and phone number, and their
associated data, is:
for name in set(phones) & set(addresses):This is much terser, and arguably clearer, than something like:
print name, phones[name], addresses[name]
for name in phones:Another excellent alternative is:
if name in addresses:
print name, phones[name], addresses[name]
for name in set(phones).intersection(addresses):If you
print name, phones[name], addresses[name]
use the named intersection method, rather than the
& intersection operator, you
don't need to turn both dicts
into sets: just one of them. Then call
intersection on the resulting
set, and pass the other dict as
the argument to the intersection method.
See Also
The Library Reference and Python in
a Nutshell sections on mapping types, module
sets, and Python 2.4's built-in
set type.