Recipe 4.14. Inverting a Dictionary
Credit: Joel Lawhead, Ian Bollinger, Raymond
Hettinger
Problem
An
existing dict maps keys to unique values, and you
want to build the inverse dict, mapping each value
to its key.
Solution
You can write a function that passes a list comprehension as
dict's argument to build the new
requested dictionary:
def invert_dict(d):For large dictionaries, though, it's faster to use
return dict([ (v, k) for k, v in d.iteritems( ) ])
the generator izip from the
itertools module in the Python Standard
Library:
from itertools import izip
def invert_dict_fast(d):
return dict(izip(d.itervalues( ), d.iterkeys( )))
Discussion
If the values in dict d are not unique,
then d cannot truly be inverted, meaning
that there exists no dict id such that for any
valid key k,
id[d[k]]==k. However, the functions shown in this
recipe still construct, even in such cases, a
"pseudo-inverse" dict
pd such that, for any
v that is a value in
d, d[pd[v]]==v. Given
the original dict d and the dict
x returned by either of the functions
shown in this recipe, you can easily check whether
x is the true inverse of
d or just
d's pseudo-inverse:
x is the true inverse of
d if and only if
len(x)==len(d). That's because,
if two different keys have the same value, then, in the result of
either of the functions in this recipe, one of the two keys will
simply go "poof" into the ether,
thus leaving the resulting pseudo-inverse dict shorter than the dict
you started with. In any case, quite obviously, the functions shown
in this recipe can work only if all values in
d are hashable (meaning that they are all
usable as keys into a dict): otherwise, the functions raise a
TypeError exception.When we program in Python, we normally "disregard
minor optimizations," as Donald Knuth suggested over
thirty years ago: we place a premium on clarity and correctness and
care relatively little about speed. However, it
can't hurt to know about faster possibilities: when
we decide to code in a certain way because it's
simpler or clearer than another, it's best if we are
taking the decision deliberately, not out of ignorance.Here, function invert_dict in this
recipe's Solution might perhaps be considered
clearer because it shows exactly what it's doing.
Take the pairs k,
v of key and value that method
iteritems yields, swap them into (value,
key) order, and feed the resulting list as the argument of
dict, so that dict builds a
dictionary where each value v is a key and
the corresponding key k becomes that
key's valuejust the inverse
dict that our problem
requires.However, function invert_dict_fast, also in this
recipe's Solution, isn't really any
more complicated: it just operates more abstractly, by getting all
keys and all values as two separate iterators and zipping them up
(into an iterator whose items are the needed, swapped
(value, key) pairs) via a call to generator
izip, supplied by the itertools
module of the Python Standard Library. If you get used to such higher
abstraction levels, they will soon come to feel
simpler than lower-level code!Thanks to the higher level
of abstraction, and to never materializing the whole list of pairs
(but rather operating via generators and iterators that yield only
one item at a time), function invert_dict_fast can
be substantially faster than function invert_dict.
For example, on my machine, to invert a 10,000-item dictionary,
invert_dict takes about 63 milliseconds, but
invert_dict_fast manages the same task in just 20
milliseconds. A speed increase by a factor of three, in general, is
not to be sneered at. Such performance gains, when you work on large
amounts of data, are the norm, rather than the exception, for coding
at higher abstraction levels. This is particularly true when you can
use itertools rather than loops or list
comprehensions, because you don't need to
materialize some large list in memory at one time. Performance gain
is an extra incentive for getting familiar with working at higher
abstraction levels, a familiarity that has conceptual and
productivity pluses, too.
See Also
Documentation on mapping types and itertools in
the Library Reference and Python in
a Nutshell; Chapter 19.