Recipe 4.10. Adding an Entry to a Dictionary
Credit: Alex Martelli, Martin Miller, Matthew
Shomphe
Problem
Working with a
dictionary d, you need to use the entry
d[k] when it's already present,
or add a new value as d[k] when
k isn't yet a key in
d.
Solution
This is what the
setdefault method of dictionaries is for. Say
we're building a word-to-page-numbers index, a
dictionary that maps each word to the list of page numbers where it
appears. A key piece of code in that application might be:

    def addword(theIndex, word, pagenumber):
        theIndex.setdefault(word, []).append(pagenumber)

This code is equivalent to more verbose approaches such as:

    def addword(theIndex, word, pagenumber):
        if word in theIndex:
            theIndex[word].append(pagenumber)
        else:
            theIndex[word] = [pagenumber]

and:

    def addword(theIndex, word, pagenumber):
        try:
            theIndex[word].append(pagenumber)
        except KeyError:
            theIndex[word] = [pagenumber]

Using method setdefault simplifies this task considerably.
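As a quick illustration of the Solution, the following sketch builds a small index with the setdefault-based addword. The sample words and page numbers are made up for the example and are not part of the recipe.

```python
def addword(theIndex, word, pagenumber):
    # Fetch the list stored for word, inserting an empty list first
    # if word is not yet a key, then append to that list in place.
    theIndex.setdefault(word, []).append(pagenumber)

# Hypothetical sample data: (word, page) pairs in reading order.
occurrences = [("python", 1), ("dict", 1), ("python", 3),
               ("dict", 7), ("python", 7)]

index = {}
for word, page in occurrences:
    addword(index, word, page)

print(index["python"])  # [1, 3, 7]
print(index["dict"])    # [1, 7]
```

The first call for each word creates its list; every later call appends to the same list object, so no entry is ever rebound.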
Discussion
For any dictionary d,
d.setdefault(k, v) is very similar to
d.get(k, v), which was covered previously in
Recipe 4.9. The essential
difference is that, if k is not a key in
the dictionary, the setdefault method assigns
d[k]=v as a side effect, in addition to returning
v. (get would just
return v, without affecting
d in any way.) Therefore, consider using
setdefault any time you have
get-like needs, but also want to produce this side
effect on the dictionary.

setdefault is particularly useful in a dictionary
with values that are lists, as detailed in Recipe 4.15. The most typical usage for
setdefault is something like:

    somedict.setdefault(somekey, []).append(somevalue)

setdefault is not all that useful for immutable
values, such as numbers. If you just want to count words, for
example, the right way to code is to use, not
setdefault, but rather get:

    theIndex[word] = theIndex.get(word, 0) + 1

since you must rebind the
dictionary entry at theIndex[word] anyway (because
numbers are immutable). But for our word-to-page-numbers example, you
definitely do not want to fall into the
performance trap that's hidden in the following
approach:

    def addword(theIndex, word, pagenumber):
        theIndex[word] = theIndex.get(word, []) + [pagenumber]

This latest version of addword builds three new
lists each time you call it: an empty list that's
passed as the second argument to theIndex.get, a
one-item list containing just pagenumber,
and a list with N+1 items obtained by
concatenating these two (where N is the
number of times that word was previously
found). Building such a huge number of lists is sure to take its
toll, in performance terms. For example, on my machine, I timed the
task of indexing the same four words occurring once each on each of
1,000 pages. Taking the if/else version of addword in
the recipe as a reference point, the version using
try/except is about 10%
faster, and the version using setdefault is about
20% slower: the kind of performance differences that you should
blissfully ignore in just about all cases. This fourth version (using
get) is four times
slower: the kind of performance difference you just
can't afford to ignore.
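A comparison along these lines can be sketched with the standard timeit module. This is only an approximation of the experiment described above, not the author's original benchmark: the function names, the four sample words, and the repeat count are assumptions made for illustration, and the ratios you measure will depend on your Python version and machine.

```python
import timeit

def addword_setdefault(theIndex, word, pagenumber):
    theIndex.setdefault(word, []).append(pagenumber)

def addword_ifelse(theIndex, word, pagenumber):
    if word in theIndex:
        theIndex[word].append(pagenumber)
    else:
        theIndex[word] = [pagenumber]

def addword_tryexcept(theIndex, word, pagenumber):
    try:
        theIndex[word].append(pagenumber)
    except KeyError:
        theIndex[word] = [pagenumber]

def addword_get(theIndex, word, pagenumber):
    # Builds a brand-new list on every call and rebinds the entry.
    theIndex[word] = theIndex.get(word, []) + [pagenumber]

WORDS = ("alpha", "beta", "gamma", "delta")  # hypothetical sample words
PAGES = range(1, 1001)                       # pages 1 through 1,000

VERSIONS = (addword_setdefault, addword_ifelse,
            addword_tryexcept, addword_get)

def build_index(addword):
    # Index each word once on each page, using the given addword version.
    theIndex = {}
    for page in PAGES:
        for word in WORDS:
            addword(theIndex, word, page)
    return theIndex

def time_versions(repeat=20):
    # Map each version's name to the seconds taken by `repeat` full builds.
    return {f.__name__: timeit.timeit(lambda f=f: build_index(f), number=repeat)
            for f in VERSIONS}

if __name__ == "__main__":
    for name, seconds in time_versions().items():
        print(f"{name:20s} {seconds:.4f}s")
```

All four versions produce the same index; only the amount of work done per call differs, which is what the timings expose.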
See Also
Recipe 4.9; Recipe 4.15; Library
Reference and Python in a Nutshell
documentation about dict.