Recipe 7.7. Mutating Objects with shelve
Credit: Luther Blissett
Problem
You are using the standard
module shelve. Some of the values you have shelved
are mutable objects, and you need to mutate these objects.
Solution
The shelve module offers a kind of persistent
dictionaryan important niche between the power of
relational-database engines and the simplicity of
marshal, pickle,
dbm, and similar file formats. However, you should
be aware of a typical trap you need to avoid when using
shelve. Consider the following interactive Python
session:
>>> import shelveWe've created the shelve file,
>>> # Build a simple sample shelf
>>> she = shelve.open('try.she', 'c')
>>> for c in 'spam': she[c] = {c:23}
...
>>> for c in she.keys( ): print c, she[c]
...
p {'p': 23}
s {'s': 23}
a {'a': 23}
m {'m': 23}
>>> she.close( )
added some data to it, and closed it. Goodnow we can reopen it
and work with it:
>>> she=shelve.open('try.she', 'c')What's going on here? We just set the value to 42,
>>> she['p']
{'p': 23}
>>> she['p']['p'] = 42
>>> she['p']
{'p': 23}
but our setting didn't take in
the shelve object! The problem is that we were
working with a temporary object that shelve gave
us, not with the "real thing".
shelve, when we open it with default options, like
here, doesn't track changes to such temporary
objects. One reasonable solution is to bind a name to this temporary
object, do our mutation, and then assign the mutated object back to
the appropriate item of shelve:
>>> a = she['p']We can verify that the change was properly persisted:
>>> a['p'] = 42
>>> she['p'] = a
>>> she['p']
{'p': 42}
>>> she.close( )
>>> she=shelve.open('try.she','c')A simpler solution is to open the shelve object
>>> for c in she.keys( ): print c,she[c]
...
p {'p': 42}
s {'s': 23}
a {'a': 23}
m {'m': 23}
with the writeback option set to
TRue:
>>> she = shelve.open('try.she', 'c', writeback=True)The writeback option instructs
shelve to keep track of all the objects it gets
from the file and write them all back to the file before closing it,
just in case they have been modified in the meantime. While simple,
this approach can be quite expensive, particularly in terms of memory
consumption. Specifically, if we read many objects from a
shelve object opened with
writeback=True, even if we only modify a few of
them, shelve is going to keep them
all in memory, since it can't
tell in advance which one we may be about to modify. The previous
approach, where we explicitly take responsibility to notify
shelve of any changes (by assigning the changed
objects back to the place they came from), requires more care on our
part, but repays that care by giving us much better performance.
Discussion
The standard Python module shelve can be quite
convenient in many cases, but it hides a potentially nasty trap,
admittedly well documented in Python's online docs
but still easy to miss. Suppose you're shelving
mutable objects, such as dictionaries or lists. Naturally, you are
quite likely to want to mutate some of those objectsfor
example, by calling mutating methods (append on a
list, update on a dictionary, etc.) or by
assigning a new value to an item or attribute of the object. However,
when you do this, the change doesn't occur in the
shelve object. This is because we actually mutate
a temporary object that the shelve object has
given us as the result of
shelve's own _ _getitem_
_ method, but the shelve object, by
default, does not keep track of that temporary object, nor does it
care about it once it returns it to us.As shown in the recipe, one solution is to bind a name to the
temporary object obtained by keying into the shelf, doing whatever
mutations are needed to the object via the name, then assigning the
newly mutated object back to the appropriate item of the
shelve object. When you assign to a
shelve object's item, the
shelve object's _
_setitem_ _ method gets invoked, and it appropriately
updates the shelve object itself, so that the
change does occur.Alternatively, you can add the flag writeback=True
at the time you open the shelve object, and then
shelve keeps track of every object it hands you,
saving them all back to disk at the end. This approach may save you
quite a bit of fussy and laborious coding, but take care: if you read
many items of the shelve object and only modify a
few of them, the writeback approach can be
exceedingly costly, particularly in terms of memory consumption. When
opened with writeback=True,
shelve will keep in memory
any item it has ever handed you, and save them all to disk at the
end, since it doesn't have a reliable way to tell
which items you may be about to modify, nor, in general, even which
items you have actually modified by the time you
close the shelve object. The
recommended approach, unless you're going to modify
just about every item you read (or unless the
shelve object in question is small enough compared
with your available memory that you don't really
care), is the previous one: bind a name to the items that you get
from a shelve object with intent to modify them,
and assign each item back into the shelve object
once you're done mutating that item.
See Also
Recipe 7.1 and Recipe 7.2 for alternative
serialization approaches; documentation for the
shelve standard library module in the
Library Reference and Python in a
Nutshell.