Recipe 7.4. Using the cPickle Module on Classes and Instances
Credit: Luther Blissett
Problem
You want to save and restore
class and instance objects using the cPickle
module.
Solution
You often need no special precautions to use
cPickle on your classes and their instances. For
example, the following works fine:
import cPickle
class ForExample(object):
    def __init__(self, *stuff):
        self.stuff = stuff
anInstance = ForExample('one', 2, 3)
saved = cPickle.dumps(anInstance)
reloaded = cPickle.loads(saved)
assert anInstance.stuff == reloaded.stuff

However, sometimes there are problems:

anotherInstance = ForExample(1, 2, open('three', 'w'))
wontWork = cPickle.dumps(anotherInstance)

This snippet raises a TypeError ("can't pickle file objects"), because the state of anotherInstance includes a file object, and file objects cannot be pickled. You would get exactly the same exception if you tried to pickle any other container that includes a file object among its items.

However, in some cases, you may be able to do something about it:

class PrettyClever(object):
    def __init__(self, *stuff):
        self.stuff = stuff
    def __getstate__(self):
        def normalize(x):
            # tag 1: an open file, saved as (name, mode, offset)
            if isinstance(x, file):
                return 1, (x.name, x.mode, x.tell())
            # tag 0: any other object, saved verbatim
            return 0, x
        return [normalize(x) for x in self.stuff]
    def __setstate__(self, stuff):
        def reconstruct(x):
            if x[0] == 0:
                return x[1]
            # reopen the file and restore its position
            name, mode, offs = x[1]
            openfile = open(name, mode)
            openfile.seek(offs)
            return openfile
        self.stuff = tuple([reconstruct(x) for x in stuff])

By defining the __getstate__ and __setstate__ special methods in your class, you gain fine-grained control over what, exactly, your class's instances consider to be their state. As long as you can define such state in picklable terms, and reconstruct your instances from the unpickled state in some way that is sufficient for your application, you can make your instances themselves picklable and unpicklable in this way.
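The Solution's code is Python 2: it relies on cPickle and on the built-in file type. In Python 3, cPickle was folded into the standard pickle module (which uses its C implementation automatically), and the file type no longer exists. The following is a minimal sketch of the same __getstate__/__setstate__ technique for Python 3, assuming that isinstance(x, io.IOBase) is an acceptable substitute for isinstance(x, file) and that reopening a file by name, mode, and offset is good enough for your application. The class name PrettyClever3 is hypothetical.

```python
import io
import os
import pickle
import tempfile


class PrettyClever3:
    """Holds arbitrary stuff; open files are pickled as (name, mode, offset)."""

    def __init__(self, *stuff):
        self.stuff = stuff

    def __getstate__(self):
        def normalize(x):
            # Tag 1: an open file, saved as the data needed to reopen it.
            if isinstance(x, io.IOBase):
                return 1, (x.name, x.mode, x.tell())
            # Tag 0: anything else, which must itself be picklable.
            return 0, x
        return [normalize(x) for x in self.stuff]

    def __setstate__(self, state):
        def reconstruct(pair):
            tag, payload = pair
            if tag == 0:
                return payload
            name, mode, offset = payload
            # Caution: reopening in a 'w' mode truncates the file.
            openfile = open(name, mode)
            openfile.seek(offset)
            return openfile
        self.stuff = tuple(reconstruct(pair) for pair in state)


# Usage: pickle an instance holding a partly read text file.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, 'w') as f:
    f.write('hello world')

original_file = open(path)
original_file.read(6)                  # consumes 'hello '
obj = PrettyClever3('one', 2, original_file)
clone = pickle.loads(pickle.dumps(obj))
rest = clone.stuff[2].read()           # resumes at the saved offset
print(clone.stuff[:2], repr(rest))     # ('one', 2) 'world'

original_file.close()
clone.stuff[2].close()
os.remove(path)
```

The demo opens the file in read mode on purpose: as with the Python 2 version, restoring a file whose saved mode was 'w' would reopen it, and therefore truncate it.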
Discussion
cPickle dumps class and function objects by name (i.e., through their module's name and their name within the module). Thus, you can dump only classes defined at module level (not inside other classes and functions). Reloading such objects requires the respective modules to be available for import. Instances can be saved and reloaded only if they belong to such classes. In addition, the instance's state must be picklable.

By default, an instance's state is the contents of the instance's __dict__, plus whatever state the instance may get from the built-in type the instance's class inherits from, if any. For example, an instance of a new-style class that subclasses list includes the list items as part of the instance's state. cPickle also handles instances of new-style classes that define or inherit a class attribute named __slots__ (and therefore hold some or all per-instance state in those predefined slots, rather than in a per-instance __dict__), as long as you pickle with protocol 2; with the default protocol 0, or with protocol 1, such a class must define __getstate__. Overall, cPickle's default approach is often quite sufficient and satisfactory.

Sometimes, however, you may have nonpicklable attributes or items as
part of your instance's state (as cPickle defines such state by default, as explained in the previous paragraph). In this recipe, for example, I show a class whose instances hold arbitrary stuff, which may include open file objects. To handle this case, your class can define the special method __getstate__. If your object's class defines or inherits that method, cPickle calls it instead of going directly for the object's __dict__ (or possibly __slots__ and/or built-in type bases).

Normally, when you define the __getstate__
method, you define the __setstate__ method as well, as shown in this recipe's Solution. __getstate__ can return any picklable object; that object gets pickled and later, at unpickling time, passed as __setstate__'s argument. In this recipe's Solution, __getstate__ returns a list that's similar to the instance's default state (attribute self.stuff), except that each item is turned into a pair (a tuple of two items). The first item in the pair is set to 0 to indicate that the second one is to be taken verbatim, or to 1 to indicate that the second item is to be used to reconstruct an open file. (Of course, the reconstruction may fail or be unsatisfactory in several ways: for instance, reopening a file that was opened with mode 'w' truncates it, and the file may have been moved or deleted in the meantime. There is no general way to save an open file's state, which is why cPickle itself doesn't even try. But in the context of our application, we can assume that the given approach will work.) When reloading the instance from pickled form, cPickle calls __setstate__ with the list of pairs, and __setstate__ reconstructs self.stuff by processing each pair appropriately in its nested reconstruct function. This scheme clearly generalizes to getting and restoring state that may contain various kinds of normally unpicklable objects; just be sure to use a different number to tag each kind of "nonverbatim" pair you need to support.

In one particular case, you can define __getstate__ without defining __setstate__:
__getstate__ must then return a dictionary, and reloading the instance from pickled form uses that dictionary to update the new instance's __dict__, just as the instance's own __dict__ would normally be used. Not running your own code at reloading time is a serious hindrance, but it may come in handy when you want to use __getstate__ not to save otherwise unpicklable state but rather as an optimization. Typically, this optimization opportunity occurs when your instance caches results that it can recompute if they're absent, and you decide it's best not to store the cache as part of the instance's state. In this case, you should define __getstate__ to return a dictionary that's the indispensable subset of the instance's __dict__, and make sure the rest of your code copes with the cache attribute being absent on a reloaded instance. (See Recipe 4.13 for a simple and handy way to "subset a dictionary".)

Defining __getstate__ (and then, normally, also
__setstate__) also gives you a further important bonus besides pickling support: if a class offers these methods but doesn't offer the special methods __copy__ or __deepcopy__, then these methods are used for copying, both shallowly and deeply, as well as for serializing. The state data returned by __getstate__ is deep-copied if and only if the object is being deep-copied; other than this distinction, shallow and deep copies work very similarly when they are implemented through __getstate__. See Recipe 4.1 for more information about how a class can control the way its instances are copied, shallowly or deeply.

With either the default pickling/unpickling approach or your own __getstate__ and __setstate__, the instance's special method __init__ is not called when the instance is getting unpickled. If the most convenient way for you to reconstruct an instance is to call the __init__ method with appropriate parameters, then you may want to define the special method __getinitargs__ instead of __getstate__. In this case, cPickle calls this method without arguments: the method must return a picklable tuple, and at unpickling time, cPickle calls __init__ with the arguments that are that tuple's items. __getinitargs__, like __getstate__ and __setstate__, can also be used for copying. Note, however, that __getinitargs__ is honored only for classic (old-style) classes; new-style classes, such as the ones in this recipe's Solution, ignore it.

The Library Reference for the
pickle and copy_reg modules details even subtler things you can do when pickling and unpickling, as well as the thorny security issues that are likely to arise if you ever stoop to unpickling data from untrusted sources. (Executive summary: don't do that; there is no way Python can protect you if you do.) However, the techniques I've discussed here should suffice in almost all practical cases, as long as the security aspects of unpickling are not a problem (and if they are, the only practical suggestion is: forget pickling!).
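To make two points from this Discussion concrete (a __getstate__ without __setstate__ that leaves out a recomputable cache, and the copy module reusing that same hook), here is a minimal Python 3 sketch using the standard pickle and copy modules. The class CachedSquares and its _cache attribute are hypothetical; the class-level default _cache = None is an assumption that lets restored or copied instances, whose __init__ is never called, recompute the cache on demand.

```python
import copy
import pickle


class CachedSquares:
    """Hypothetical class whose cache of squares is not part of its state."""

    _cache = None  # class-level default seen by instances lacking their own

    def __init__(self, numbers):
        self.numbers = list(numbers)

    def squares(self):
        # Recompute the cache lazily if this instance has none yet.
        if self._cache is None:
            self._cache = [n * n for n in self.numbers]
        return self._cache

    def __getstate__(self):
        # Only the indispensable subset of __dict__; with no __setstate__,
        # pickle and copy use this dict to update the new instance's __dict__.
        return {'numbers': self.numbers}


original = CachedSquares([1, 2, 3])
original.squares()                          # populates original._cache

restored = pickle.loads(pickle.dumps(original))
cache_was_saved = '_cache' in restored.__dict__
shallow = copy.copy(original)
deep = copy.deepcopy(original)

print(cache_was_saved)                      # False
print(restored.squares())                   # [1, 4, 9], recomputed on demand
print(shallow.numbers is original.numbers)  # True: state shared
print(deep.numbers is original.numbers)     # False: state deep-copied
```

The last two lines show the distinction described earlier: the same __getstate__ result serves both kinds of copy, and only the deep copy deep-copies it.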
See Also
Recipe 7.2; documentation
for the standard library module cPickle in the
Library Reference and Python in a
Nutshell.