Recipe 7.8. Using the Berkeley DB Database
Credit: Farhad Fouladi
Problem
You want to persist some
data, exploiting the simplicity and good performance of the Berkeley
DB database library.
Solution
If you have previously installed Berkeley DB on your machine, the
Python Standard Library comes with package bsddb
(and optionally bsddb3, to access Berkeley DB
release 3.2 databases) to interface your Python code with Berkeley
DB. To get either bsddb or, lacking it,
bsddb3, use a
try/except on
import:

try:
    from bsddb import db    # first try release 4
except ImportError:
    from bsddb3 import db   # not there, try release 3 instead
print db.DB_VERSION_STRING
# emits, e.g: Sleepycat Software: Berkeley DB 4.1.25: (December 19, 2002)

To create a database, instantiate a db.DB object,
then call its method open with appropriate
parameters, such as:

adb = db.DB()
adb.open('db_filename', dbtype=db.DB_HASH, flags=db.DB_CREATE)

db.DB_HASH is just one of several access methods
you may choose when you create a database; a popular alternative
is db.DB_BTREE, to use B+tree access (handy if you
need to get records in sorted order). You may make an in-memory
database, without an underlying file for persistence, by passing
None instead of a filename as the first argument
to the open method.

Once you have an open instance of db.DB, you can
add records, each composed of two strings, key and
data:

for i, w in enumerate('some words for example'.split()):
    adb.put(w, str(i))

You can access records via a cursor on the database:

def irecords(curs):
    record = curs.first()
    while record:
        yield record
        record = curs.next()
for key, data in irecords(adb.cursor()):
    print 'key=%r, data=%r' % (key, data)
# emits (the order may vary):
# key='some', data='0'
# key='example', data='3'
# key='words', data='1'
# key='for', data='2'

When you're done, you close the database:

adb.close()

At any future time, in the same or another Python program, you can
reopen the database by giving just its filename as the argument to
the open method of a newly created
db.DB instance:

the_same_db = db.DB()
the_same_db.open('db_filename')

and work on it again in the same ways:

the_same_db.put('skidoo', '23')    # add a record
the_same_db.put('words', 'sweet')  # replace a record
for key, data in irecords(the_same_db.cursor()):
    print 'key=%r, data=%r' % (key, data)
# emits (the order may vary):
# key='some', data='0'
# key='example', data='3'
# key='words', data='sweet'
# key='for', data='2'
# key='skidoo', data='23'

Again, remember to close the database when you're
done:

the_same_db.close()
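
The two options mentioned earlier, db.DB_BTREE for sorted order and None as the filename for an in-memory database, can be combined in a minimal sketch such as the one below. It is written for a current Python interpreter rather than the Python 2 of the listings above, so keys and data are byte strings. The module names berkeleydb and bsddb3 are assumptions about which third-party binding happens to be installed, and load_db_module and sorted_records are hypothetical helpers, not part of the recipe. sorted_records returns None when no binding can be imported.

```python
import importlib


def load_db_module():
    """Return the first importable Berkeley DB 'db' module, or None."""
    # Assumption: one of these bindings may be installed; none is guaranteed.
    for name in ('berkeleydb.db', 'bsddb3.db', 'bsddb.db'):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


def sorted_records(words):
    """Store each word in an in-memory B+tree; return its (key, data) pairs.

    Returns None if no Berkeley DB binding is available.
    """
    db = load_db_module()
    if db is None:
        return None
    adb = db.DB()
    # None instead of a filename: in-memory database, nothing is persisted.
    adb.open(None, dbtype=db.DB_BTREE, flags=db.DB_CREATE)
    try:
        for i, w in enumerate(words):
            adb.put(w.encode('ascii'), str(i).encode('ascii'))
        # Walk the cursor exactly as irecords does in the recipe.
        curs = adb.cursor()
        records = []
        record = curs.first()
        while record:
            records.append(record)
            record = curs.next()
        curs.close()
        return records
    finally:
        adb.close()


if __name__ == '__main__':
    print(sorted_records('some words for example'.split()))
```

With a binding installed, the keys should come back in byte-wise sorted order (example, for, some, words) rather than in the arbitrary order that db.DB_HASH produces.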
Discussion
The Berkeley DB is a popular open source database. It does not
support SQL, but it's simple to use, offers
excellent performance, and gives you a lot of control over exactly
what happens, if you care to exert it, through a huge array of
options, flags, and methods. Berkeley DB is just as accessible from
many other languages as from Python: for example, you can perform
some changes or queries with a Python program, and others with a
separate C program, on the same database file, using the same
underlying open source library that you can freely download from
Sleepycat.

The Python Standard Library shelve module can use
the Berkeley DB as its underlying database engine, just as it uses
cPickle for serialization. However,
shelve does not let you take advantage of the
ability to access a Berkeley DB database file from several different
languages, exactly because the records are strings produced by
pickle.dumps, and languages other than Python
can't easily deal with them. Accessing the Berkeley
DB directly with bsddb also gives you access to
many advanced functionalities of the database engine that
shelve simply doesn't support.
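
To see why shelve records are awkward for other languages, the following minimal sketch stores one value through shelve and then looks at the raw bytes that reach the underlying store. It assumes a current Python interpreter, and a plain dict stands in for the Berkeley DB file (shelve.Shelf accepts any dict-like object), so it illustrates only the record format, not Berkeley DB itself. The helper name raw_shelf_record is hypothetical.

```python
import pickle
import shelve


def raw_shelf_record(key, value):
    """Store value under key via shelve; return the raw bytes actually stored."""
    backing = {}                         # stand-in for the database file
    shelf = shelve.Shelf(backing)
    shelf[key] = value                   # shelve pickles the value before storing
    raw = backing[key.encode('utf-8')]   # shelve encodes str keys as UTF-8
    shelf.close()
    return raw


if __name__ == '__main__':
    raw = raw_shelf_record('words', '1')
    print(repr(raw))           # a pickle byte string, not the plain b'1'
    print(pickle.loads(raw))   # only a pickle-aware reader recovers the value
```

A program in another language that opened the same store would see the pickle bytes, whereas records written directly with adb.put are the plain strings you supplied.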
db.DB_HASH, as shown in the recipe, may give
maximum performance. However, as you'll have noticed
when listing all records with the irecords generator
from the recipe, hashing returns records in an
apparently random, unpredictable order. If you need to access records
in sorted order, you can use an access method of
db.DB_BTREE instead. Berkeley DB also supports
more advanced functionality, such as transactions, which you can
enable through direct access but not via anydbm or
shelve.

For detailed documentation about all functionality of the Python
Standard Library bsddb package, see http://pybsddb.sourceforge.net/bsddb3l.
For documentation, downloads, and more of the Berkeley DB itself, see
http://www.sleepycat.com/.
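
The transactions mentioned in this Discussion can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it targets a current Python interpreter, relies on a third-party binding (the module names berkeleydb and bsddb3 are assumptions about what is installed), and load_db_module and transaction_demo are hypothetical helpers. transaction_demo returns None when no binding is available. It opens a transactional environment in a temporary directory, aborts one write, and commits another.

```python
import importlib
import shutil
import tempfile


def load_db_module():
    """Return the first importable Berkeley DB 'db' module, or None."""
    # Assumption: one of these bindings may be installed; none is guaranteed.
    for name in ('berkeleydb.db', 'bsddb3.db', 'bsddb.db'):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


def transaction_demo():
    """Return (value after an aborted put, value after a committed put)."""
    db = load_db_module()
    if db is None:
        return None
    home = tempfile.mkdtemp()            # environment home directory
    try:
        env = db.DBEnv()
        env.open(home, db.DB_CREATE | db.DB_INIT_MPOOL | db.DB_INIT_LOCK |
                 db.DB_INIT_LOG | db.DB_INIT_TXN)
        adb = db.DB(env)
        txn = env.txn_begin()
        adb.open('txn_demo.db', dbtype=db.DB_BTREE,
                 flags=db.DB_CREATE, txn=txn)
        txn.commit()

        txn = env.txn_begin()
        adb.put(b'skidoo', b'23', txn=txn)
        txn.abort()                      # the write is rolled back
        after_abort = adb.get(b'skidoo')

        txn = env.txn_begin()
        adb.put(b'skidoo', b'23', txn=txn)
        txn.commit()                     # the write becomes durable
        after_commit = adb.get(b'skidoo')

        adb.close()
        env.close()
        return after_abort, after_commit
    finally:
        shutil.rmtree(home, ignore_errors=True)


if __name__ == '__main__':
    print(transaction_demo())
```

The point of the sketch is the contrast between the two lookups: the record written inside the aborted transaction should not be visible afterwards, while the committed one should be.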
See Also
Library Reference and Python in a
Nutshell docs for modules anydbm,
shelve, and bsddb; http://pybsddb.sourceforge.net/bsddb3l
for many more details about bsddb and
bsddb3; http://www.sleepycat.com/ for downloads of,
and very detailed documentation on, the Berkeley DB itself.