Recipe 7.11. Storing a BLOB in a PostgreSQL Database
Credit: Luther Blissett
Problem
You need to store a BLOB in a PostgreSQL database.
Solution
PostgreSQL 7.2 and later supports large objects, and the
psycopg module supplies a
Binary escaping function:
import psycopg, cPickle
# Connect to a DB, e.g., the test DB on your localhost, and get a cursor
connection = psycopg.connect("dbname=test")
cursor = connection.cursor( )
# Make a new table for experimentation
cursor.execute("CREATE TABLE justatest (name TEXT, ablob BYTEA)")
try:
# Prepare some BLOBs to insert in the table
names = 'aramis', 'athos', 'porthos'
data = { }
for name in names:
datum = list(name)
datum.sort( )
data[name] = cPickle.dumps(datum, 2)
# Perform the insertions
sql = "INSERT INTO justatest VALUES(%s, %s)"
for name in names:
cursor.execute(sql, (name, psycopg.Binary(data[name])) )
# Recover the data so you can check back
sql = "SELECT name, ablob FROM justatest ORDER BY name"
cursor.execute(sql)
for name, blob in cursor.fetchall( ):
print name, cPickle.loads(blob), cPickle.loads(data[name])
finally:
# Done. Remove the table and close the connection.
cursor.execute("DROP TABLE justatest")
connection.close( )
Discussion
PostgreSQL supports binary data (BYTEA and
variations thereof), but you should be careful when communicating
such data via SQL. Specifically, when you use a normal
INSERT SQL statement and need to have binary
strings among the VALUES you're
inserting, you have to escape some characters in the binary string
according to PostgreSQL's own rules. Fortunately,
you don't have to figure out those rules for
yourself: PostgreSQL supplies functions that do all the needed
escaping, and psycopg exposes such a function to
your Python programs as the Binary function. This
recipe shows a typical case: the BYTEAs
you're inserting come from
cPickle.dumps, so they may represent almost
arbitrary Python objects (although, in this case,
we're just using them for a few lists of
characters). The recipe is purely demonstrative and works by creating
a table and dropping it at the end (using a
TRy/finally statement to ensure finalization is
performed even if the program should terminate because of an uncaught
exception).Earlier PostgreSQL releases limited to a few kilobytes the amount of
data you could store in a normal field of the database. To store
really large objects, you had to use roundabout techniques to load
the data into the database (such as PostgreSQL's
nonstandard SQL function LO_IMPORT to load a data
file as an object, which requires superuser privileges and data files
that reside on the machine running the Postgre-SQL Server) and store
a field of type OID in the table to be used later
for indirect recovery of the data. Fortunately, none of these
techniques are necessary anymore: since Release 7.1 (the current
release at the time of writing is 8.0), PostgreSQL embodies the
results of project TOAST, which removed the limitations on
field-storage size and therefore the need for peculiar indirection.
Module psycopg supplies the handy
Binary function to let you escape any binary
string of bytes into a form acceptable for placeholder substitution
in INSERT and UPDATE SQL
statements.
See Also
Recipe 7.10 and Recipe 7.12 for MySQL-oriented and
SQLite-oriented solutions to the same problem;
PostgresSQL's home page (http://www.postgresql.org/); the
Python/PostgreSQL module (http://initd.org/software/psycopg).