Recipe 19.13. Fetching Large Record Sets from a Database with a Generator
Credit: Christopher Prinos
Problem
You want to fetch a result set
from a database (using the Python DB API) and easily iterate over
each record in the result set. However, you don't
want to use the DB cursor's method
fetchall: it could consume a lot of memory and
would force you to wait until the whole result set comes back before
you can start iterating.
Solution
A generator is the ideal solution to this problem:
def fetchsome(cursor, arraysize=1000):
''' A generator that simplifies the use of fetchmany '''
while True:
results = cursor.fetchmany(arraysize)
if not results: break
for result in results:
yield result
Discussion
In applications that use the Python DB API, you often see code that
goes somewhat like (where cursor is a DB API cursor
object):
cursor.execute('select * from HUGE_TABLE')This simple approach is "just"
for result in cursor.fetchall( ):
doSomethingWith(result)
fine, as long as fetchall returns a small result
set, but it does not work very well if the query result is very
large. A large result set can take a long time to return. Also,
cursor.fetchall( ) needs to allocate enough memory
to store the entire result set in memory at once. Further, with this
simple approach, the doSomethingWith function
isn't going to get called until the entire
query's result finishes coming over from the
database to our program.An alternative approach is to rely on the
cursor.fetchone method:
for result in iter(cursor.fetchone, None):However, this alternative approach does not allow the database to
doSomethingWith(result)
optimize the fetching process: most databases can exhibit better
efficiency when returning multiple records for a single query, rather
than returning records one at a time as fetchone
requires.To let your applications obtain greater efficiency than
fetchone allows, without the risks of unbounded
memory consumption and delay connected to the use of
fetchall, Python's DB
API's cursors also have a
fetchmany method. However, the direct use of
fetchmany makes your iterations somewhat more
complicated than the simple for statements such as
those just shown. For example:
while True:Python's generators are a great way to encapsulate
results = cursor.fetchmany(1000)
if not results: break
for result in results:
doSomethingWith(result)
complicated iteration logic so that application code can just about
always loop with simple for statements. With this
recipe's fetchsome generator, you
get the same efficiencies and safeguards as with the native use of
the fetchmany method in the preceding snippet but
with the same crystal-clear simplicity as in the snippets that used
either fetchall or fetchone,
namely:
for result in fetchsome(cursor):By default, fetchsome fetches up to 1,000 records at
doSomethingWith(result)
a time, but you can change that number, depending on your
requirements. Optimal values can depend on schema, database type,
choice of Python DB API module. In general, you're
best advised to experiment with a few different values in your
specific settings if you need to optimize this specific aspect. (Such
experimentation is often a good idea for any optimization task.)This recipe is clearly an example of a more general case: a
subsequence unbuncher generator that you can use
when you have a sequence of subsequences (each subsequence being
obtained through some call, and the end of the whole sequence being
indicated by an empty subsequence) and want to flatten it into a
simple, linear sequence of items. You can think of this unbunching
task as the reverse of the sequence-bunching tasks covered earlier in
Recipe 19.10 and Recipe 19.11, or as a simpler
variant of the sequence-flattening task covered in Recipe 4.6. A generator for unbunching
might be:
def unbunch(next_subseq, *args):As you can see, the structure of unbunch is
''' un-bunch a sequence of subsequences into a linear sequence '''
while True:
subseq = next_subseq(*args)
if not subseq: break
for item in subseq:
yield item
basically identical to that of the recipe's
fetchsome. Usage would also be just about the same:
for result in unbunch(cursor.fetchmany, 1000):However, while it is important and instructive to consider this kind
doSomethingWith(result)
of generalization, when you're writing applications
you're often better off using specific generators
that directly deal with your application's specific
needs. In this case, for example, calling
fetchsome(cursor) is more obvious and direct than
calling unbunch(cursor.fetchmany, 1000), and
fetchsome usefully hides the usage of
fetchmany as well as the specific choice of
1,000 as the subsequence size to fetch at each
step.
See Also
Recipe 19.10; Recipe 19.11; Recipe 4.6; Python's
DB API is covered in Chapter 7 and in
Python in a Nutshell.