Recipe 17.6. Translating a Python Sequence into a C Array with the PySequence_Fast Protocol
Credit: Luther Blissett
Problem
You have an existing C function
that takes as an argument a C array of C-level values (e.g.,
doubles), and you want to wrap it into a
Python-callable C extension that takes as an argument a Python
sequence or iterator.
Solution
The easiest way to accept an arbitrary Python sequence (or any other
iterable object) in the Python C API is with the
PySequence_Fast function. It builds and returns a
tuple when needed but returns only its argument (with the reference
count incremented) when the argument is already a list or tuple:
#include <Python.h>

/* a preexisting C-level function you want to expose, e.g.: */
static double total(double* data, int len)
{
    double total = 0.0;
    int i;
    for(i=0; i<len; ++i)
        total += data[i];
    return total;
}

/* here is how you expose it to Python code: */
static PyObject *totalDoubles(PyObject *self, PyObject *args)
{
    PyObject* seq;
    double *dbar;
    double result;
    int seqlen;
    int i;

    /* get one argument as a sequence */
    if(!PyArg_ParseTuple(args, "O", &seq))
        return 0;
    seq = PySequence_Fast(seq, "argument must be iterable");
    if(!seq)
        return 0;

    /* prepare data as an array of doubles */
    seqlen = PySequence_Fast_GET_SIZE(seq);
    dbar = malloc(seqlen*sizeof(double));
    /* malloc(0) may legitimately return NULL, so an empty sequence
       must not be reported as an out-of-memory error */
    if(!dbar && seqlen > 0) {
        Py_DECREF(seq);
        return PyErr_NoMemory();
    }
    for(i=0; i < seqlen; i++) {
        PyObject *fitem;
        /* borrowed reference: do not Py_DECREF it */
        PyObject *item = PySequence_Fast_GET_ITEM(seq, i);
        if(!item) {
            Py_DECREF(seq);
            free(dbar);
            return 0;
        }
        /* new reference: Py_DECREF it when done */
        fitem = PyNumber_Float(item);
        if(!fitem) {
            Py_DECREF(seq);
            free(dbar);
            PyErr_SetString(PyExc_TypeError, "all items must be numbers");
            return 0;
        }
        dbar[i] = PyFloat_AS_DOUBLE(fitem);
        Py_DECREF(fitem);
    }

    /* clean up, compute, and return result */
    Py_DECREF(seq);
    result = total(dbar, seqlen);
    free(dbar);
    return Py_BuildValue("d", result);
}

static PyMethodDef totalMethods[] = {
    {"total", totalDoubles, METH_VARARGS, "Sum a sequence of numbers."},
    {0} /* sentinel */
};

/* Python 2 style module initialization (Python 3 uses PyModule_Create
   and a PyInit_total function instead) */
void
inittotal(void)
{
    (void) Py_InitModule("total", totalMethods);
}
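If you want to check your understanding of the wrapper's control flow before compiling anything, it can help to mirror it in pure Python. The following is only an approximate model written for illustration; total_model is a hypothetical name, not part of the recipe or of any library, and the comments map each step to the C call it stands in for:

```python
def total_model(iterable):
    """Approximate pure-Python model of the C wrapper totalDoubles."""
    # Stands in for PySequence_Fast: reject non-iterables with the custom
    # message, otherwise materialize the items so the length is known.
    try:
        iterator = iter(iterable)
    except TypeError:
        raise TypeError("argument must be iterable")
    items = tuple(iterator)

    # Stands in for the malloc'ed array of doubles: one slot per item.
    data = [0.0] * len(items)
    for i, item in enumerate(items):
        # Stands in for PyNumber_Float: the C code replaces whatever error
        # the conversion raised with a TypeError carrying its own message.
        try:
            data[i] = float(item)
        except Exception:
            raise TypeError("all items must be numbers")

    # Stands in for the preexisting C function: a plain left-to-right sum.
    result = 0.0
    for value in data:
        result += value
    return result
```

A model like this can also serve as a reference when you test the compiled extension, since total.total and total_model should agree on ordinary inputs.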
Discussion
The two best ways for your C-coded, Python-callable extension
functions to accept generic Python sequences as arguments are
PySequence_Fast and
PyObject_GetIter. The latter, which I cover in the
next recipe, can often save some memory, but it is appropriate only
when it's OK for the rest of your C code to get the
items one at a time, without knowing beforehand how many items there
will be in total. You often have preexisting C functions from an
existing library that you want to expose to Python code, and such
functions may require C arrays as their input arguments. Thus, this
recipe shows how to build a C array (in this case, an array of
double) from a generic Python sequence (or other
iterable) argument, so that you can pass the array (and the integer
that gives the array's length) to your existing C
function (represented here, purely as an example, by the
total function at the start of the recipe). (In the
real world, you would use Python's built-in function
sum for this specific functionality, rather than
exposing an existing C function; this is meant to be
just an example!)

PySequence_Fast takes two arguments: a Python
iterable object to be presented as a sequence, and a string to use as
the error message in case the Python object cannot be presented as a
sequence, in which case PySequence_Fast returns
0 (the C null pointer, NULL, an
error indicator). If the Python object is already a list or tuple,
PySequence_Fast returns the same object with the
reference count increased by one. If the Python object is any other
kind of sequence (or any iterator, or other iterable),
PySequence_Fast builds and returns a new tuple
with all items already in place. In any case,
PySequence_Fast returns an object on which you can
call PySequence_Fast_GET_SIZE to obtain the
sequence length (as we do in the recipe, in order to
malloc the appropriate amount of storage for the C
array) and PySequence_Fast_GET_ITEM to get an item
given a valid index (an int between 0, included,
and the sequence length, excluded).

The recipe requires quite a bit of care (as is typical of all C-coded
Python extensions, and more generally of any C code) to deal properly
with memory issues and error conditions. For C-coded Python
extensions, in particular, it's imperative that you
know which Python C API functions return new
references (which you must Py_DECREF when you are
done with them) and which ones return borrowed
references (which you must not Py_DECREF when
you're done with them; on the contrary, you must
Py_INCREF such a reference if you want to keep a
copy for a longer time). In this specific case, you have to know the
following (by reading the Python documentation):
- PyArg_ParseTuple produces borrowed references.
- PySequence_Fast returns a new reference.
- PySequence_Fast_GET_ITEM returns a borrowed reference.
- PyNumber_Float returns a new reference.
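You can observe part of this behavior from an interactive session without compiling anything, assuming you run CPython and the ctypes module is available: ctypes.pythonapi exposes the C API's functions by name. The sketch below is an experiment, not part of the recipe, and fast is a hypothetical helper name. It checks that PySequence_Fast hands back the very same object for a list or a tuple, builds a new container for any other iterable, and raises TypeError with your message when the argument is not iterable. The sketch deliberately does not depend on the exact type of that new container.

```python
import ctypes

# Assumption: CPython, where ctypes.pythonapi gives access to the C API.
PySequence_Fast = ctypes.pythonapi.PySequence_Fast
# py_object as restype: ctypes takes over the new reference that the
# C function returns, so no manual Py_DECREF is needed here.
PySequence_Fast.restype = ctypes.py_object
PySequence_Fast.argtypes = [ctypes.py_object, ctypes.c_char_p]


def fast(obj):
    """Hypothetical helper: call PySequence_Fast with the recipe's message."""
    return PySequence_Fast(obj, b"argument must be iterable")


a_list = [1.0, 2.0]
a_tuple = (1.0, 2.0)

# A list or tuple comes back as the same object (a new reference to it).
same_list = fast(a_list) is a_list
same_tuple = fast(a_tuple) is a_tuple

# Any other iterable is copied into a freshly built container.
from_iter = fast(iter(a_list))

# A non-iterable argument raises TypeError carrying the message passed in.
try:
    fast(42)
except TypeError as exc:
    message = str(exc)

print(same_list, same_tuple, list(from_iter), message)
```

In C you are responsible for the Py_DECREF that ctypes performs behind the scenes here, which is exactly why the list above matters.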
There is method to this madness, even though, as you start your
career as a coder of C API Python extensions, you'll
no doubt have to double-check each case carefully.
Python's C API strives to return borrowed references
(for the sake of the modest performance increase that they afford, by
avoiding needless incrementing and decrementing of reference counts),
when it knows it can always do so safely (i.e.,
it knows that the reference it is returning necessarily refers to an
already existing object). However, Python's C API
has to return a new reference when it's possible (or
certain) that a new object may have to be created.

For example, in the preceding list, PyNumber_Float
and PySequence_Fast may be able to return the same
object they were given as an argument, but it's also
quite possible that they may have to create a new object for this
purpose, to ensure that the returned object has the correct type.
Therefore, these two functions are specified as always returning new
references. PyArg_ParseTuple and
PySequence_Fast_GET_ITEM, on the other hand,
always return references to objects that already exist elsewhere (as
items in the arguments' tuple, or as items in the
fast-sequence container, respectively). Therefore, these two
functions can afford to return borrowed references and are thus
specified as doing so.

One last note: in this recipe, as soon as we obtain an item from the
fast-sequence container, we immediately try to transform it into a
Python float object, and thus we have to deal with
the possibility that the transformation will fail (e.g., if
we're passed a sequence containing a non-numeric string, a
complex number, etc.). It is most often futile to first attempt a
check (with PyNumber_Check) because the check
might succeed, and the later transformation attempt might fail anyway
(e.g., with a complex-number item). Therefore, it's
better to attempt the transformation and deal with the resulting
error, if any. This approach is yet another case of the common
situation in which it's easier to get forgiveness
than permission!

As usual, the best way to build this extension (assuming, e.g., that
you've saved the extension's source
code as a file named total.c) is with the
distutils package. Place a file named
setup.py in the same directory as the C source:
from distutils.core import setup, Extension

setup(name="total",
      maintainer="Luther Blissett",
      maintainer_email="situ@tioni.st",
      ext_modules=[Extension('total', sources=['total.c'])])

then build and install by running:

$ python setup.py install

An appealing aspect of this approach is that it works on any
platform, assuming that you have access to the same C compiler used
to build your version of Python, and permission to write to the
site-packages directory where the resulting
dynamically loaded library gets installed.
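To make the earlier point about PyNumber_Check concrete, here is a small experiment you can run from Python itself, assuming CPython with the ctypes module available (converts is a hypothetical helper name, not part of the recipe). It asks PyNumber_Check and PyNumber_Float about the same objects and shows that the two can disagree, which is why the recipe simply attempts the conversion:

```python
import ctypes

# Assumption: CPython, where ctypes.pythonapi gives access to the C API.
PyNumber_Check = ctypes.pythonapi.PyNumber_Check
PyNumber_Check.restype = ctypes.c_int
PyNumber_Check.argtypes = [ctypes.py_object]

PyNumber_Float = ctypes.pythonapi.PyNumber_Float
PyNumber_Float.restype = ctypes.py_object
PyNumber_Float.argtypes = [ctypes.py_object]


def converts(obj):
    """Hypothetical helper: True if PyNumber_Float succeeds on obj."""
    try:
        PyNumber_Float(obj)
    except (TypeError, ValueError):
        return False
    return True


# A complex number passes the check, yet cannot be converted to a float.
complex_checks = bool(PyNumber_Check(1j))
complex_converts = converts(1j)

# A numeric string fails the check, yet converts without complaint.
string_checks = bool(PyNumber_Check("1.5"))
string_converts = converts("1.5")

# An ordinary integer satisfies both.
int_checks = bool(PyNumber_Check(3))
int_converts = converts(3)

print(complex_checks, complex_converts)
print(string_checks, string_converts)
print(int_checks, int_converts)
```

Since the check neither guarantees nor rules out a successful conversion, calling PyNumber_Float directly and handling its failure is the simpler and more reliable design.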
See Also
The Extending and Embedding manual is
available as part of the standard Python documentation set at
http://www.python.org/doc/current/ext/extl;
documentation on the Python C API is at http://www.python.org/doc/current/api/apil;
the section "Distributing Python
Modules" in the standard Python documentation set is
still incomplete, but it's a good source of
information on the distutils package;
Python in a Nutshell covers the essentials of
extending and embedding, of the Python C API, and of the
distutils package.