I have a variable number of numpy arrays, which I'd like to pass to a C function. I managed to pass each individual array (using <ndarray>.ctypes.data_as(c_void_p)), but the number of arrays may vary a lot.

I thought I could pass all of these "pointers" in a list and use the PyList_GetItem() function in the C code. It works like a charm, except that the values of all elements are not the pointers I usually get when they are passed as function arguments.

For example, if I have:

from numpy import array
from ctypes import py_object, c_long

a1 = array([1., 2., 3.8])
a2 = array([222.3, 33.5])

values = [a1, a2]

my_cfunc(py_object(values), c_long(len(values)))

And my C code looks like :

void my_cfunc(PyObject *values)
{
    int i, n;

    n = PyObject_Length(values);
    for(i = 0; i < n; i++)
    {
        unsigned long long *pointer;
        pointer = (unsigned long long *)PyList_GetItem(values, i);
        printf("value 0 : %f\n", *pointer);
    }
}

The printed values are all 0.0000.

I have tried a lot of different solutions, using ctypes.byref(), ctypes.pointer(), etc. But I can't seem to be able to retrieve the real pointer values. I even have the impression the values converted by c_void_p() are truncated to 32 bits...

While there is plenty of documentation about passing numpy pointers to C, I haven't seen anything about ctypes values within a Python list (I admit this may seem strange...).

Any clue ?

  • have you already looked at Cython? Commented Oct 15, 2014 at 13:48
  • This is probably because PyList_GetItem returns you a PyObject* which is the ndarray itself, to get underlying data you need to apply PyArray_DATA from numpy.h. Commented Oct 15, 2014 at 17:39
  • Unfortunately, I can't use Cython since this is one of a few hundreds modules, imported in Python web handler (handler.py/uwsgi). But I'll keep an eye on Cython :-) Commented Oct 17, 2014 at 13:31
  • @immerr: Thank you... Your comment put me back on the right track, as I was getting dragged away by wrestling inefficiently in 'ctypes'... Commented Oct 17, 2014 at 13:35

1 Answer

After a few hours spent reading many pages of documentation and digging in the numpy include files, I've finally managed to understand exactly how it works. Since I spent a great amount of time searching for these exact explanations, I'm providing the following text to save others from wasting their time.

I repeat the question :

How to transfer a list of numpy arrays, from Python to C

(I also assume you know how to compile, link and import your C module in Python)

Passing a NumPy array from Python to C is rather simple, as long as it's passed directly as an argument to a C function. You just need to do something like this in Python:

from numpy import array
from ctypes import c_long, c_void_p

values = array([1.0, 2.2, 3.3, 4.4, 5.5])

my_c_func(values.ctypes.data_as(c_void_p), c_long(values.size))

And the C code could look like :

void my_c_func(double *values, long size)
{
    long i;

    // Print every element together with its index
    for (i = 0; i < size; i++)
        printf("%ld : %.10f\n", i, values[i]);
}
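
If you want to check from Python what the C function is going to see, a minimal sketch like the following may help. It assumes a C signature taking a double *, which is why the array is first normalised to contiguous float64. The names ptr, first, last and second_after_write are only there for the illustration and are not part of the original code.

```python
import ctypes
import numpy as np

values = np.array([1.0, 2.2, 3.3, 4.4, 5.5])

# A C function declared with `double *` expects contiguous 64-bit floats,
# so normalise the array first (nothing is copied if it already qualifies).
values = np.ascontiguousarray(values, dtype=np.float64)

# Same address as data_as(c_void_p), but typed so Python can index it.
ptr = values.ctypes.data_as(ctypes.POINTER(ctypes.c_double))

first = ptr[0]                   # what C would read as values[0]
last = ptr[values.size - 1]      # what C would read as values[size - 1]

ptr[1] = 9.9                     # a write through the pointer...
second_after_write = values[1]   # ...lands in the array's own buffer
```

The pointer is only valid while the array is alive, so keep a reference to the array for as long as C may use the address.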

That's simple... but what if I have a variable number of arrays? Of course, I could use one of the techniques that parse the function's argument list (there are many examples on Stack Overflow), but I'd like to do something different.

I'd like to store all my arrays in a list and pass this list to the C function, and let the C code handle all the arrays.

In fact, it's extremely simple, easy and coherent... once you understand how it's done! There is just one fact to remember:

Any member of a list/tuple/dictionary is a Python object... on the C side of the code !

You can't expect to pass a pointer directly, as I initially, and wrongly, thought. Once said, it sounds very simple :-) So, let's write some Python code:

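from numpy import array
from ctypes import py_object

# A real list (not a tuple): the C side reads it with PyList_GetItem()
my_list = [array([1.0, 2.2, 3.3, 4.4, 5.5]),
           array([2.9, 3.8, 4.7, 5.6])]

my_c_func(py_object(my_list))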

Well, you don't need to change anything in the list, but you need to specify that you are passing the list as a PyObject argument.

And here is how all this is accessed in C.

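void my_c_func(PyObject *list)
{
    int i, n_arrays;

    // Get the number of elements in the list
    n_arrays = PyObject_Length(list);

    for (i = 0; i < n_arrays; i++)
    {
        PyArrayObject *elem;
        double *pd;

        // Borrowed reference to the i-th element (an ndarray object)
        elem = (PyArrayObject *)PyList_GetItem(list, i);
        // Pointer to the first double of the array's data buffer
        pd = (double *)PyArray_DATA(elem);
        printf("Value 0 : %.10f\n", *pd);
    }
}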

Explanation :

  • The list is received as a pointer to a PyObject.
  • We get the number of arrays from the list by using the PyObject_Length() function.
  • PyList_GetItem() always returns a PyObject * (a borrowed reference to the list element, not a pointer to its data), which can be treated as a PyArrayObject * here.
  • We retrieve the pointer to the array's data by using the PyArray_DATA() macro.
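
The first three steps can be reproduced without compiling anything, by calling the same C API functions through ctypes.pythonapi. This is only a sketch for CPython (it relies on id() being the object's address there), and the variable names are made up for the illustration. It shows that what PyList_GetItem() hands back is the address of the ndarray object itself, not the address of its data:

```python
import ctypes
import numpy as np

api = ctypes.pythonapi  # CPython's own C API, exposed as a PyDLL

api.PyObject_Length.argtypes = [ctypes.py_object]
api.PyObject_Length.restype = ctypes.c_ssize_t

# PyList_GetItem returns a *borrowed* PyObject *; take it as a raw address
# so that ctypes does not try to manage a reference it does not own.
api.PyList_GetItem.argtypes = [ctypes.py_object, ctypes.c_ssize_t]
api.PyList_GetItem.restype = ctypes.c_void_p

my_list = [np.array([1.0, 2.2, 3.3, 4.4, 5.5]),
           np.array([2.9, 3.8, 4.7, 5.6])]

n_arrays = api.PyObject_Length(my_list)

# Address returned by the C API for each element of the list
item_addrs = [api.PyList_GetItem(my_list, i) for i in range(n_arrays)]
# CPython detail: id() is the address of the object itself
object_addrs = [id(a) for a in my_list]
# Where the doubles actually live (what PyArray_DATA() would give in C)
data_addrs = [a.ctypes.data for a in my_list]
```

ctypes.pythonapi is a PyDLL, so the GIL stays held during these calls. A C library of your own that calls functions such as PyList_GetItem() should be loaded with ctypes.PyDLL rather than CDLL for the same reason.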

Normally, PyList_GetItem() returns a PyObject *, but if you look in Python.h and ndarraytypes.h, you'll find that PyObject and PyArrayObject are both defined as (I've expanded the macros!):

typedef struct _object {
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;

And the PyArrayObject... is exactly the same. So the two are perfectly interchangeable at this level. The content of ob_type is accessible for both objects and contains everything needed to manipulate any generic Python object. I admit that I used one of its members during my investigations: the struct member tp_name is the string containing the name of the object's type... in clear text; and believe me, it helped! This is how I discovered what each list element contained.

Since these structures don't contain anything else, how is it that we can access the data pointer of this ndarray object? Simply by using object macros... which use an extended structure, allowing the compiler to know how to access the object's additional elements, located after the ob_type pointer. The PyArray_DATA() macro is defined as:

#define PyArray_DATA(obj) ((void *)((PyArrayObject_fields *)(obj))->data)

Here, it casts the PyArrayObject * to a PyArrayObject_fields *, and this latter structure is simply (simplified and macros expanded!):

typedef struct tagPyArrayObject_fields {
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
    char *data;
    int nd;
    npy_intp *dimensions;
    npy_intp *strides;
    PyObject *base;
    PyArray_Descr *descr;
    int flags;
    PyObject *weakreflist;
} PyArrayObject_fields;

As you can see, the first two elements of the structure are the same as in PyObject and PyArrayObject, but additional elements can be addressed using this definition. It is tempting to access these elements directly, but that is a bad and dangerous practice which is strongly discouraged. You should use the macros instead and not bother with the details of all these structures. I just thought you might be interested in some internals.
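
Just to illustrate the layout (and not as something to do in real code, for the reasons given above), the following sketch peeks at the first two fields behind the object header from Python. It assumes CPython, where id() is the object's address and object.__basicsize__ equals sizeof(PyObject), and it assumes the field order shown in the structure above. Both are implementation details that could change, and the variable names are hypothetical.

```python
import ctypes
import numpy as np

a = np.arange(6.0).reshape(2, 3)

# Size of the generic object header (ob_refcnt, ob_type) for this build
header_size = object.__basicsize__

# `char *data` is the first field after the header...
data_field = ctypes.c_void_p.from_address(id(a) + header_size).value

# ...and `int nd` follows one pointer later.
nd_field = ctypes.c_int.from_address(
    id(a) + header_size + ctypes.sizeof(ctypes.c_void_p)).value
```

Here data_field should match a.ctypes.data and nd_field should match a.ndim, which is the same information PyArray_DATA() and PyArray_NDIM() return on the C side.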

Note that all PyArrayObject macros are documented in http://docs.scipy.org/doc/numpy/reference/c-api.array.html

For instance, the size of a PyArrayObject can be obtained using the macro PyArray_SIZE(PyArrayObject *)

Finally, it's very simple and logical, once you know it :-)

4 Comments

In my case I was iterating a list of numpy arrays, and without the .ctypes.data_as(c_void_p) it would send the same memory address to C for each element. Thanks for the solution!
Hi dcexcal, I am getting the following error. *** stack smashing detected ***: terminated
Wow, never got this one :-) Are you sure about the values of your pointers in C? It might be that you are stepping out of your arrays... The compiler uses a 'canary' mechanism to detect wrong access behavior. This could be disabled with the option '-fno-stack-protector'. However, the problem will not go away with this option! Are you playing a lot with your pointers and accessing locations with your own computation? This might be the cause. There is a good, and extensive, explanation at: stackoverflow.com/questions/1345670/stack-smashing-detected
I have to admit that I haven't played with C/Python interaction in a long time (since 2014, in fact). Python has changed a lot since 2.7 and we are now at 3.11, which is likely to have different bindings. I should probably look at nanobind.readthedocs.io/en/latest/why.html for the future. I will certainly work on C/Python binding in the near future (compaction of multiple lists of numbers), and I'll update this post if needed.
