Creating a list of numpy.ndarray of unequal length in Cython

Question

I currently have Python code that creates a list of ndarrays of unequal length. The snippet looks like this:

import numpy as np
from mymodule import list_size, array_length  # list_size is an int; array_length is a list of ints with len(array_length) == list_size

ndarray_list = []

for i in range(list_size):
    ndarray_list.append(np.zeros(array_length[i]))

Now I need to convert this to Cython, but I do not know how. I tried to create a 2-D dynamically allocated array like this:

import numpy as np
cimport numpy as np
from libc.stdlib cimport malloc
from mymodule import list_size, array_length

cdef int i
cdef double **ndarray_list  # the pointer must be a typed C variable, not a Python object
ndarray_list = <double **>malloc(list_size * sizeof(double*))
for i in range(list_size):
    ndarray_list[i] = <double *>malloc(array_length[i] * sizeof(double))

However, this only gives me a raw double* in each ndarray_list[i], so I cannot pass it to other functions that require ndarray methods.

What should I do?

I tried to condense the two approaches into one answer, but it reads much better split in two... Your malloc() approach is orders of magnitude faster, so you should consider the malloc()-based answer. – Saullo G. P. Castro, commented May 24, 2014 at 5:32

Saullo G. P. Castro · Accepted Answer · 2014-05-24 19:40:35Z


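To pass the C double* buffer to a function that requires a numpy.ndarray, you can wrap it in a Cython array (cython.view.array) by casting the pointer with <double[:n]>. The Cython array refers to the same memory as the double* buffer, so nothing is copied, and NumPy functions such as np.sum() accept it directly; np.asarray(buff) gives a real ndarray view of the same memory when a function needs ndarray methods.

This malloc()-based solution is orders of magnitude faster than the other answer, which is based on NumPy buffers. Note that each inner array is freed with free() before the outer arrays, to avoid a memory leak.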

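import numpy as np
cimport numpy as np
from cython cimport view
from libc.stdlib cimport malloc, free

cdef int i, j
cdef int list_size = 10
cdef double test
cdef double **ndarray_list   # outer array of row pointers
cdef int *array_length       # length of each row

ndarray_list = <double **>malloc(list_size * sizeof(double*))
array_length = <int *>malloc(list_size * sizeof(int))  # sizeof(int), not sizeof(int*)
if ndarray_list == NULL or array_length == NULL:
    raise MemoryError()

for i in range(list_size):
    array_length[i] = i + 1
    ndarray_list[i] = <double *>malloc(array_length[i] * sizeof(double))
    for j in range(array_length[i]):
        ndarray_list[i][j] = j

# Fast C-level element access
for i in range(list_size):
    for j in range(array_length[i]):
        test = ndarray_list[i][j]

# Wrap each row in a Cython array (no copy) so NumPy can operate on it
cdef view.array buff
for i in range(list_size):
    buff = <double[:array_length[i]]>ndarray_list[i]
    print(np.sum(buff))

#...

# Free the inner arrays first, then the outer arrays; buff must not be used afterwards
for i in range(list_size):
    free(ndarray_list[i])
free(ndarray_list)
free(array_length)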
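The essential point of this answer is that the wrapper shares memory with the C buffer instead of copying it. As a pure-Python analogue (a swapped-in technique using ctypes, not the answer's Cython code), a ctypes array can stand in for each malloc()'d row, and np.ctypeslib.as_array views it as an ndarray without copying. The row lengths and values mirror the Cython example above; names like c_buffers are illustrative.

```python
import ctypes
import numpy as np

list_size = 10

# One raw C double buffer per row, with unequal lengths (row i has i + 1 elements).
# Here ctypes owns the memory; in the Cython answer it comes from malloc().
c_buffers = [(ctypes.c_double * (i + 1))() for i in range(list_size)]
for buf in c_buffers:
    for j in range(len(buf)):
        buf[j] = j

# View each C buffer as an ndarray; np.ctypeslib.as_array does not copy the data.
ndarray_list = [np.ctypeslib.as_array(buf) for buf in c_buffers]

sums = [float(np.sum(a)) for a in ndarray_list]
print(sums)  # [0.0, 1.0, 3.0, 6.0, 10.0, 15.0, 21.0, 28.0, 36.0, 45.0]

# A write through the ndarray is visible in the C buffer, confirming shared memory.
ndarray_list[3][0] = 99.0
print(c_buffers[3][0])  # 99.0
```

As in the Cython version, the ndarray views are only valid while the underlying buffers are alive, so keep c_buffers referenced for as long as the views are used.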

answered May 24, 2014 at 5:27 by Saullo G. P. Castro, edited May 24, 2014 at 19:40



6 Comments

Yuxiang Wang Over a year ago

Thank you so much, Saullo Castro, for your excellent answer! One quick question about the code: why is buff.data cast to <char *>? Shouldn't it be <double *>?

Saullo G. P. Castro Over a year ago

@ShawnWang That was my first attempt, but I got "Cannot assign type 'double *' to 'char *'", so I used char *. I did not find any reference explaining why char * is required.

Yuxiang Wang Over a year ago

Thanks Saullo! This is really interesting... I tried void * and it wouldn't work either. Well, at least we got it working. Thanks again! :)

Saullo G. P. Castro Over a year ago

@ShawnWang I am trying to find a better way to do this... without having to call np.empty() and use this char * cast...

Saullo G. P. Castro Over a year ago

@ShawnWang I've found a much cleaner and more straightforward way to do it using Cython arrays; check the update.
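On the char * question in the comments above: a likely explanation (not given in the thread) is that NumPy's C-level array struct declares its raw data pointer as char *, because strides are measured in bytes, so offsets into the data are computed byte by byte. The byte-based strides can be inspected from Python; this sketch uses a small C-contiguous float64 array so that tobytes() has the same layout as the underlying buffer.

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)
print(a.itemsize)  # 8
print(a.strides)   # (24, 8): bytes to step per row and per column

# Element a[i, j] sits at byte offset i*strides[0] + j*strides[1] from the start of
# the buffer, which is why byte (char *) pointer arithmetic is the natural fit.
i, j = 1, 2
offset = i * a.strides[0] + j * a.strides[1]
raw = a.tobytes()  # copy of the buffer bytes, used here only to read one value back
value = np.frombuffer(raw[offset:offset + a.itemsize], dtype=np.float64)[0]
print(offset, value)  # 40 5.0
```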


Saullo G. P. Castro · Accepted Answer · 2014-05-24 05:23:34Z

You can use a NumPy buffer of dtype object. To populate ndarray_list efficiently you only need a 1-D object buffer, but note that the many calls to np.zeros() can make this slow:

import numpy as np
cimport numpy as np

cdef int i, list_size
cdef np.ndarray[np.int64_t, ndim=1] array_length
cdef np.ndarray[object, ndim=1] ndarray_list

list_size = 10000
# Lengths 1, 2, ..., list_size (np.int was removed in NumPy 1.24, so use an explicit dtype)
array_length = np.arange(1, list_size + 1, dtype=np.int64)

ndarray_list = np.empty(list_size, dtype=object)
for i in range(list_size):
    ndarray_list[i] = np.zeros(array_length[i], dtype=np.float64)

To access the inner arrays efficiently, you need another 1-D buffer:

cdef np.ndarray[np.float64_t, ndim=1] inner_array
cdef double test
cdef int j

for i in range(list_size):
    inner_array = ndarray_list[i]
    for j in range(inner_array.shape[0]):
        test = inner_array[j]
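The object-array container used in this answer can be tried in plain Python/NumPy. This sketch has no Cython typing, so it shows the data layout rather than the speed: each slot of a 1-D object array holds a reference to a separate float64 ndarray. Filling a preallocated np.empty(..., dtype=object) slot by slot keeps the container 1-D even if some inner arrays happen to have equal lengths, which np.array(...) would otherwise try to stack into a 2-D array.

```python
import numpy as np

list_size = 5
array_length = np.arange(1, list_size + 1)

# 1-D object array: each element is a reference to an independent ndarray.
ndarray_list = np.empty(list_size, dtype=object)
for i in range(list_size):
    ndarray_list[i] = np.zeros(array_length[i], dtype=np.float64)

print(ndarray_list.dtype)                  # object
print([a.shape[0] for a in ndarray_list])  # [1, 2, 3, 4, 5]
```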
