Using Cython correctly in sample code with numpy

Question

I was wondering if I'm missing something when using Cython with Numpy because I haven't seen much of an improvement. I wrote this code as an example.

Naive version:

import numpy as np
from skimage.util import view_as_windows

it = 16
arr = np.arange(1000*1000, dtype=np.float64).reshape(1000,1000)
windows = view_as_windows(arr, (it, it), it)
container = np.zeros((windows.shape[0], windows.shape[1]))
def test(windows):
    for i in range(windows.shape[0]):
        for j in range(windows.shape[1]):
            container[i,j] = np.mean(windows[i,j])
    return container

%%timeit 

test(windows)
1 loops, best of 3: 131 ms per loop

Cythonized version:

%%cython --annotate

import numpy as np
cimport numpy as np
from skimage.util import view_as_windows
import cython
cdef int step = 16

arr = np.arange(1000*1000, dtype=np.float64).reshape(1000,1000)
windows = view_as_windows(arr, (step, step), step)

@cython.boundscheck(False)
def cython_test(np.ndarray[np.float64_t, ndim=4]  windows):
    cdef np.ndarray[np.float64_t, ndim=2] container = np.zeros((windows.shape[0], windows.shape[1]),dtype=np.float64)
    cdef int i, j
    I = windows.shape[0]
    J = windows.shape[1]
    for i in range(I):
        for j in range(J):
            container[i,j] = np.mean(windows[i,j])
    return container


%timeit cython_test(windows)
10 loops, best of 3: 126 ms per loop

As you can see, there is only a very modest improvement, so maybe I'm doing something wrong. By the way, the annotation that Cython produces is the following:

[Image: Cython annotation output, with the NumPy lines highlighted in yellow]

As you can see, the numpy lines have a yellow background even after including the efficient indexing syntax np.ndarray[DTYPE_t, ndim=2]. Why?

By the way, in my view the ideal outcome is being able to use most numpy functions but still get some reasonable improvement after taking advantage of efficient indexing syntax or maybe memory views as in HYRY's answer.

UPDATE

It seems I'm not doing anything wrong in the code I posted above and that the yellow background in some lines is normal, so I was left wondering the following: in which situations can I get a benefit from typing cdef np.ndarray[np.float64_t, ndim=2] in front of numpy arrays? I suppose there are specific instances where this is helpful, otherwise there wouldn't be much purpose in doing it.

Already did. There are some fluctuations but I still get around 121-126 ms.
– r_31415, Commented Jan 18, 2015 at 2:51
You probably have too much Python overhead. Cython use is illustrated at the end of this numpy iteration page: docs.scipy.org/doc/numpy/reference/arrays.nditer.html. For a start I'd try removing the np.mean call (compute the mean directly).
– hpaulj, Commented Jan 18, 2015 at 4:14
That might be the reason. The problem is that for more complicated code with a lot of indexing and slicing, I still don't get a big improvement. That is why I thought I was making a mistake.
– r_31415, Commented Jan 18, 2015 at 4:14
@hpaulj Thanks. That is a somewhat obscure example. I'm not sure I understand the general idea other than that np.nditer can be useful to expose the inner loop to Cython.
– r_31415, Commented Jan 18, 2015 at 4:26
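
The per-call overhead that hpaulj mentions can be estimated from plain Python. The following is a minimal sketch, not a rigorous benchmark: `per_call_seconds` is a hypothetical helper introduced only for this illustration, and the numbers it prints will differ between machines and NumPy versions. If the 1x1 and 16x16 timings come out close to each other, most of each np.mean call is fixed overhead rather than arithmetic, and compiling the surrounding loop with Cython cannot remove that overhead.

```python
import timeit

import numpy as np


def per_call_seconds(func, arg, number=2000):
    """Average wall-clock time of one func(arg) call over `number` calls."""
    return timeit.timeit(lambda: func(arg), number=number) / number


# A single element versus one 16x16 window, as in the question.
tiny = np.ones((1, 1))
window = np.ones((16, 16))

t_tiny = per_call_seconds(np.mean, tiny)
t_window = per_call_seconds(np.mean, window)

print(f"np.mean on 1x1:   {t_tiny * 1e6:.2f} us per call")
print(f"np.mean on 16x16: {t_window * 1e6:.2f} us per call")
```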

HYRY · Accepted Answer · 2015-01-18 11:11:47Z

3

You need to implement the mean() computation yourself to speed up the code, because the overhead of calling a numpy function once per small window is very high.

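import numpy as np
cimport cython

@cython.boundscheck(False)  # skip bounds checks on indexing
@cython.wraparound(False)   # disallow negative indices
def cython_test(double[:, :, :, :] windows):
    cdef double[:, ::1] container
    cdef int i, j, k, l
    cdef int n0, n1, n2, n3
    cdef double inv_n
    cdef double s
    # windows.base is the ndarray that was passed in.
    n0, n1, n2, n3 = windows.base.shape
    container = np.zeros((n0, n1))
    inv_n = 1.0 / (n2 * n3)
    for i in range(n0):
        for j in range(n1):
            # Sum one window with typed loops instead of calling np.mean.
            s = 0.0
            for k in range(n2):
                for l in range(n3):
                    s += windows[i, j, k, l]
            container[i, j] = s * inv_n
    # container.base is the ndarray created by np.zeros.
    return container.base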

Here are the %timeit results:

python_test(windows): 63.7 ms
cython_test(windows): 1.24 ms
np.mean(windows, axis=(2, 3)): 2.66 ms
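
To see that the three variants timed above compute the same thing, here is a small NumPy-only sketch. Everything in it is an assumption made for illustration: `block_windows` is a hypothetical stand-in for view_as_windows(arr, (step, step), step) so that scikit-image is not required (trailing rows and columns that do not fill a block are dropped), `loop_mean` mirrors the question's loop, and `manual_mean` is a pure-Python transcription of the Cython loop, so it shows the logic but not the speed.

```python
import numpy as np


def block_windows(arr, step):
    """Non-overlapping (step, step) blocks of a 2-D array as a 4-D array.

    Hypothetical stand-in for view_as_windows(arr, (step, step), step).
    """
    n0 = arr.shape[0] // step
    n1 = arr.shape[1] // step
    cropped = arr[:n0 * step, :n1 * step]
    # Split each axis into (block index, offset), then group the block indices.
    return cropped.reshape(n0, step, n1, step).swapaxes(1, 2)


def loop_mean(windows):
    """One np.mean call per window, as in the question."""
    out = np.zeros(windows.shape[:2])
    for i in range(windows.shape[0]):
        for j in range(windows.shape[1]):
            out[i, j] = np.mean(windows[i, j])
    return out


def manual_mean(windows):
    """Explicit summation, following the structure of the Cython loop."""
    n0, n1, n2, n3 = windows.shape
    out = np.zeros((n0, n1))
    inv_n = 1.0 / (n2 * n3)
    for i in range(n0):
        for j in range(n1):
            s = 0.0
            for k in range(n2):
                for l in range(n3):
                    s += windows[i, j, k, l]
            out[i, j] = s * inv_n
    return out


arr = np.arange(100 * 100, dtype=np.float64).reshape(100, 100)
windows = block_windows(arr, 16)
vectorized = np.mean(windows, axis=(2, 3))

print(windows.shape)
print(np.allclose(loop_mean(windows), vectorized))
print(np.allclose(manual_mean(windows), vectorized))
```

When no custom per-window logic is needed, the single vectorized call is usually the simplest choice, since it makes one NumPy call for the whole array instead of one per window.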

edited Jan 18, 2015 at 11:11

answered Jan 18, 2015 at 10:42

7 Comments

Padraic Cunningham Over a year ago

does this actually return the same output?

HYRY Over a year ago

I checked the results with np.allclose(); they are the same.

Padraic Cunningham Over a year ago

what is a MemoryView of 'ndarray' object?

HYRY Over a year ago

I modified the code to return container.base, which is the numpy array created by np.zeros().

Padraic Cunningham Over a year ago

cool, but what can you do with a MemoryView of 'ndarray' object?
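
One possible answer to the last question, offered as a hedged sketch rather than as HYRY's reply: a memoryview is a window onto a buffer, and NumPy can wrap that buffer again without copying. The example below uses Python's built-in memoryview as a stand-in, because a Cython typed memoryview cannot be created in plain Python. As far as I know, np.asarray(container) performs the same conversion on a typed memoryview inside Cython, and container.base (used in the answer) returns the original array object directly.

```python
import numpy as np

result = np.zeros((2, 3))

# Built-in buffer view, standing in for a Cython typed memoryview.
view = memoryview(result)

# Wrap the same buffer in an ndarray again; no data is copied.
as_array = np.asarray(view)
as_array[0, 0] = 42.0

print(result[0, 0])                        # the write is visible in the original
print(view.obj is result)                  # the view remembers its owner
print(np.shares_memory(result, as_array))
```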
