I was wondering if I'm missing something when using Cython with Numpy because I haven't seen much of an improvement. I wrote this code as an example.

Naive version:

import numpy as np
from skimage.util import view_as_windows

it = 16
arr = np.arange(1000*1000, dtype=np.float64).reshape(1000,1000)
windows = view_as_windows(arr, (it, it), it)
container = np.zeros((windows.shape[0], windows.shape[1]))
def test(windows):
    for i in range(windows.shape[0]):
        for j in range(windows.shape[1]):
            container[i,j] = np.mean(windows[i,j])
    return container

%%timeit 

test(windows)
1 loops, best of 3: 131 ms per loop

Cythonized version:

%%cython --annotate

import numpy as np
cimport numpy as np
from skimage.util import view_as_windows
import cython
cdef int step = 16

arr = np.arange(1000*1000, dtype=np.float64).reshape(1000,1000)
windows = view_as_windows(arr, (step, step), step)

@cython.boundscheck(False)
def cython_test(np.ndarray[np.float64_t, ndim=4]  windows):
    cdef np.ndarray[np.float64_t, ndim=2] container = np.zeros((windows.shape[0], windows.shape[1]),dtype=np.float64)
    cdef int i, j
    I = windows.shape[0]
    J = windows.shape[1]
    for i in range(I):
        for j in range(J):
            container[i,j] = np.mean(windows[i,j])
    return container


%timeit cython_test(windows)
10 loops, best of 3: 126 ms per loop

As you can see, the improvement is very modest, so maybe I'm doing something wrong. By the way, Cython produces the following annotation:

[Image: Cython annotation output]

As you can see, the numpy lines have a yellow background even after including the efficient indexing syntax np.ndarray[DTYPE_t, ndim=2]. Why?

By the way, in my view the ideal outcome is being able to use most numpy functions but still get some reasonable improvement after taking advantage of efficient indexing syntax or maybe memory views as in HYRY's answer.

UPDATE

It seems I'm not doing anything wrong in the code I posted above, and the yellow background on some lines is normal. That leaves me wondering: in which situations do I benefit from typing cdef np.ndarray[np.float64_t, ndim=2] in front of numpy arrays? I suppose there are specific cases where this helps; otherwise there wouldn't be much purpose in doing it.

  • I'm declaring each as np.float64_t instead. Is that not the same? Commented Jan 18, 2015 at 2:32
  • Already did. There are some fluctuations but I still get around 121-126ms. Commented Jan 18, 2015 at 2:51
  • You probably have too much Python overhead. Cython use is illustrated at the end of this numpy iteration page: docs.scipy.org/doc/numpy/reference/arrays.nditer.html. For a start, I'd try removing the np.mean call (compute the mean directly). Commented Jan 18, 2015 at 4:14
  • That might be the reason. The problem is that for more complicated code with a lot of indexing and slicing, I still don't have a big improvement. That is why I thought I was making a mistake. Commented Jan 18, 2015 at 4:14
  • @hpaulj Thanks. That is a somewhat obscure example. I'm not sure I understand the general idea other than np.nditer can be useful to expose the inner loop to Cython. Commented Jan 18, 2015 at 4:26

1 Answer

To speed up the code you need to implement the mean() function yourself, because the overhead of calling a numpy function on each small window is very high.

import numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def cython_test(double[:, :, :, :]  windows):
    cdef double[:, ::1] container
    cdef int i, j, k, l
    cdef int n0, n1, n2, n3
    cdef double inv_n
    cdef double s
    # windows.base is the ndarray that the typed memoryview wraps
    n0, n1, n2, n3 = windows.base.shape
    container = np.zeros((n0, n1))
    inv_n = 1.0 / (n2 * n3)
    for i in range(n0):
        for j in range(n1):
            # sum one window in typed C loops instead of calling np.mean
            s = 0.0
            for k in range(n2):
                for l in range(n3):
                    s += windows[i, j, k, l]
            container[i,j] = s * inv_n
    # container.base is the ndarray created by np.zeros()
    return container.base
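
For checking the Cython function on a small input, the same loop nest can be written in plain Python. This is only a reference sketch, not part of the original answer: the name block_means_reference and the 64x64 test array are made up for illustration, and the windows are built with reshape/swapaxes rather than view_as_windows to avoid the skimage dependency. It will be slow on large arrays, which is exactly the overhead the typed version removes.

```python
import numpy as np

def block_means_reference(windows):
    """Plain-Python version of the loop nest: sum each window, scale by 1/n."""
    n0, n1, n2, n3 = windows.shape
    out = np.zeros((n0, n1))
    inv_n = 1.0 / (n2 * n3)
    for i in range(n0):
        for j in range(n1):
            s = 0.0
            for k in range(n2):
                for l in range(n3):
                    s += windows[i, j, k, l]
            out[i, j] = s * inv_n
    return out

# Hypothetical small input: a 64x64 array cut into non-overlapping 16x16 blocks.
block = 16
arr = np.arange(64 * 64, dtype=np.float64).reshape(64, 64)
windows = arr.reshape(4, block, 4, block).swapaxes(1, 2)  # shape (4, 4, 16, 16)

reference = block_means_reference(windows)
# The same quantity computed with one np.mean call per window.
expected = np.array([[np.mean(windows[i, j]) for j in range(4)] for i in range(4)])
print(np.allclose(reference, expected))
```

Comparing against this reference with np.allclose() is one way to confirm that a hand-written mean returns the same values as np.mean.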

Here are the %timeit results:

  • python_test(windows): 63.7 ms
  • cython_test(windows): 1.24 ms
  • np.mean(windows, axis=(2, 3)): 2.66 ms
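
The gap between the first and last timings can be reproduced without Cython. The sketch below compares one np.mean call per window against a single np.mean call over the last two axes. It assumes that reshape/swapaxes on the cropped array is an acceptable stand-in for view_as_windows(arr, (16, 16), 16); the function names mean_per_window and mean_vectorized are made up for illustration. Timings depend on the machine, so none are quoted here.

```python
import timeit
import numpy as np

block = 16
arr = np.arange(1000 * 1000, dtype=np.float64).reshape(1000, 1000)
n = arr.shape[0] // block  # 62 full blocks per axis
# Stand-in for view_as_windows(arr, (16, 16), 16): crop to 992x992, then split.
windows = arr[:n * block, :n * block].reshape(n, block, n, block).swapaxes(1, 2)

def mean_per_window(windows):
    """One np.mean call per window, as in the question."""
    out = np.empty(windows.shape[:2])
    for i in range(windows.shape[0]):
        for j in range(windows.shape[1]):
            out[i, j] = np.mean(windows[i, j])
    return out

def mean_vectorized(windows):
    """A single np.mean call reducing over the two window axes."""
    return np.mean(windows, axis=(2, 3))

looped = mean_per_window(windows)
vectorized = mean_vectorized(windows)
n_calls = windows.shape[0] * windows.shape[1]

print("same result:", np.allclose(looped, vectorized))
print("np.mean calls in the loop:", n_calls)
print("loop       s/run:", timeit.timeit(lambda: mean_per_window(windows), number=3) / 3)
print("vectorized s/run:", timeit.timeit(lambda: mean_vectorized(windows), number=3) / 3)
```

The loop pays the fixed cost of a numpy call once per window, while the vectorized form pays it once in total.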

7 Comments

Does this actually return the same output?
I checked the results with np.allclose(); they are the same.
What is a MemoryView of 'ndarray' object?
I modified the code to return container.base, which is the numpy array created by np.zeros().
Cool, but what can you do with a MemoryView of 'ndarray' object?
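
As a rough illustration of the last question: a memoryview is a window onto another object's buffer, and np.asarray() can turn it back into an ndarray without copying. The sketch below uses Python's built-in memoryview as a stand-in, which is an assumption made so the example runs without compiling anything. A Cython typed memoryview is a different type and exposes the wrapped object as .base, whereas the built-in one uses .obj.

```python
import numpy as np

container = np.zeros((2, 3))
view = memoryview(container)  # built-in memoryview, used here as a stand-in

# np.asarray() wraps the same buffer, so no data is copied.
as_array = np.asarray(view)
as_array[0, 0] = 5.0  # writes through to the original array

print(view.obj is container)
print(np.shares_memory(as_array, container))
print(container[0, 0])
```

Once converted, the result behaves like any other ndarray, so the usual numpy functions and slicing apply.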