
I encountered a pretty weird result from a benchmark.

These are all different flavors of a bubblesort implementation, and the fastest approach at n=10^4 is converting a Python list to a C array internally. In contrast, the yellow line corresponds to code where I am using NumPy arrays with a typed memoryview. I expected the results to be the other way around. I (and colleagues) repeated the benchmark a couple of times and always got the same results. Maybe someone has an idea of what is going on here ...

[benchmark plot: runtime vs. input size for the different bubblesort implementations]
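For context, a plain-Python bubblesort baseline (not shown in the question, and the function name here is my own) against which these Cython variants are typically compared might look like this:

```python
def python_bubblesort(a_list):
    """Plain-Python bubble sort, used as a baseline for the Cython variants."""
    count = len(a_list)
    for i in range(count):
        for j in range(1, count):
            # swap adjacent elements that are out of order
            if a_list[j] < a_list[j - 1]:
                a_list[j - 1], a_list[j] = a_list[j], a_list[j - 1]
    return a_list
```

The Cython versions below keep exactly this loop structure; only the container type changes.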

The black line in the plot corresponds to this code:

%%cython
cimport cython
from libc.stdlib cimport malloc, free

def cython_bubblesort_clist(a_list):
    """ 
    The Cython implementation of bubble sort with internal
    conversion between Python list objects and C arrays.

    """
    cdef int *c_list
    c_list = <int *>malloc(len(a_list)*cython.sizeof(int))
    cdef int count, i, j # static type declarations
    count = len(a_list)

    # convert Python list to C array
    for i in range(count):
        c_list[i] = a_list[i]

    for i in range(count):
        for j in range(1, count):
            if c_list[j] < c_list[j-1]:
                c_list[j-1], c_list[j] = c_list[j], c_list[j-1]

    # convert C array back to Python list
    for i in range(count):
        a_list[i] = c_list[i]

    free(c_list)
    return a_list

and the pink line to this code:

%%cython
import numpy as np
cimport numpy as np
cimport cython
def cython_bubblesort_numpy(long[:] np_ary):
    """ 
    The Cython implementation of bubble sort with NumPy memoryview.

    """
    cdef int count, i, j # static type declarations
    count = np_ary.shape[0]

    for i in range(count):
        for j in range(1, count):
            if np_ary[j] < np_ary[j-1]:
                np_ary[j-1], np_ary[j] = np_ary[j], np_ary[j-1]

    return np.asarray(np_ary)
Comments:

  • In your benchmark, you make the deepcopy part of the recorded timing. Although it's the same for each test, why not move it outside of the timing call? Also, why do you expect NumPy to be faster in this case? Its strength is not in accessing individual elements but in whole-array operations. Commented May 14, 2014 at 18:02
  • Did you take a look at the following guide, here and onward? Commented May 14, 2014 at 18:07
  • Add these decorators and you get the same performance: @cython.boundscheck(False) @cython.wraparound(False) Commented May 14, 2014 at 18:14

2 Answers


As suggested in the comments above, I added the @cython.boundscheck(False) and @cython.wraparound(False) decorators, which disable per-access bounds checking and negative-index wraparound:

%%cython
import numpy as np
cimport numpy as np
cimport cython
@cython.boundscheck(False) 
@cython.wraparound(False)
cpdef cython_bubblesort_numpy(long[:] np_ary):
    """ 
    The Cython implementation of bubble sort with NumPy memoryview.

    """
    cdef int count, i, j # static type declarations
    count = np_ary.shape[0]

    for i in range(count):
        for j in range(1, count):
            if np_ary[j] < np_ary[j-1]:
                np_ary[j-1], np_ary[j] = np_ary[j], np_ary[j-1]

    return np.asarray(np_ary)

and the results are now more in line with what I expected :)

[updated benchmark plot after adding the decorators]




It is worth making one trivial change to your code to see if it improves things further:

cpdef cython_bubblesort_numpy(long[::1] np_ary):
    # ...

This tells Cython that np_ary is a C-contiguous array, and the generated code for the nested for loops can be further optimized with that information.

This version won't accept non-contiguous arrays as arguments, but that is fairly trivial to handle with numpy.ascontiguousarray().
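For example, slicing an array with a step produces a non-contiguous view, which a long[::1] memoryview would reject; np.ascontiguousarray makes a C-contiguous copy first (and is a no-op when the input is already contiguous):

```python
import numpy as np

a = np.arange(10, dtype=np.int64)
strided = a[::2]                    # slicing with a step breaks C-contiguity
fixed = np.ascontiguousarray(strided)  # contiguous copy, safe for long[::1]

# strided.flags['C_CONTIGUOUS'] is False, fixed.flags['C_CONTIGUOUS'] is True
```

One would then call the sort on `fixed` (and copy back if the original view must be updated), trading one extra copy for the faster contiguous-access loop.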

5 Comments

Thanks! However, it won't compile; it doesn't like the `if np_ary[j] < np_ary[j-1]:` line. Am I missing something?
What's the compilation error message, and what version of Cython are you using? I'm able to get it to compile here.
Version 0.20.1. I posted a copy here: pastebin.com/BL1Vj9ic But I have the feeling that I am missing something here...
Never mind, I don't know why, but I just saw that I had a long[::-1] instead of a long[::1] in my code ;)
The benchmark result suggests that it did indeed make it faster, but by less than 1% at a sample size of 10^6.
