
I encountered a pretty weird result from a benchmark.

These are all different flavors of a bubblesort implementation, and the fastest approach at n=10^4 is converting a Python list to a C array internally. In contrast, the yellow line corresponds to code where I am using NumPy arrays with a typed memoryview. I expected the results to be the other way around. I (and colleagues) repeated the benchmark a couple of times and always got the same results. Maybe someone has an idea of what is going on here ...

[benchmark plot: runtime vs. input size for the different bubblesort implementations]
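For context, a plain-Python bubblesort baseline (not shown in the question, and the function name here is my own) against which these Cython variants are typically compared might look like this:

```python
def python_bubblesort(a_list):
    """Plain-Python bubble sort, used as a baseline for the Cython variants."""
    count = len(a_list)
    for i in range(count):
        for j in range(1, count):
            # swap adjacent elements that are out of order
            if a_list[j] < a_list[j - 1]:
                a_list[j - 1], a_list[j] = a_list[j], a_list[j - 1]
    return a_list
```

The Cython versions below keep exactly this loop structure; only the container type changes.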

The black line in the plot corresponds to this code:

%%cython
cimport cython
from libc.stdlib cimport malloc, free

def cython_bubblesort_clist(a_list):
    """ 
    The Cython implementation of bubble sort with internal
    conversion between Python list objects and C arrays.

    """
    cdef int *c_list
    c_list = <int *>malloc(len(a_list)*cython.sizeof(int))
    cdef int count, i, j # static type declarations
    count = len(a_list)

    # convert Python list to C array
    for i in range(count):
        c_list[i] = a_list[i]

    for i in range(count):
        for j in range(1, count):
            if c_list[j] < c_list[j-1]:
                c_list[j-1], c_list[j] = c_list[j], c_list[j-1]

    # convert C array back to Python list
    for i in range(count):
        a_list[i] = c_list[i]

    free(c_list)
    return a_list

and the pink line to this code:

%%cython
import numpy as np
cimport numpy as np
cimport cython
def cython_bubblesort_numpy(long[:] np_ary):
    """ 
    The Cython implementation of bubble sort with NumPy memoryview.

    """
    cdef int count, i, j # static type declarations
    count = np_ary.shape[0]

    for i in range(count):
        for j in range(1, count):
            if np_ary[j] < np_ary[j-1]:
                np_ary[j-1], np_ary[j] = np_ary[j], np_ary[j-1]

    return np.asarray(np_ary)
Comments:

  • In your benchmark, you make the deepcopy part of the recorded timing. Although it's the same for each test, why not move it outside of the timing call? Also, why do you expect NumPy to be faster in this case? Its strength is not in accessing individual elements but in whole-array operations. Commented May 14, 2014 at 18:02
  • Did you take a look at the following guide, here and onward? Commented May 14, 2014 at 18:07
  • Add these decorators and you get the same performance: @cython.boundscheck(False) @cython.wraparound(False) Commented May 14, 2014 at 18:14

2 Answers


As suggested in the comments above, I added the @cython.boundscheck(False) and @cython.wraparound(False) decorators, which disable per-access bounds checking and negative-index wraparound:

%%cython
import numpy as np
cimport numpy as np
cimport cython
@cython.boundscheck(False) 
@cython.wraparound(False)
cpdef cython_bubblesort_numpy(long[:] np_ary):
    """ 
    The Cython implementation of bubble sort with NumPy memoryview.

    """
    cdef int count, i, j # static type declarations
    count = np_ary.shape[0]

    for i in range(count):
        for j in range(1, count):
            if np_ary[j] < np_ary[j-1]:
                np_ary[j-1], np_ary[j] = np_ary[j], np_ary[j-1]

    return np.asarray(np_ary)

and the results are now more in line with what I expected :)

[updated benchmark plot after adding the decorators]




It is worth making one trivial change to your code to see if it improves things further:

cpdef cython_bubblesort_numpy(long[::1] np_ary):
    # ...

This tells Cython that np_ary is a C-contiguous array, and the generated code for the nested for loops can be further optimized with that information.

This version won't accept non-contiguous arrays as arguments, but that is fairly trivial to handle with numpy.ascontiguousarray().
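For example, slicing an array with a step produces a non-contiguous view, which a long[::1] memoryview would reject; np.ascontiguousarray makes a C-contiguous copy first (and is a no-op when the input is already contiguous):

```python
import numpy as np

a = np.arange(10, dtype=np.int64)
strided = a[::2]                    # slicing with a step breaks C-contiguity
fixed = np.ascontiguousarray(strided)  # contiguous copy, safe for long[::1]

# strided.flags['C_CONTIGUOUS'] is False, fixed.flags['C_CONTIGUOUS'] is True
```

One would then call the sort on `fixed` (and copy back if the original view must be updated), trading one extra copy for the faster contiguous-access loop.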

5 Comments

Thanks! However, it won't compile; it doesn't like the `if np_ary[j] < np_ary[j-1]:` line. Am I missing something?
What's the compilation error message, and what version of Cython are you using? I'm able to get it to compile here.
Version 0.20.1. I posted a copy here: pastebin.com/BL1Vj9ic But I have the feeling that I am missing something here...
Never mind, I don't know why, but I just saw that I had a long[::-1] instead of a long[::1] in my code ;)
The benchmark result suggests that it did indeed make it faster, but by less than 1% at a sample size of 10^6.
