
I have the following function in pure python:

import numpy as np

def subtractPython(a, b):
    xAxisCount = a.shape[0]
    yAxisCount = a.shape[1]

    shape = (xAxisCount, yAxisCount, xAxisCount)
    results = np.zeros(shape)
    for index in range(len(b)):
        subtracted = (a - b[index])
        results[:, :, index] = subtracted
    return results

I tried to cythonize it this way:

import numpy as np
cimport numpy as np

DTYPE = np.int_   # note: np.int was removed in NumPy 1.24; np.int_ matches np.int_t
ctypedef np.int_t DTYPE_t

def subtractPython(np.ndarray[DTYPE_t, ndim=2] a, np.ndarray[DTYPE_t, ndim=2] b):
    cdef int xAxisCount = a.shape[0]
    cdef int yAxisCount = a.shape[1]

    cdef np.ndarray[DTYPE_t, ndim=3] results = np.zeros([xAxisCount, yAxisCount, xAxisCount], dtype=DTYPE)

    cdef int lenB = len(b)

    cdef np.ndarray[DTYPE_t, ndim=2] subtracted
    for index in range(lenB):
        subtracted = (a - b[index])
        results[:, :, index] = subtracted
    return results

However, I'm not seeing any speedup. Is there something I'm missing, or can this process not be sped up?

EDIT -> I've realized that I'm not actually cythonizing the subtraction algorithm in the above code. I've managed to cythonize it, but it has the exact same runtime as a - b[:, None], so I guess this is the maximum speed of this operation.

This is basically a - b[:, None] (it has the same runtime):

%%cython

import numpy as np
cimport numpy as np


DTYPE = np.int_   # note: np.int was removed in NumPy 1.24; np.int_ matches np.int_t
ctypedef np.int_t DTYPE_t

cimport cython
@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)  # turn off negative index wrapping for entire function
def subtract(np.ndarray[DTYPE_t, ndim=2] a, np.ndarray[DTYPE_t, ndim=2] b):
    cdef np.ndarray[DTYPE_t, ndim=3] result = np.zeros([b.shape[0], a.shape[0], a.shape[1]], dtype=DTYPE)

    cdef int lenB = b.shape[0]
    cdef int lenA = a.shape[0]
    cdef int lenColB = b.shape[1]

    cdef int rowA, rowB, column

    for rowB in range(lenB):
        for rowA in range(lenA):
            for column in range(lenColB):
                result[rowB, rowA, column] = a[rowA, column] - b[rowB, column]
    return result
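For reference, the triple loop above computes the same array as a broadcast subtraction; a quick pure-NumPy check of the equivalence, with small made-up input shapes:

```python
import numpy as np

# Small example inputs (shapes chosen arbitrarily for illustration)
a = np.arange(6).reshape(2, 3)        # shape (2, 3)
b = np.arange(6, 18).reshape(4, 3)    # shape (4, 3)

# The explicit triple loop, as in the Cython version above
result = np.zeros((b.shape[0], a.shape[0], a.shape[1]), dtype=a.dtype)
for rowB in range(b.shape[0]):
    for rowA in range(a.shape[0]):
        for column in range(a.shape[1]):
            result[rowB, rowA, column] = a[rowA, column] - b[rowB, column]

# Broadcasting produces the identical (lenB, lenA, lenColB) array
assert np.array_equal(result, a - b[:, None])
```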
  • Is it necessary to cythonize the function? If you are using NumPy you should vectorize your code. I think that results = a[:,:,np.newaxis]-b gets the same output. Commented Feb 25, 2018 at 15:09
  • It does get the same output, but it's too slow. Commented Feb 25, 2018 at 15:10
  • You might need a cdef in the loop. Add one for your subtracted variable. Commented Feb 25, 2018 at 15:10
  • There are a few small things you can do (e.g. set a type for index - it's possible Cython can't guess it). However, this might just be how long the calculation takes. Commented Feb 25, 2018 at 15:19
  • See this bit in the Cython docs: "Gotcha: This efficient indexing only affects certain index operations, namely those with exactly ndim number of typed integer indices. So if v for instance isn’t typed, then the lookup f[v, w] isn’t optimized. On the other hand this means that you can continue using Python objects for sophisticated dynamic slicing etc. just as when the array is not typed." Commented Feb 25, 2018 at 17:22

1 Answer


When trying to optimize a function, you should always know its bottleneck - otherwise you will spend a lot of time running in the wrong direction.

Let's use your Python function as the baseline (actually I use results = np.zeros(shape, dtype=a.dtype); otherwise your method returns floats, which is probably a bug):

>>> import numpy as np
>>> a=np.random.randint(1,1000,(300,300), dtype=np.int_)
>>> b=np.random.randint(1,1000,(300,300), dtype=np.int_)
>>> %timeit subtractPython(a,b)
274 ms ± 3.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The first question we should ask ourselves is: Is this task memory-bound or CPU-bound? Obviously, it is memory-bound - a subtraction costs nothing compared to the required memory reads and writes.

This means that, above all, we have to optimize the memory layout in order to reduce cache misses. As a rule of thumb, our memory accesses should touch one consecutive memory address after another.
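One way to see which axis is the "fast" one is to look at the strides of a C-ordered array (a small sketch, not part of the original answer):

```python
import numpy as np

x = np.zeros((300, 300, 300))
# In C-order (row-major), the last axis has the smallest stride:
# moving along it steps through adjacent elements in memory,
# while moving along the first axis jumps 300*300 elements at a time.
print(x.strides)  # (720000, 2400, 8) for 8-byte floats
```

Writing x[:, :, index] therefore touches elements 2400 bytes apart, while x[index, :, :] fills one contiguous block.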

Is this the case? No: the array results is in C-order, i.e. row-major order, and thus the access

results[:, :, index] = subtracted

isn't consecutive. On the other hand,

results[index, :, :] = subtracted

would be a consecutive access. Let's change the way information is stored in results:

def subtract1(a, b):
    xAxisCount = a.shape[0]
    yAxisCount = a.shape[1]

    shape = (xAxisCount,  xAxisCount, yAxisCount) #<=== Change order
    results = np.zeros(shape, dtype=a.dtype)
    for index in range(len(b)):
        subtracted = (a - b[index])
        results[index, :, :] = subtracted   #<===== consecutive access
    return results

The timings are now:

>>> %timeit subtract1(a,b)
35.8 ms ± 285 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

There are also two more small improvements: we don't have to initialize results with zeros, and we can save some Python overhead - but these give us only about 5%:

def subtract2(a, b):
    xAxisCount = a.shape[0]
    yAxisCount = a.shape[1]

    shape = (xAxisCount,  xAxisCount, yAxisCount) 
    results = np.empty(shape, dtype=a.dtype)        #<=== no need for zeros
    for index in range(len(b)):
        results[index, :, :] = (a-b[index])   #<===== less python overhead
    return results

>>> %timeit subtract2(a,b)
34.5 ms ± 203 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Now this is about a factor of 8 faster than the original version.

You could use Cython to try to speed this up even further - but the task is probably still memory-bound, so don't expect to get it significantly faster - after all, Cython cannot make the memory work faster. However, without proper profiling it is hard to tell how much improvement is possible - I would not be surprised if someone came up with a faster version.
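For completeness, the reordered loop gives exactly the same result as a plain broadcast subtraction, which is another memory-friendly way to write it (a sketch, not part of the original answer):

```python
import numpy as np

def subtract_broadcast(a, b):
    # Equivalent to subtract2: result[index] == a - b[index] for every row of b
    return a[None, :, :] - b[:, None, :]

a = np.random.randint(1, 1000, (300, 300))
b = np.random.randint(1, 1000, (300, 300))

# Reference: the explicit loop, as in subtract2 above
ref = np.empty((b.shape[0], a.shape[0], a.shape[1]), dtype=a.dtype)
for index in range(len(b)):
    ref[index, :, :] = a - b[index]

assert np.array_equal(subtract_broadcast(a, b), ref)
```

Both fill the output in consecutive order, so their runtimes should be in the same ballpark.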


1 Comment

wow, I had no idea that simply rearranging an array would lead to such an improvement!
