I'm trying to improve the time taken to add two fixed-length arrays. I must convert two byte strings into two fixed-length arrays of shorts, add the two arrays together, and finally output the resultant array as a byte string.
Currently I have:
    import cython
    cimport numpy as np
    import numpy as np

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def cython_layer(char* c_string1, char* c_string2, int length):
        cdef np.ndarray[np.int16_t, ndim=1] np_orig = np.fromstring(c_string1[:length], np.int16, count=length // 2)
        cdef np.ndarray[np.int16_t, ndim=1] np_new = np.fromstring(c_string2[:length], np.int16, count=length // 2)
        res = np_orig + np_new
        return res.tostring()
However, the simpler NumPy-only method yields similar (in fact better) performance:
    def layer(orig, new, length):
        np_orig = np.fromstring(orig, np.int16, count=length // 2)
        np_new = np.fromstring(new, np.int16, count=length // 2)
        res = np_orig + np_new
        return res.tostring()
Is it possible to improve on NumPy's speed for this simple example? My gut says yes, but I don't have enough of a handle on Cython to improve it any further. Using IPython's %timeit magic I've clocked the functions at:
100000 loops, best of 3: 5.79 µs per loop # python + numpy
100000 loops, best of 3: 8.77 µs per loop # cython + numpy
e.g.:

    a = np.array(range(1024), dtype=np.int16).tostring()
    layer(a, a, len(a)) == cython_layer(a, a, len(a))
    # True

    %timeit layer(a, a, len(a))
    # 100000 loops, best of 3: 6.06 µs per loop
    %timeit cython_layer(a, a, len(a))
    # 100000 loops, best of 3: 9.19 µs per loop
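Outside IPython, the same comparison can be scripted with the stdlib `timeit` module. A minimal sketch of the plain-NumPy path (the name `layer_np` is mine, and I've substituted `np.frombuffer`/`tobytes` for the since-deprecated `np.fromstring`/`tostring` so it runs on current NumPy; absolute times will vary by machine):

    import timeit
    import numpy as np

    def layer_np(orig, new, length):
        # np.frombuffer stands in for the deprecated np.fromstring
        np_orig = np.frombuffer(orig, np.int16, count=length // 2)
        np_new = np.frombuffer(new, np.int16, count=length // 2)
        return (np_orig + np_new).tobytes()

    a = np.arange(1024, dtype=np.int16).tobytes()

    # best of 3 repeats of 100000 calls, mirroring %timeit's report
    best = min(timeit.repeat(lambda: layer_np(a, a, len(a)), repeat=3, number=100000))
    print('%.2f us per loop' % (best / 100000 * 1e6))
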
edit: changed layer to use count=len(orig) // 2. orig and new are both byte arrays of length 2048; converting them to shorts (np.int16) results in an output array of size 1024.
edit2: I'm an idiot.
edit3: example in action
Comments:

- chunk_size? Your code as posted doesn't work...
- I think one issue is that your `char*` arguments are probably auto-converted from `str` when your function is called, and then auto-converted back to `str` (i.e. unnecessarily copied) before being passed to `np.fromstring`.
- `to_string()` instead of `tostring()`. I've also updated the python + numpy solution to implicitly use the length of the byte array.
- Are you suggesting that `np.fromstring(char)` would work? Because it converts only the first 48 bytes to short.
- `char*` -> `str` -> `np.array`, and so it ends up being copied twice. I don't know if that's easily avoidable, though.
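The copying behaviour the comments describe is easy to verify: `np.fromstring` always copies, while `np.frombuffer` returns a read-only view over the original buffer. A small sketch (variable names are mine):

    import numpy as np

    data = np.arange(4, dtype=np.int16).tobytes()

    view = np.frombuffer(data, dtype=np.int16)  # zero-copy view over `data`
    copy = view.copy()                          # explicit copy, as fromstring made

    assert not view.flags.writeable  # bytes are immutable, so the view is read-only
    assert copy.flags.writeable      # the copy owns its own writable memory
    assert copy.flags.owndata and not view.flags.owndata
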