I am trying to figure out the best way to parallelize the execution of a single operation on each cell of a 2D numpy array.
In particular, I need to do a bitwise operation for each cell in the array.
This is what I currently do, using nested for loops:
    for x in range(M):
        for y in range(N):
            v[x][y] = (v[x][y] >> 7) & 255
I found a way to do the same thing using numpy.vectorize:
    def f(x):
        return (x >> 7) & 255

    f = numpy.vectorize(f)
    v = f(v)
However, using vectorize doesn't seem to improve performance.
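For context, the NumPy documentation describes numpy.vectorize as a convenience wrapper that is essentially a Python-level loop, which would explain the lack of speedup. A minimal sketch of the alternative, assuming v is an integer NumPy array (the shape, dtype and random contents below are made up for illustration), applies the shift and mask to the whole array at once and checks the result against the explicit loop:

```python
import numpy as np

# Hypothetical input: shape, dtype and values are assumptions for illustration.
rng = np.random.default_rng(0)
v = rng.integers(0, 2**16, size=(4, 5), dtype=np.int64)

# Reference result computed with the explicit double loop from the question.
expected = v.copy()
for x in range(v.shape[0]):
    for y in range(v.shape[1]):
        expected[x, y] = (expected[x, y] >> 7) & 255

# Whole-array version: >> and & are NumPy ufuncs on arrays,
# so the per-cell loop runs in compiled code instead of Python.
result = (v >> 7) & 255

print(np.array_equal(result, expected))
```

This does not use multiple cores or a GPU; it only removes the per-cell Python overhead, which may or may not be enough depending on the array size.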
I read about numexpr in this answer on StackOverflow, which also mentions Theano and Cython. Theano in particular seems like a good solution, but I cannot find examples that fit my case.
So my question is: what is the best way to improve the above code, using parallelization and possibly GPU computation? Could someone post some sample code to do this?
You could use multiprocessing.Pool, by defining a function for your bitwise operation and sending it the list of all your cells. It will then use all your processors to evaluate the result, but you will need to reconstruct the array, which can cost you time. How big is your array? How long does your calculation take?

Use v[x,y]. Using [x][y] may be equivalent here, but if v[x] produces a copy rather than a view, it does not work.
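To illustrate the indexing point, here is a small sketch with a made-up 3x4 array. With a plain integer index v[x] is a view, so chained assignment still reaches v; with a list (fancy) index the first step returns a copy, and the chained assignment is silently lost:

```python
import numpy as np

# Hypothetical 3x4 array used only for illustration.
v = np.arange(12).reshape(3, 4)

# Integer index: v[0] is a view, so the chained assignment modifies v.
v[0][1] = 100

# List (fancy) index: v[[0, 2]] is a copy, so this writes to a temporary only.
v[[0, 2]][0] = -1
row0_after_chained = v[0].copy()  # still [0, 100, 2, 3]

# Single indexing expression: the assignment is applied to v itself.
v[[0, 2], 0] = -1

print(row0_after_chained)
print(v)
```

Writing v[x, y] avoids having to reason about whether the intermediate v[x] is a view or a copy.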