I have the following very time consuming loop, that I'd like to speed up by parallelization.
for y in range(0, 288):
for x in range(0, 352):
w0=Waves.GetSampleValues(Waves.WaveParams((a0[y, x], 0, p0[y, x], 0)))
w1=Waves.GetSampleValues(Waves.WaveParams((a1[y, x], 0, p1[y, x], 0)))
am0, am1, ph0, ph1=DemodPoint(w0, w1, useOld)
resultPhase0[y, x]=ph0
resultPhase1[y, x]=ph1
resultAmp0[y, x]=am0
resultAmp1[y, x]=am1
print("Y: ", y)
return resultAmp0, resultAmp1, resultPhase0, resultPhase1
Every pixel can be computed independently, so there should in theory be no problem. I already found http://pythonhosted.org/joblib/parallel.html which seems to be able to solve problems like this, but I have however not such a generator expression.
As you see, I have multiple result arrays that need to be filled (where the DemodPoint call, which returns everything at once is the expensive part), which does not seem to fit to this structure.
Deeper down on the page there is a more complex example which uses temporary files, but I actually tried to keep things as easy as possible. Similar to OpenMP in C++, I would like to say "just do this in parallel, I know that it will work" without to much additional code. What would be the easiest way in python to do this?
WavesandDemodPoint? My first instinct on seeing this code is to vectorize it, not parallelize it.