
I have the following very time-consuming loop that I'd like to speed up through parallelization.

for y in range(0, 288):
    for x in range(0, 352):
        w0 = Waves.GetSampleValues(Waves.WaveParams((a0[y, x], 0, p0[y, x], 0)))
        w1 = Waves.GetSampleValues(Waves.WaveParams((a1[y, x], 0, p1[y, x], 0)))

        am0, am1, ph0, ph1 = DemodPoint(w0, w1, useOld)
        resultPhase0[y, x] = ph0
        resultPhase1[y, x] = ph1
        resultAmp0[y, x] = am0
        resultAmp1[y, x] = am1
    print("Y: ", y)
return resultAmp0, resultAmp1, resultPhase0, resultPhase1

Every pixel can be computed independently, so in theory there should be no problem. I already found http://pythonhosted.org/joblib/parallel.html, which seems designed to solve problems like this, but my loop is not written as the kind of generator expression its examples use.

As you see, I have multiple result arrays that need to be filled (the DemodPoint call, which returns everything at once, is the expensive part), which does not seem to fit that structure; I sketch below what I think a joblib version would have to look like.

Further down that page there is a more complex example that uses temporary files, but I really want to keep things as simple as possible. Like OpenMP in C++, I would like to be able to say "just do this in parallel, I know it will work" without too much additional code. What would be the easiest way to do this in Python?
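For reference, here is roughly what I imagine the joblib version would have to look like. This is only a rough sketch: it assumes the demod_pixel helper (a name I made up) can live at module level, and that a0, p0, a1, p1, and useOld are visible to it.

from joblib import Parallel, delayed

def demod_pixel(y, x):
    # illustrative helper: one pixel's work, assuming a0/p0/a1/p1/useOld
    # are accessible from the enclosing module
    w0 = Waves.GetSampleValues(Waves.WaveParams((a0[y, x], 0, p0[y, x], 0)))
    w1 = Waves.GetSampleValues(Waves.WaveParams((a1[y, x], 0, p1[y, x], 0)))
    return y, x, DemodPoint(w0, w1, useOld)

# the nested loop collapses into a single generator expression
results = Parallel(n_jobs=-1)(
    delayed(demod_pixel)(y, x)
    for y in range(288) for x in range(352))

# unpack the per-pixel tuples into the four result arrays
for y, x, (am0, am1, ph0, ph1) in results:
    resultAmp0[y, x], resultAmp1[y, x] = am0, am1
    resultPhase0[y, x], resultPhase1[y, x] = ph0, ph1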

2 Comments
  • Are you using NumPy, and do you control the code of Waves and DemodPoint? My first instinct on seeing this code is to vectorize it, not parallelize it. Commented Dec 28, 2013 at 23:22
  • I control both, but DemodPoint (which is the only expensive part) performs a numerical optimization using scipy.optimize. I think there is no hope of vectorizing this. Commented Dec 28, 2013 at 23:49

1 Answer


The multiprocessing module can handle this pretty elegantly. You'll incur some overhead copying the source data to each worker process, so there's probably a more efficient way to handle this (by writing your own worker class, for example), but this ought to work.

Pool.map can only call a worker function with a single argument, but you can get around that by packing all the inputs into a single tuple:

import multiprocessing

...

def calc_one(xymm):
    # unpack the single tuple argument that Pool.map passes in
    x, y, (a0, p0, a1, p1) = xymm
    w0 = Waves.GetSampleValues(Waves.WaveParams((a0[y, x], 0, p0[y, x], 0)))
    w1 = Waves.GetSampleValues(Waves.WaveParams((a1[y, x], 0, p1[y, x], 0)))

    return x, y, DemodPoint(w0, w1, useOld)

# one input tuple per pixel: the coordinates plus the source arrays
inputs = []
for y in range(0, 288):
    for x in range(0, 352):
        inputs.append((x, y, (a0, p0, a1, p1)))

resultPhase0 = {}
resultPhase1 = {}
resultAmp0 = {}
resultAmp1 = {}

# one worker process per CPU core
num_of_workers = multiprocessing.cpu_count()
pool = multiprocessing.Pool(num_of_workers)

# Pool.map blocks until every pixel has been processed
for data in pool.map(calc_one, inputs):
    x, y, (am0, am1, ph0, ph1) = data
    resultPhase0[y, x], resultPhase1[y, x] = ph0, ph1
    resultAmp0[y, x], resultAmp1[y, x] = am0, am1

return resultAmp0, resultAmp1, resultPhase0, resultPhase1
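Incidentally, one easy way to cut the copying overhead mentioned above: each pixel only needs four scalar values, so you can pack those into the input tuples instead of references to the whole arrays. A sketch (calc_one_scalars is just an illustrative name, and it assumes, as above, that useOld is a module-level global the workers inherit):

def calc_one_scalars(args):
    # illustrative variant of calc_one: each task now carries only the
    # four scalars it needs, so far less data is pickled per job
    x, y, a0v, p0v, a1v, p1v = args
    w0 = Waves.GetSampleValues(Waves.WaveParams((a0v, 0, p0v, 0)))
    w1 = Waves.GetSampleValues(Waves.WaveParams((a1v, 0, p1v, 0)))
    return x, y, DemodPoint(w0, w1, useOld)

inputs = [(x, y, a0[y, x], p0[y, x], a1[y, x], p1[y, x])
          for y in range(0, 288) for x in range(0, 352)]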

Of course, for larger datasets you'd want to queue and process the job results on a rolling basis rather than waiting for them all to finish the way Pool.map does - I'm just using pool.map here to keep the example super-simple. The multiprocessing docs have some great examples.
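For instance, Pool.imap_unordered is a drop-in way to get that rolling behavior; a sketch (the chunksize value here is just an illustrative tuning knob):

# imap_unordered yields each result as soon as a worker finishes it;
# chunksize batches tasks to reduce inter-process communication overhead
for x, y, (am0, am1, ph0, ph1) in pool.imap_unordered(calc_one, inputs, chunksize=64):
    resultPhase0[y, x], resultPhase1[y, x] = ph0, ph1
    resultAmp0[y, x], resultAmp1[y, x] = am0, am1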


1 Comment

Yes, this works and is ~4.5 times faster on 8 cores, which sounds good to me.
