
I am trying to parallelize a for loop in Python that must execute a function with several arguments, one of which changes through the loop. The loop itself needs to be embedded in a function. I have already looked here, here and here on Stack Overflow and beyond (here and here), but I just cannot make it work :(

Below is an MWE:

import time
import numpy as np
from multiprocessing import Pool
from functools import partial

def mytestFun(otherStuff, myparams):
    return myparams[0]*otherStuff - myparams[1]

def myfun1(extraParams, mylist):
    [myMat, otherStuff] = extraParams
    
    for ivals in mylist:
        myparams = myMat[ivals,:]
        result = mytestFun(otherStuff, myparams)
    return result

if __name__ == '__main__':
    a_list = [0, 1, 2, 3, 4, 5]

    myMat = np.random.uniform(0,1,(6,2))
    extraParams = [myMat, 5]
    print(myfun1(extraParams, a_list))
    pool = Pool()
    func = partial(myfun1, extraParams)
    pool.map(func, a_list)
    pool.close()
    pool.join()

And I keep getting errors that I don't know how to interpret:

Traceback (most recent call last):
  File "exampleMultiProcessing.py", line 61, in <module>
    pool.map(func, a_list)
  File "/Users/laurama/miniconda3/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/Users/laurama/miniconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
TypeError: cannot unpack non-iterable int object
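
For what it's worth, pool.map applies func to one element of a_list at a time, so myfun1 seems to receive a single int as mylist. A quick sketch of that call pattern (reusing the mytestFun and myfun1 definitions from the MWE above) also fails with a TypeError about an int:

# mytestFun and myfun1 defined as in the MWE above
import numpy as np
from functools import partial

myMat = np.random.uniform(0, 1, (6, 2))
func = partial(myfun1, [myMat, 5])

# pool.map(func, a_list) boils down to calls like func(0): myfun1 then
# receives the int 0 as mylist, so an int lands where an iterable is expected
try:
    func(0)
except TypeError as err:
    print(err)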

Thanks in advance!

2 Comments

  • Here dask.delayed can be your friend. Commented Jun 25, 2020 at 23:47
  • @laura We can use joblib here. Commented Jun 26, 2020 at 0:22

1 Answer


You can read about joblib here. Basically, when we use joblib, it expects to be passed the arguments of the function we want to parallelise. So here I am passing the args directly to the function; that's why I am looping with a throwaway variable. You can use any name there, no issues at all; basically I am ignoring the loop variable by calling it _.

And yes, Parallel will automatically distribute the work across the n_jobs workers.

Try this:

import numpy as np
from joblib import Parallel, delayed

# myfun1 and mytestFun as defined in the question

if __name__ == '__main__':
    a_list = [0, 1, 2, 3, 4, 5]
    myMat = np.random.uniform(0, 1, (6, 2))
    extraParams = [myMat, 5]
    print(myfun1(extraParams, a_list))
    # Wrap the call in delayed(); the _ is just a throwaway loop variable
    result = Parallel(n_jobs=8)(delayed(myfun1)(extraParams, a_list) for _ in range(1))[0]

4 Comments

It does, thank you, but could you please provide more context? I don't quite understand what the solution is doing, in particular this part: for _ in range(1). In fact, I have never seen an underscore used as a variable. Also, would Parallel automatically distribute the loop among the n_jobs? Thanks!
@Laura Apologies, I have updated the answer now; let me know!
Actually, I don't understand what is happening, but a quick check finds that this solution is actually significantly slower than just doing the loop, both with this trivial example and with my real example... Any ideas?
Ahh, now I see why it's slower: you have a loop over the inputs, which I didn't notice. In that case we need to change the looping strategy; try modifying it to loop over that array of yours (i.e. over your inputs), as in the sketch below.
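
One possible reading of that suggestion, sketched with the mytestFun, myMat, otherStuff and a_list from the question (the n_jobs=8 value is simply carried over from the answer): put the loop over indices inside the Parallel call, so each delayed task handles one row of myMat and the work is actually spread across the workers.

from joblib import Parallel, delayed
import numpy as np

def mytestFun(otherStuff, myparams):
    return myparams[0]*otherStuff - myparams[1]

if __name__ == '__main__':
    a_list = [0, 1, 2, 3, 4, 5]
    myMat = np.random.uniform(0, 1, (6, 2))
    otherStuff = 5

    # One delayed task per index: this loop is what gets distributed
    # across the n_jobs workers
    results = Parallel(n_jobs=8)(
        delayed(mytestFun)(otherStuff, myMat[i, :]) for i in a_list
    )
    print(results)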
