I want to run a "main" function n times. This function starts other functions while it is running. The "main" function is called "repeat"; when it runs, it first calls the function "copula_sim", from which I get an output called "total_summe_liste". This list is appended to "mega_summe_list", which stores the outputs of all n runs. The sorted "total_summe_liste" is saved as "RM_list", which is the input for the functions "VaR_func", "CVaR_func" and "power_func", each of which generates an output that is stored in the corresponding list "RM_VaR_list", "RM_CVaR_list" or "RM_PSRM_list". After that, "RM_list" and "total_summe_liste" are cleared before the next run begins.

In the end I have "mega_summe_list", "RM_VaR_list", "RM_CVaR_list" and "RM_PSRM_list", which are used to generate a plot and a dataframe.

Now I want to run the "repeat" function in parallel. For example, when I want to run it n=10 times, I want to run it on 10 CPU cores at the same time. The reason is that "copula_sim" is a Monte Carlo simulation, which takes a while for a big simulation.

What I have is this:

total_summe_liste = []
RM_VaR_list = []
RM_CVaR_list = []
RM_PSRM_list = []
mega_summe_list = []

def repeat():
    global RM_list
    global total_summe_liste
    global RM_VaR_list
    global RM_CVaR_list
    global RM_PSRM_list
    global mega_summe_list

    copula_sim(runs_sim, rand_x, rand_y, mu, full_log=False)
    mega_summe_list += total_summe_liste
    RM_list = sorted(total_summe_liste)    
    VaR_func(alpha)
    RM_VaR_list.append(VaR)    
    CVaR_func(alpha)
    RM_CVaR_list.append(CVaR)
    power_func(gamma)
    RM_PSRM_list.append(risk)
    RM_list = []
    total_summe_liste = []

n = 10

for i in range(0,n):
    repeat()

which is working so far.

I tried:

import multiprocessing as mp

if __name__ == '__main__':
    jobs = []
    for i in range(0, 10):
        p = mp.Process(target=repeat)
        jobs.append(p)
        p.start()

But when I run this, "mega_summe_list" is empty. When I add "print(VaR)" to "repeat", it shows me all 10 VaR values when it's done. So the parallel tasks are working so far.

What is the problem?

3 Answers

The reason for this issue is that the list mega_summe_list is not shared between the processes.

When you use multiprocessing in Python, all functions and variables are imported and run independently in different processes.

So, for instance, when you start 5 processes, 5 different copies of these variables are created and used independently. So, when you access mega_summe_list in the main process, it is still empty, because it is empty in that process.
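A minimal sketch of this behaviour (the names here are illustrative, not from the question):

```python
from multiprocessing import Process

results = []  # ordinary global: each process works on its own copy


def worker():
    # This append happens in the child's copy of `results`;
    # the parent's list is never touched.
    results.append(42)


if __name__ == "__main__":
    p = Process(target=worker)
    p.start()
    p.join()
    print(results)  # the parent still sees an empty list
```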

To enable synchronization between processes, you can use a list proxy from the multiprocessing package. A multiprocessing Manager maintains an independent server process in which these Python objects are held.

Below is the code used to create a multiprocessing Manager list:

from multiprocessing import Manager
mega_summe_list = Manager().list()

The above can be used instead of mega_summe_list = [] when using multiprocessing.

Below is an example:

from functools import partial
from multiprocessing import Manager
from multiprocessing.pool import Pool

b = []  # ordinary global: every worker gets its own private copy


def repeat_test(mp_list, _):
    a = [1, 2, 3]
    b.extend(a)        # modifies only this worker's copy of b
    mp_list.extend(a)  # Manager list proxy: shared across processes


if __name__ == "__main__":
    with Manager() as manager:
        mp_list = manager.list()
        with Pool(5) as p:
            p.map(partial(repeat_test, mp_list), range(5))
        print("b: {0},\n mp_list: {1}".format(b, list(mp_list)))

Note that the proxy is passed to the workers as an argument, so the example also works on platforms that spawn (rather than fork) worker processes.

Output:

b: [],
 mp_list: [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]

Hope this solves your problem.

2 Comments

Thank you! But I have a problem running this. When I copy your code, b and mp_list are not defined. When I define b = [] and mp_list = [], it runs but the output is an empty list.
I think I figured it out. I will post my code in the next days. One more thing: how do I deal with very big arrays? At the moment I do "test_mega += shared_list", but "shared_list" has 1,000,000 entries and I have 10 of them which I add to "test_mega". This now takes longer than the simulation itself... Is there a more efficient way?

You should use a multiprocessing Pool; then you can do something like:

from multiprocessing.pool import Pool

p = Pool(10)
p.map(repeat, range(10))

1 Comment

When I run the code I tried, it generates the output I need in every run, but it doesn't save it to the global list. It also looks like the code following "if __name__ == '__main__':" is executed before the 10 tasks have finished.

I solved the problem this way:

This is the function I want to repeat n times in parallel:

from multiprocessing import Process
from multiprocessing import Manager
from multiprocessing.pool import Pool

def repeat(shared_list, VaR_list, CVaR_list, PSRM_list, i):
    global RM_list
    global total_summe_liste

    copula_sim(runs_sim, rand_x, rand_y, mu, full_log=False)
    shared_list += total_summe_liste
    RM_list = sorted(total_summe_liste)    
    VaR_func(alpha)
    VaR_list.append(VaR)    
    CVaR_func(alpha)
    CVaR_list.append(CVaR)
    power_func(gamma)
    PSRM_list.append(risk)
    RM_list = []
    total_summe_liste = []

This part manages the shared lists and does the parallelization. Thanks @noufel13!

RM_VaR_list = []
RM_CVaR_list = []
RM_PSRM_list = []
mega_summe_list = []

if __name__ == "__main__":
    with Manager() as manager:
        shared_list = manager.list()
        VaR_list = manager.list()
        CVaR_list = manager.list()
        PSRM_list = manager.list()
        processes = []
        for i in range(12):
            p = Process(target=repeat, args=(shared_list, VaR_list, CVaR_list, PSRM_list, i))  # Passing the list
            p.start()
            processes.append(p)
        for p in processes:
            p.join()
        RM_VaR_list += VaR_list
        RM_CVaR_list += CVaR_list
        RM_PSRM_list += PSRM_list
        mega_summe_list += shared_list

    RM_frame_func()
    plotty_func()

Thank you!

The only question left is how to handle big arrays. Is there a way to do this more efficiently? One of the 12 shared lists can have more than 100,000,000 items, so in total mega_summe_list has about 1,200,000,000 items...

