
I am trying to call the same method in parallel on multiple instances, where the instances are all of the same class.

Sorry for the confusing wording.

Specifically, I want to change the following for-loop to parallel execution:

for i in range(len(instances)):  # instances is a list of instances
    instances[i].do_some_computation_over_a_dataset()

Is it possible?

Note for future readers:

The above code is not the idiomatic way to iterate over a collection of instances in Python. This is how to iterate sequentially (i.e. non-parallel):

for i in instances:
    i.do_some_computation_over_a_dataset()
  • @quamrana, I want to ensure that all instances have finished the method. Commented Nov 30, 2017 at 16:09
  • What makes you think that Pool doesn't wait? Commented Nov 30, 2017 at 16:11
  • @quamrana, I do not know Pool very well, just guessing. Commented Nov 30, 2017 at 16:15
  • The first code example in here: docs.python.org/3/library/multiprocessing.html obviously waits for all processes to finish so that it can print all the results. Commented Nov 30, 2017 at 16:17
  • Ok, @quamrana, thanks. Could there be a difference between this question and the question you linked? Here we want to call the same method on multiple instances, while there they call the same method with different parameters (see the sketch after these comments). Commented Nov 30, 2017 at 16:22
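
Regarding the two points discussed above: Pool.map does block until every worker has finished, and it works for calling the same method on multiple instances. A minimal sketch, where the Worker class and its compute method are placeholders standing in for the asker's objects:

from multiprocessing import Pool

class Worker:
    # placeholder for the asker's class
    def __init__(self, data):
        self.data = data

    def compute(self):
        # stands in for do_some_computation_over_a_dataset()
        return sum(self.data) / len(self.data)

def run(instance):
    # module-level helper so it can be pickled and sent to a worker process
    return instance.compute()

if __name__ == '__main__':
    instances = [Worker([1, 2, 3]), Worker([4, 5, 6])]
    with Pool() as pool:
        # map() does not return until all instances have finished
        results = pool.map(run, instances)
    print(results)  # [2.0, 5.0]

Each worker process operates on a copy of its instance, so only return values come back to the parent; changes the method makes to self are not visible afterwards.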

2 Answers


Ok, let's do it. First the code (from the multiprocessing docs):

In [1]: from multiprocessing import Process

In [2]: def f():
   ...:     print(1)
   ...:     for i in range(100):
   ...:         # do something
   ...:         pass
   ...:

In [3]: p1 = Process(target=f)

In [4]: p1.start()

1
In [5]: p2 = Process(target=f)

In [6]: p2.start()

1
In [7]: import time

In [8]: def f():
   ...:     for i in range(100):
   ...:         print(i)
   ...:         # do something
   ...:         time.sleep(1)
   ...:         pass
   ...:
In [9]: p1 = Process(target=f)

In [10]: p1.start()

0
In [11]: p2 1
= Process(target=f)2
3
4
5
In [11]: p2 = Process(target=f)

In [12]: 6
p2.7
start8
In [12]: p2.start()

0
In [13]: 9

This is an example of how functions can be called in parallel. From In [10]: p1.start() onward you can see the output gets jumbled, because process p1 is running in parallel while we define and start process p2.

When running this in a Python script, you want to make sure the script only exits once all of the processes have finished. You can do this with join():

def multi_process(instance_params, *funcs):
    processes = []
    for f in funcs:
        prog = Process(target=f, args=instance_params)
        prog.start()
        processes.append(prog)
    for p in processes:
        p.join()  # block until this process has finished

multi_process(params, f, f)  # params is a tuple of arguments for f; pass () if f takes none
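
To match the question more closely (the same method on multiple instances, rather than several functions with the same arguments), the same start/join pattern can be turned around. This is only a sketch; instances and do_some_computation_over_a_dataset stand in for the asker's objects, and with the spawn start method (e.g. on Windows) the instances must be picklable:

from multiprocessing import Process

def run_method_on_all(instances):
    processes = []
    for instance in instances:
        # one child process per instance, all calling the same method
        p = Process(target=instance.do_some_computation_over_a_dataset)
        p.start()
        processes.append(p)
    for p in processes:
        p.join()  # wait for every instance to finish

Since each child works on its own copy of the instance, use a Pool (shown below) if you need the results back in the parent process.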

Python doesn't have C++- or Java-style multithreading for CPU-bound work because of the GIL. Read about it here. If your program performs more I/O operations than CPU-intensive work, you can use multithreading; for CPU-intensive tasks, multiprocessing is recommended.
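
As an illustration of the I/O-bound case, here is a minimal thread-pool sketch (the sleep merely simulates waiting on I/O; it is not from the original answer):

from concurrent.futures import ThreadPoolExecutor
import time

def io_task(n):
    # simulated I/O wait; the GIL is released while the thread is blocked
    time.sleep(1)
    return n

with ThreadPoolExecutor(max_workers=4) as executor:
    # takes roughly 1 second instead of 4, because the waits overlap
    results = list(executor.map(io_task, range(4)))

print(results)  # [0, 1, 2, 3]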

In the comments @ytutow asked what the difference is between a pool of workers and a Process. From PyMOTW:

The Pool class can be used to manage a fixed number of workers for simple cases where the work to be done can be broken up and distributed between workers independently.

The return values from the jobs are collected and returned as a list.

The pool arguments include the number of processes and a function to run when starting the task process (invoked once per child).

You can use Pool as:

def your_instance_method(instance):
    # module-level helper so it can be pickled and sent to the workers
    return instance.do_some_computation_over_a_dataset()

instances = [instance_1, instance_2, instance_3]

with Pool(3) as p:
    print(p.map(your_instance_method, instances))

About the correct number of workers, a general recommendation is to have 2 * cpu_cores workers.
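
If you prefer to derive that number from the machine rather than hard-code it, here is a small sketch following the 2 * cores heuristic above (reusing your_instance_method and instances from the previous snippet; for purely CPU-bound work, os.cpu_count() alone is also a common choice):

import os
from multiprocessing import Pool

workers = 2 * (os.cpu_count() or 1)  # heuristic from above; tune for your workload

with Pool(processes=workers) as p:
    results = p.map(your_instance_method, instances)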


11 Comments

Thanks! What's the difference between multi_process and the multiprocessing library in Python?
The multiple instances will run the same method over some dataset, for example, computing the mean. I believe it is a CPU-intensive task.
@ytutow Then it's a CPU-intensive task. Use multiprocessing; it's Python's standard library. I am not sure if there is any multi_process module in Python. Is it a 3rd-party module?
Sorry I am not very familiar with multiprocessing
@ytutow The answer gives an example of how you can use it. Read the docs I have attached for more info. Does this answer your question? If so, could you mark it as accepted?

This code seems to show the difference between a plain for loop and Pool, applying the same function to a list of instances:

from multiprocessing import Pool

instances = ['a','ab','abc','abcd']


def calc_stuff(i):
    return len(i)


if __name__ == '__main__':

    print('One at a time')
    for i in instances:
        print(len(i))

    print('Use Pool')
    with Pool(4) as pool:
        print(pool.map(calc_stuff, instances))

Note the use of if __name__ == '__main__':

This guards the process-spawning code, so that child processes (which may re-import the module when they start) do not spawn further processes themselves.

Output:

One at a time
1
2
3
4
Use Pool
[1, 2, 3, 4]

