
Please bear with me as this is a bit of a contrived example of my real application. Suppose I have a list of numbers and I wanted to add a single number to each number in the list using multiple (2) processes. I can do something like this:

import multiprocessing

my_list = list(range(100))
my_number = 5
data_line = [{'list_num': i, 'my_num': my_number} for i in my_list]

def worker(data):
    return data['list_num'] + data['my_num']

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=2)
    pool_output = pool.map(worker, data_line)
    pool.close()
    pool.join()

Now, however, there's a wrinkle to my problem. Suppose that I wanted to alternate between adding two numbers (instead of just one). So around half the time I want to add my_num1, and the other half of the time I want to add my_num2. It doesn't matter which number gets added to which item of the list. The one requirement is that the different processes must never be adding the same number at the same time. Essentially, what this boils down to (I think) is that I want Process 1 to use the first number exclusively and Process 2 to use the second number exclusively, so that the processes are never simultaneously adding the same number. So something like:

my_num1 = 5
my_num2 = 100
data_line = [{'list_num': i, 'my_num1': my_num1, 'my_num2': my_num2} for i in my_list]
def worker(data):
    # if in Process 1:
    return data['list_num'] + data['my_num1']
    # if in Process 2:
    return data['list_num'] + data['my_num2']
    # and so forth

Is there an easy way to specify specific inputs per process? Is there another way to think about this problem?

1 Answer


multiprocessing.Pool accepts an initializer function, which is executed in each worker process before the actual given function is run.

You can combine it with a global variable to let your function know which process it is running in.

You probably want to control which initial number each process gets. You can use a Queue to tell each process which number to pick up.

This solution is not optimal but it works.

import multiprocessing


process_number = None


def initializer(queue):
    global process_number

    process_number = queue.get()  # atomically get the process index


def function(value):
    print("I'm process %s" % process_number)

    return value[process_number]


def main():
    queue = multiprocessing.Queue()

    for index in range(multiprocessing.cpu_count()):
        queue.put(index)

    pool = multiprocessing.Pool(initializer=initializer, initargs=[queue])

    tasks = [{0: 'Process-0', 1: 'Process-1', 2: 'Process-2'}, ...]

    print(pool.map(function, tasks))


if __name__ == '__main__':
    main()

My PC is a dual core, so as you can see, only Process-0 and Process-1 appear in the output.

I'm process 0
I'm process 0
I'm process 1
I'm process 0
I'm process 1
...
['Process-0', 'Process-0', 'Process-1', 'Process-0', ... ]