
I want to use Value from the multiprocessing library to keep track of data. As far as I know, each process in Python multiprocessing gets its own copy of the data, so I can't simply edit global variables. I'd like to use Value to solve this. Does anyone know how I can pass a Value into a pooled function?

from multiprocessing import Pool, Value
import itertools

arr = [2,6,8,7,4,2,5,6,2,4,7,8,5,2,7,4,2,5,6,2,4,7,8,5,2,9,3,2,0,1,5,7,2,8,9,3,2]

def hello(args):
    g, data = args  # each mapped item is an (element, Value) pair
    data.value += 1

if __name__ == '__main__':
    data = Value('i', 0)
    p = Pool(processes=1)
    p.map(hello, itertools.izip(arr, itertools.repeat(data)))

    print data.value

Here is the runtime error I'm getting:

RuntimeError: Synchronized objects should only be shared between processes through inheritance

Does anyone know what I'm doing wrong?

3 Comments
  • I think you need to pass your data variable into all the processes. Commented Aug 15, 2015 at 21:49
  • @TomDalton I just updated the code using itertools to pass the data variable into the hello function; I'm getting an error now and I'm not sure why it's occurring. Commented Aug 15, 2015 at 22:24
  • Why aren't you returning data from hello() instead? That's the whole point of a map. Commented Aug 15, 2015 at 22:52

2 Answers


I don't know why, but there seems to be some issue using Pool that you don't get if you create the subprocesses manually. For example, the following works:

from multiprocessing import Process, Value

arr = [1,2,3,4,5,6,7,8,9]


def hello(data, g):
    with data.get_lock():
        data.value += 1
    print id(data), g, data.value

if __name__ == '__main__':
    data = Value('i')
    print id(data)

    processes = []
    for n in arr:
        p = Process(target=hello, args=(data, n))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

    print "sub process tasks completed"
    print data.value

However, if you do basically the same thing using Pool, you get "RuntimeError: Synchronized objects should only be shared between processes through inheritance". I have seen that error when using a pool before, and never fully got to the bottom of it.
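One known workaround that does make Value usable with Pool is to pass it through the pool's initializer, so each worker inherits it at startup rather than receiving it through pickling. A sketch in Python 3 (the names init_worker, counter, and work are my own, not from the question):

```python
from multiprocessing import Pool, Value

counter = None  # module-level slot, filled in by the initializer in each worker


def init_worker(shared_counter):
    # Runs once per worker process; the Value arrives via process
    # inheritance (initargs), not via pickling of map() arguments.
    global counter
    counter = shared_counter


def work(x):
    with counter.get_lock():
        counter.value += 1
    return x


if __name__ == '__main__':
    c = Value('i', 0)
    pool = Pool(processes=4, initializer=init_worker, initargs=(c,))
    pool.map(work, range(9))
    pool.close()
    pool.join()
    print(c.value)  # 9
```

Because map() blocks until all tasks finish and each increment happens under the lock, the counter equals the number of processed items once map() returns.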

An alternative to using Value that seems to work with Pool is to use a Manager to give you a 'shared' list:

from multiprocessing import Pool, Manager
from functools import partial


arr = [1,2,3,4,5,6,7,8,9]


def hello(data, g):
    data[0] += 1


if __name__ == '__main__':
    m = Manager()
    data = m.list([0])
    hello_data = partial(hello, data)
    p = Pool(processes=5)
    p.map(hello_data, arr)

    print data[0]

2 Comments

Perfect! This is exactly what I was looking for! Looks like I'll have to end up using Manager instead! Thank you so much Tom!
Ref sharing the Value, this answer (stackoverflow.com/a/9931389/2372812) seems to do some workaround with the pool initialiser to make it work, but it seems like a fairly poor solution. Beware that using a manager results in potentially slow IPC compared to 'real' shared memory.

There is little need to use Values with Pool.map().

The central idea of a map is to apply a function to every item in a list or other iterator, gathering the return values in a list.

The idea behind Pool.map is basically the same, but spread over multiple processes. In every worker process, the mapped function is called with items from the iterable. The return values from the functions called in the worker processes are transported back to the parent process and gathered into a list, which is eventually returned.
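For the question's counting use case, that means you can drop shared state entirely: have the function return a result and count the results in the parent. A minimal Python 3 sketch (hello here just doubles its input, as a stand-in for real work):

```python
from multiprocessing import Pool


def hello(g):
    # No shared state: just return a result for each input item
    return g * 2


if __name__ == '__main__':
    arr = [1, 2, 3, 4, 5]
    with Pool(processes=2) as p:
        results = p.map(hello, arr)
    print(results)       # [2, 4, 6, 8, 10]
    print(len(results))  # 5 -- the "count", without any Value
```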


Alternatively, you could use Pool.imap_unordered, which starts returning results as soon as they are available instead of waiting until everything is finished. So you could tally the number of completed results and use that to update a progress bar.
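That tallying approach can be sketched like this in Python 3 (work and the sleep are placeholders for a real task):

```python
from multiprocessing import Pool
import time


def work(g):
    time.sleep(0.01)  # stand-in for real work
    return g


if __name__ == '__main__':
    total = 20
    done = 0
    with Pool(processes=4) as p:
        # Results arrive as workers finish, not necessarily in input order
        for _ in p.imap_unordered(work, range(total)):
            done += 1
            print('progress: %d/%d' % (done, total), end='\r')
    print()
    print(done)  # 20
```

No shared counter is needed: the parent process does all the counting as results stream back to it.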

6 Comments

What if I want to have, for example, a progress bar? I'd need a counter that all worker processes increment together.
@BarafuAlbino You can use a multiprocessing.Value for that. Note (from the linked documentation) that you'll have to acquire the lock before increasing the value!
Sure. Except that Value does not work with multiprocessing.Pool.map, as the questions states.
@BarafuAlbino You should probably create the Value before if __name__ == "__main__", so that child processes inherit it. And it might depend on the OS. MS Windows is kind of weird in that it doesn't have fork; having fork makes multiprocessing easier to implement on UNIX-like systems.
@BarafuAlbino But imap_unordered is probably simpler.
