
I declared an empty list as a global variable and assigned some values to it, but when I run a multiprocessing pool, the global variable is an empty list instead of holding the assigned values.

from multiprocessing import Process, Pool
camera_data = [{"id": "1", "url": "cam-c.jpg", "area": "1"},
               {"id": "2", "url": "cam-d.jpg", "area": "1"},
               {"id": "3", "url": "cam-e.jpg", "area": "2"},
               {"id": "4", "url": "cam-f.jpg", "area": "2"}]

bulb_data = []


def framed_images(fake):
    print(bulb_data)


if __name__ == '__main__':
    print("camera_data - ",camera_data)
    for data in camera_data:
        bulb_data.append({"url": data["url"], "bulb": False})
    print("bulb_data - ",bulb_data)
    # framed_images()
    fake_Data = bulb_data

    with Pool(processes=4) as pool:
        pool.map(framed_images, fake_Data)

I am getting output as:

camera_data - [{'id': '1', 'url': 'cam-c.jpg', 'area': '1'}, {'id': '2', 'url': 'cam-d.jpg', 'area': '1'}, {'id': '3', 'url': 'cam-e.jpg', 'area': '2'}, {'id': '4', 'url': 'cam-f.jpg', 'area': '2'}]

bulb_data - [{'url': 'cam-c.jpg', 'bulb': False}, {'url': 'cam-d.jpg', 'bulb': False}, {'url': 'cam-e.jpg', 'bulb': False}, {'url': 'cam-f.jpg', 'bulb': False}]
[]
[]
[]
[]

The last four empty lists come from the multiprocessing pool. I expect output like this:

[{'url': 'cam-c.jpg', 'bulb': False}, {'url': 'cam-d.jpg', 'bulb': False}, {'url': 'cam-e.jpg', 'bulb': False}, {'url': 'cam-f.jpg', 'bulb': False}]
[{'url': 'cam-c.jpg', 'bulb': False}, {'url': 'cam-d.jpg', 'bulb': False}, {'url': 'cam-e.jpg', 'bulb': False}, {'url': 'cam-f.jpg', 'bulb': False}]
[{'url': 'cam-c.jpg', 'bulb': False}, {'url': 'cam-d.jpg', 'bulb': False}, {'url': 'cam-e.jpg', 'bulb': False}, {'url': 'cam-f.jpg', 'bulb': False}]
[{'url': 'cam-c.jpg', 'bulb': False}, {'url': 'cam-d.jpg', 'bulb': False}, {'url': 'cam-e.jpg', 'bulb': False}, {'url': 'cam-f.jpg', 'bulb': False}]

so that I can edit the list of dictionaries in each process while updating the global variable.

4 Answers


It is likely because you are using the spawn start method, under which the child processes inherit only the bare minimum from the process that spawned them. This is the default start method on macOS and Windows, and it is the only start method available on Windows.

You can read the documentation here on the different start methods.

The documentation also points this out about global variables when using the spawn or forkserver start methods:

Bear in mind that if code run in a child process tries to access a global variable, then the value it sees (if any) may not be the same as the value in the parent process at the time that Process.start was called. However, global variables which are just module level constants cause no problems.

https://docs.python.org/3/library/multiprocessing.html?highlight=multiprocessing#the-spawn-and-forkserver-start-methods
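A minimal sketch of the difference between start methods (the fork method is Unix-only, and all names here are illustrative):

```python
import multiprocessing as mp

data = []  # module-level global, empty at import time


def peek(_):
    # Each worker reports how many items it sees in the global.
    return len(data)


def run(start_method):
    ctx = mp.get_context(start_method)
    with ctx.Pool(2) as pool:
        return pool.map(peek, [0, 1])


if __name__ == '__main__':
    data.append("item")          # parent mutates the global after import
    print("fork:", run("fork"))  # [1, 1] -- children inherit the parent's memory
    # Under "spawn" the same call would print [0, 0]: each child re-imports
    # the module and sees only the empty module-level list.
```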



The input argument (the iterable) is mandatory, and you tried to work around it with a trick: an unused parameter in framed_images plus a global list, which leads to that output.
The reason is that the multiprocessing pool creates sub-processes for that function and divides the argument (here the list fake_Data) among them according to the chunksize parameter, and there is no shared memory between them.
You could share the object with shared ctypes, although I have never tried it: https://docs.python.org/3/library/multiprocessing.html#shared-ctypes-objects
Alternatively, I think you can achieve this with the threading module, which does use shared memory; sub-processes aren't what you need here.

1 Comment

I understand that, but my task is CPU-bound, so I can't use threads. The code above is just a recreation of my issue. It is important that I use a multiprocessing pool to edit the common global variable.

As pointed out, on Windows (or macOS, as both use spawn) each spawned child has its own separate memory and imports your script, so each child is essentially doing:

import your_script
print(your_script.bulb_data)

If you run this code you will get the empty list. On Linux it is slightly different: you will get your "expected result" because it uses fork, but the memory is still not shared, and any modification in one process won't affect the other processes.

The way around this is to use a managed list, which lives in another process and uses IPC to synchronize the list across processes.

from multiprocessing import Process, Pool
from multiprocessing import Manager
camera_data = [{"id": "1", "url": "cam-c.jpg", "area": "1"},
               {"id": "2", "url": "cam-d.jpg", "area": "1"},
               {"id": "3", "url": "cam-e.jpg", "area": "2"},
               {"id": "4", "url": "cam-f.jpg", "area": "2"}]



def framed_images(fake):
    print(bulb_data)

def initializer_func(bulb_data_list):
    global bulb_data
    bulb_data = bulb_data_list

if __name__ == '__main__':
    manager = Manager()
    bulb_data = manager.list()
    print("camera_data - ",camera_data)
    for data in camera_data:
        bulb_data.append({"url": data["url"], "bulb": False})
    print("bulb_data - ",bulb_data)
    # framed_images()
    fake_Data = list(bulb_data)

    with Pool(processes=4, initializer=initializer_func, initargs=(bulb_data,)) as pool:
        pool.map(framed_images, fake_Data)

Note that each access to this "shared list" involves IPC, which is slower than a normal list, so keep its use to a minimum and don't put many objects or big objects in it. See the sharing-state documentation.



Solution code:

I used a shared queue to send the output of every child process back to the main process, so I can do whatever I want with the results.

camera_data = [{"id": "1", "url": "cam-c.jpg", "area": "1"},
               {"id": "2", "url": "cam-d.jpg", "area": "1"},
               {"id": "3", "url": "cam-e.jpg", "area": "2"},
               {"id": "4", "url": "cam-f.jpg", "area": "2"}]

from multiprocessing import SimpleQueue
from multiprocessing.pool import Pool


# initialize worker processes
def init_worker(shared_queue):
    global queue
    queue = shared_queue
    print(queue)


# task executed in a worker process
def task(identifier):
    global queue
    if identifier["area"] == '1':
        queue.put("GREEN")
    if identifier["area"] == '2':
        queue.put("RED")




# protect the entry point
if __name__ == '__main__':
    # create a shared queue
    shared_queue = SimpleQueue()
    # create and configure the process pool
    fake_data = camera_data
    with Pool(initializer=init_worker, initargs=(shared_queue,)) as pool:
        # issue tasks into the process pool
        _ = pool.map_async(task, fake_data)
        for _ in fake_data:
            result = shared_queue.get()
            print(f'Got {result}', flush=True)

2 Comments

How exactly does this solve the problem OP was having?
@ryanwebjackson I wanted to update data in a global variable using data from different child processes. With the solution above I can access the output of each child and simply use that output to change the values in the global variable. It's not the answer to the question as asked, I know, but I found this solution works for me. If you know how to access a global variable in each child without passing it as an argument to the pool, please do share.
