I'm trying to update a shared object (a dict) using the following code, but it does not work: the output is the same as the input dict.

Edit: Essentially, what I'm trying to achieve is to append the position of each item in data (a list) to the matching list in the dict. The data items are the keys into the dict.

Expected output: {'2': [2], '1': [1, 4, 6], '3': [3, 5]}
Note: Approach 2 raises TypeError: 'int' object is not iterable

  1. Approach 1

    from multiprocessing import *
    def mapTo(d,tree):
        for idx, item in enumerate(list(d), start=1):
            tree[str(item)].append(idx)
    
    data=[1,2,3,1,3,1]
    manager = Manager()
    sharedtree= manager.dict({"1":[],"2":[],"3":[]})
    with Pool(processes=3) as pool:
        pool.starmap(mapTo, [(data,sharedtree ) for _ in range(3)])
    
  2. Approach 2

    from multiprocessing import *
    def mapTo(d):
        global tree
        for idx, item in enumerate(list(d), start=1):
            tree[str(item)].append(idx)

    def initializer():
        global tree
        tree = dict({"1":[],"2":[],"3":[]})

    data=[1,2,3,1,3,1]
    with Pool(processes=3, initializer=initializer, initargs=()) as pool:
        pool.map(mapTo,data)
  • Instead of sharing a dict between processes, which is a bad idea, return a dict from each process and merge them afterwards. Commented Apr 30, 2020 at 8:25
  • Why is sharing a dict a bad idea? In my case the dict, which is a kind of hash table, is really huge, and I don't think returning a dict makes sense. Commented Apr 30, 2020 at 8:27
  • Also, all processes are supposed to append items to the dict's lists. I'm not worried about race conditions here, since a Manager's list can be updated independently by subprocesses. Commented Apr 30, 2020 at 8:29
  • Sharing data structures between separate processes is kind of tricky. It can certainly be done, but to @JoshuaNixon's point, make sure there isn't an easier way to accomplish the task at hand. Commented Apr 30, 2020 at 8:30
  • Approach 2 raises the error because pool.map calls mapTo once per element of the list, so each call receives a single int rather than the whole list. Commented Apr 30, 2020 at 8:31
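
For reference, the "return a dict from each process and merge them afterwards" suggestion from the first comment could look roughly like the sketch below. The helper names index_chunk and merge_partials, and the way the data is split into three chunks, are assumptions made for illustration; they are not from the question.

```python
from collections import defaultdict
from multiprocessing import Pool


def index_chunk(chunk):
    """Build a partial index for one chunk of (position, item) pairs."""
    partial = defaultdict(list)
    for idx, item in chunk:
        partial[str(item)].append(idx)
    return dict(partial)


def merge_partials(partials):
    """Merge the per-worker dicts into one dict with sorted position lists."""
    merged = defaultdict(list)
    for partial in partials:
        for key, positions in partial.items():
            merged[key].extend(positions)
    return {key: sorted(positions) for key, positions in merged.items()}


if __name__ == "__main__":
    data = [1, 2, 3, 1, 3, 1]
    pairs = list(enumerate(data, start=1))
    # Hypothetical split: one interleaved chunk per worker.
    chunks = [pairs[i::3] for i in range(3)]
    with Pool(processes=3) as pool:
        partials = pool.map(index_chunk, chunks)
    print(merge_partials(partials))
```

No state is shared here, so there is nothing to synchronize; the trade-off is that each partial dict has to be pickled and sent back to the parent, which is the cost the asker is worried about for a very large table.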

1 Answer

You need to use managed lists if you want the changes to be reflected. So, the following works for me:

from multiprocessing import Manager, Pool

def mapTo(d,tree):
    # Append each item's 1-based position to the managed list for that item.
    for idx, item in enumerate(list(d), start=1):
        tree[str(item)].append(idx)

if __name__ == '__main__':
    data=[1,2,3,1,3,1]

    with Pool(processes=3) as pool:
        manager = Manager()
        sharedtree= manager.dict({"1":manager.list(), "2":manager.list(),"3":manager.list()})
        pool.starmap(mapTo, [(data,sharedtree ) for _ in range(3)])

    print({k:list(v) for k,v in sharedtree.items()})

This is the output:

{'1': [1, 1, 1, 4, 4, 4, 6, 6, 6], '2': [2, 2, 2], '3': [3, 3, 5, 3, 5, 5]}

Note that you should always use the if __name__ == '__main__': guard when using multiprocessing. Also, avoid starred imports.

Edit

On Python < 3.6, managed objects nested inside other managed objects are not kept in sync, so you have to re-assign the list to the dict after modifying it. Use this for mapTo:

def mapTo(d,tree):
    for idx, item in enumerate(list(d), start=1):
        l = tree[str(item)]
        l.append(idx)
        tree[str(item)] = l
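
To see why the re-assignment matters, here is a minimal single-process sketch, assuming a plain list stored in a manager.dict as in the question. The function name demo_nested_update is made up for illustration. Reading a key from the proxy hands back a copy of the list, so an in-place append only changes that copy.

```python
from multiprocessing import Manager


def demo_nested_update():
    """Show that in-place changes to a plain nested list are lost."""
    with Manager() as manager:
        shared = manager.dict({"1": []})

        # shared["1"] returns a copy of the list; the append is lost.
        shared["1"].append(1)
        lost = list(shared["1"])

        # Fetch, modify, then assign back so the manager sees the change.
        items = shared["1"]
        items.append(1)
        shared["1"] = items
        kept = list(shared["1"])
    return lost, kept


if __name__ == "__main__":
    print(demo_nested_update())  # ([], [1])
```

This is the same effect that makes Approach 1 in the question return the input dict unchanged.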

Finally, you aren't using starmap/map correctly. You are passing the whole of data three times, so everything gets counted three times. A mapping operation should work on each individual element of the data you are mapping over, so you want something like:

from functools import partial
from multiprocessing import Manager, Pool

def mapTo(i_d,tree):
    idx,item = i_d
    l = tree[str(item)]
    l.append(idx)
    tree[str(item)] = l

if __name__ == '__main__':
    data=[1,2,3,1,3,1]

    with Pool(processes=3) as pool:
        manager = Manager()
        sharedtree= manager.dict({"1":manager.list(), "2":manager.list(),"3":manager.list()})
        pool.map(partial(mapTo, tree=sharedtree), list(enumerate(data, start=1)))

    print({k:list(v) for k,v in sharedtree.items()})
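
One caveat with the fetch/append/re-assign version: those three steps are separate calls to the manager, so two workers touching the same key at the same time could overwrite each other's update. A hedged sketch of one way to guard against that is to pass a manager.Lock() along with the dict. The lock parameter and the name map_to are additions for illustration, not part of the code above.

```python
from functools import partial
from multiprocessing import Manager, Pool


def map_to(i_d, tree, lock):
    """Append one (position, item) pair to the shared dict under a lock."""
    idx, item = i_d
    with lock:  # make the read-modify-write a single critical section
        items = tree[str(item)]
        items.append(idx)
        tree[str(item)] = items


if __name__ == "__main__":
    data = [1, 2, 3, 1, 3, 1]
    with Manager() as manager:
        shared_tree = manager.dict({"1": [], "2": [], "3": []})
        lock = manager.Lock()
        with Pool(processes=3) as pool:
            pool.map(partial(map_to, tree=shared_tree, lock=lock),
                     list(enumerate(data, start=1)))
        # Workers finish in no fixed order, so sort each list for display.
        print({k: sorted(v) for k, v in shared_tree.items()})
```

The lock serializes every update, so this trades parallelism for correctness; with managed lists on Python 3.6+ each append is a single call and the lock should not be needed.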

12 Comments

The output from your code is {'2': [], '1': [], '3': []}.
@chandresh I cannot reproduce. I'm getting the output above. Are you sure you are running this exact code?
Yes. I copy-pasted your code and ran it in Spyder with Python 3.5. Besides, your output is not my expected output as shown in the question at the top.
@chandresh Yes, that is because you are using pool.map incorrectly. But that's sort of irrelevant to your issue. The problem is that you cannot easily nest managed objects in Python 3.5; you have to do a seemingly redundant re-assignment. You really should probably upgrade...
Hmm. Seems like assignment to the dict's list in Python 3.5 takes three lines of code.