
I have the following script:

import multiprocessing

import numpy as np

max_number = 100000
minimums = np.full(max_number, np.inf, dtype=np.float32)
data = np.zeros((max_number, 128, 128, 128), dtype=np.uint8)

def worker(start, end):

    for in_idx in range(start, end):
        value = data[in_idx].min() # compute something using this array
        minimums[in_idx] = value

def main():

    jobs = []
    num_jobs = 5
    for i in range(num_jobs):
        start = int(i * (1000 / num_jobs))
        end = int(start + (1000 / num_jobs))

        p = multiprocessing.Process(name='worker_' + str(i), target=worker, args=(start, end))
        jobs.append(p)
        p.start()

    for proc in jobs:
        proc.join()
    print(jobs)

if __name__ == '__main__':
    main()

How can I ensure that the numpy array is global and can be accessed by each worker? Each worker uses a different part of the numpy array.

  • Do you really want to share the array between processes? Or would it be enough to share parts of the array with each worker? Commented Aug 31, 2017 at 12:02
  • All I want to do is to split up my array in order to calculate something at the same time, and in the end I want the whole array to be calculated correctly. I do not really need to share anything between the processes, since every worker gets a different range of the original numpy array. @Dschoni Commented Aug 31, 2017 at 12:04
  • In this case: use a callback function that copies the result of your worker to a global array. Pass only the indices to the worker to save on overhead. Do the calculations overlap, or are they independent from each other? Commented Aug 31, 2017 at 12:07
  • They are independent from each other, so they do not overlap. However, I do not know how to store a global numpy array and use it in the method. Could you show me? @Dschoni Commented Aug 31, 2017 at 12:08
  • I have updated the answer for a better understanding. @Dschoni Commented Aug 31, 2017 at 12:12

1 Answer

import numpy as np
import multiprocessing as mp

ar = np.zeros((5, 5))

def callback_function(result):
    # Runs in the parent process, so writing to ar here is visible after join.
    x, y, data = result
    ar[x, y] = data

def worker(num):
    data = ar[num, num] + 3
    return num, num, data

def apply_async_with_callback():
    pool = mp.Pool(processes=5)
    for i in range(5):
        pool.apply_async(worker, args=(i,), callback=callback_function)
    pool.close()
    pool.join()
    print("Multiprocessing done!")

if __name__ == '__main__':
    ar = np.ones((5, 5))  # rebinds the module-level ar before the pool is created
    apply_async_with_callback()

Explanation: You set up your data array, your worker, and your callback function. The processes argument to the pool sets up a number of independent workers, and each worker can handle more than one task. The callback runs in the parent process and writes each result back into the array.

The __name__ == '__main__' guard protects the following lines from being run on each import.
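
Adapted to the question's chunked start/end setup, a minimal sketch could look like the following. Here chunk_minimums and write_back are hypothetical names, the per-chunk min() stands in for whatever you actually compute, and the array sizes are shrunk so the example runs quickly:

import numpy as np
import multiprocessing as mp

max_number = 100
minimums = np.full(max_number, np.inf, dtype=np.float32)
data = np.zeros((max_number, 16, 16, 16), dtype=np.uint8)

def chunk_minimums(start, end):
    # Stand-in computation: the minimum of each sub-array in this chunk.
    return start, end, data[start:end].min(axis=(1, 2, 3))

def write_back(result):
    # Runs in the parent process, so the global array really gets updated.
    start, end, values = result
    minimums[start:end] = values

def main():
    num_jobs = 5
    chunk = max_number // num_jobs
    pool = mp.Pool(processes=num_jobs)
    for i in range(num_jobs):
        pool.apply_async(chunk_minimums, args=(i * chunk, (i + 1) * chunk),
                         callback=write_back)
    pool.close()
    pool.join()
    print(minimums[:10])

if __name__ == '__main__':
    main()

Only the indices travel to each worker; the computed chunk comes back through the callback, which is the only place the global array is written.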


Comments

And what if I have the ar in another function, like a main() function?
I am calculating the ar array and its size in your apply_async_with_callback() function. This is my main problem. So how do I make this array global?
Then return it from the function before you run multiprocessing on it. The above code is just a minimal example. You can always extend it to suit your needs.
Which function? Could you quickly adapt the code so that the ar array is initialised in the apply_async_with_callback() function? Sorry, but I want to get this 100% right :)
Better not to mix the multiprocessing parts and the initialisation with each other. Write a separate routine that does all your inits and put it before the multiprocessing; that should do the trick (see the sketch below).
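
For instance, a hedged sketch of that advice, where init_array is a hypothetical name and the pattern otherwise mirrors the answer above: the initialisation lives in its own routine whose result is assigned to the global before the pool starts (as in the answer, the workers see the initialised array when processes are forked).

import numpy as np
import multiprocessing as mp

ar = np.zeros((5, 5))  # placeholder; replaced before multiprocessing starts

def init_array(size):
    # Hypothetical init routine: compute the array and its size up front.
    return np.ones((size, size))

def callback_function(result):
    x, y, data = result
    ar[x, y] = data

def worker(num):
    return num, num, ar[num, num] + 3

def apply_async_with_callback():
    pool = mp.Pool(processes=5)
    for i in range(5):
        pool.apply_async(worker, args=(i,), callback=callback_function)
    pool.close()
    pool.join()

if __name__ == '__main__':
    ar = init_array(5)  # all inits done first, then the multiprocessing runs
    apply_async_with_callback()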
