How to modify global numpy array safely with multithreading in Python?

Question

I am trying to run my simulations in a threadpool and store the results of each repetition in a global numpy array. However, I run into problems doing that, and I am observing some really interesting behavior with the following (simplified) code (Python 3.7):

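import random
import numpy as np
from multiprocessing import Pool, Lock

log_mutex = Lock()
repetition_count = 5
thread_count = 4  # not defined in the original snippet; any pool size shows the problem
data_array = np.zeros(shape=(repetition_count, 3, 200), dtype=float)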

def record_results(repetition_index, data_array, log_mutex):
    log_mutex.acquire()
    print("Start record {}".format(repetition_index))
    # Do some stuff and modify data_array, e.g.:
    data_array[repetition_index, 0, 53] = 12.34
    
    print("Finish record {}".format(repetition_index))
    log_mutex.release()

def run(repetition_index):
    global log_mutex
    global data_array

    # do some simulation

    record_results(repetition_index, data_array, log_mutex)

if __name__ == "__main__":
    random.seed()
    with Pool(thread_count) as p:
        print(p.map(run, range(repetition_count)))

The issue is: I get the correct "Start record" and "Finish record" outputs, e.g. Start record 1 ... Finish record 1. However, the slices of the numpy array modified by each thread are not kept in the global variable. In other words, the elements modified by thread 1 are still zero, and thread 4 overwrites different parts of the array.

One additional remark: the address of the global array, which I retrieve with print(hex(id(data_array))), is the same for all threads inside their log_mutex.acquire() ... log_mutex.release() lines.

Am I missing something? Is there a separate copy of the global data_array stored for each thread? I am observing behavior like that, but this should not be the case when I use the global keyword. Or am I wrong?

You aren't using multiple threads; you are using multiple processes. id is only unique within a process.
– juanpa.arrivillaga, Commented Nov 16, 2020 at 20:23
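To make the comment's point concrete, here is a minimal sketch (not from the thread) that writes into a module-level array from a process pool and then inspects the parent's copy. Each worker process has its own copy of data_array, so the write succeeds inside the worker but never reaches the parent. The pool size of 2 and the (pid, value) return tuple are illustrative choices.

```python
import os
import numpy as np
from multiprocessing import Pool

data_array = np.zeros(shape=(5, 3, 200), dtype=float)

def run(repetition_index):
    # Executes in a worker process: this modifies the worker's own copy of data_array.
    data_array[repetition_index, 0, 53] = 12.34
    # Return the worker's process id and the value it sees after its own write.
    return os.getpid(), data_array[repetition_index, 0, 53]

if __name__ == "__main__":
    with Pool(2) as p:
        results = p.map(run, range(5))
    print(results)               # every worker saw its own write succeed
    print(data_array[:, 0, 53])  # the parent's array is still all zeros
```

This is also why id() looks identical across workers: each process has its own address space, so equal ids in different processes say nothing about sharing the same object.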

Melih Elibol · Accepted Answer · 2020-11-16 20:21:02Z


Looks like you're running the run function using multiple processes, not multiple threads. Try something like this instead:

import numpy as np
from threading import Thread, Lock

log_mutex = Lock()
repetition_count = 5
data_array = np.zeros(shape=(repetition_count, 3, 200), dtype=float)

def record_results(repetition_index, data_array, log_mutex):
    log_mutex.acquire()
    print("Start record {}".format(repetition_index))
    # Do some stuff and modify data_array, e.g.:
    data_array[repetition_index, 0, 53] = 12.34
    print("Finish record {}".format(repetition_index))
    log_mutex.release()

def run(repetition_index):
    global log_mutex
    global data_array
    record_results(repetition_index, data_array, log_mutex)

if __name__ == "__main__":
    threads = []
    for i in range(repetition_count):
        t = Thread(target=run, args=[i])
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
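If you prefer to keep the p.map shape of the original code, a thread-based pool is an alternative. The sketch below swaps in multiprocessing.pool.ThreadPool (a substitution, not part of the original answer); the pool size of 4 is an assumption. Since each repetition writes only to its own slice, the lock here mainly keeps each Start/Finish pair together in the output.

```python
import numpy as np
from multiprocessing.pool import ThreadPool
from threading import Lock

log_mutex = Lock()
repetition_count = 5
thread_count = 4  # illustrative pool size
data_array = np.zeros(shape=(repetition_count, 3, 200), dtype=float)

def run(repetition_index):
    # All threads share one address space, so this is the same array object everywhere.
    with log_mutex:
        print("Start record {}".format(repetition_index))
        data_array[repetition_index, 0, 53] = 12.34
        print("Finish record {}".format(repetition_index))
    return repetition_index

with ThreadPool(thread_count) as p:
    print(p.map(run, range(repetition_count)))  # [0, 1, 2, 3, 4]

print(data_array[:, 0, 53])  # every repetition's write is visible in the global array
```

Threads avoid copying the array, but the GIL means pure-Python simulation code will not run in parallel; that tradeoff is what pushes people toward processes in the first place.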

Update:

To do this with multiple processes, you would need to use multiprocessing.RawArray to instantiate your array; the size of the array is the product repetition_count * 3 * 200. Within each process, create a view on the array using np.frombuffer, and reshape it accordingly. While this will be very fast, I discourage this style of programming as it relies on global shared memory objects, which are error-prone in larger programs.
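A minimal sketch of the shared-memory approach described above, assuming a float64 array and a pool of 2 processes. The helper init_worker and the module-level _shared are hypothetical names for this sketch. The RawArray is handed to the workers through Pool's initializer/initargs, because shared ctypes objects are meant to be shared by inheritance rather than passed as p.map arguments.

```python
import numpy as np
from multiprocessing import Pool, RawArray

repetition_count = 5
shape = (repetition_count, 3, 200)

# Set in each worker process by init_worker() (hypothetical helper for this sketch).
_shared = None

def init_worker(raw, array_shape):
    global _shared
    # Wrap the shared buffer in a NumPy view; no data is copied.
    _shared = np.frombuffer(raw, dtype=np.float64).reshape(array_shape)

def run(repetition_index):
    # Each repetition writes only to its own slice, so no lock is used here.
    _shared[repetition_index, 0, 53] = 12.34

if __name__ == "__main__":
    # "d" is the typecode for a C double (float64); RawArray starts zero-filled.
    raw = RawArray("d", repetition_count * 3 * 200)
    with Pool(2, initializer=init_worker, initargs=(raw, shape)) as p:
        p.map(run, range(repetition_count))
    # The parent builds its own view on the same shared buffer.
    data_array = np.frombuffer(raw, dtype=np.float64).reshape(shape)
    print(data_array[:, 0, 53])
```

RawArray has no lock of its own, so if workers could ever touch overlapping elements, you would need to add synchronization yourself (or use multiprocessing.Array, which wraps the buffer with a lock).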

If possible, I suggest removing the global data_array and instead instantiate an array in each call to record_results, which you would return in run. The p.map call will return a list of arrays, which you can convert to a numpy array and recover the shape and contents of the global data_array in your original implementation. This will incur a communication cost, but it's a cleaner approach to managing concurrency and eliminates the need for locks.
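A minimal sketch of the return-value approach, assuming each repetition produces a (3, 200) float array. The function bodies are placeholders for the real simulation, and record_results no longer takes the array or the lock.

```python
import numpy as np
from multiprocessing import Pool

repetition_count = 5

def record_results(repetition_index):
    # Build this repetition's results locally; nothing global is modified.
    result = np.zeros(shape=(3, 200), dtype=float)
    result[0, 53] = 12.34
    return result

def run(repetition_index):
    # do some simulation
    return record_results(repetition_index)

if __name__ == "__main__":
    with Pool(2) as p:
        # p.map preserves input order, so row i of the stacked array is repetition i.
        data_array = np.stack(p.map(run, range(repetition_count)))
    print(data_array.shape)  # (5, 3, 200)
```

Each result is pickled in the worker and unpickled in the parent, which is the communication cost mentioned above; in exchange, no shared state or locks are involved.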

It's generally a good idea to minimize inter-process communication, but unless performance is critical, I don't think shared memory is the right solution. With p.map, you'll want to avoid returning large objects, but the object sizes in your snippet are very small (3 * 200 float64 values per repetition, i.e. 600 * 8 = 4800 bytes).
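A quick way to check the per-task payload yourself, assuming each task returns one (3, 200) float64 array. multiprocessing sends results through a pickle-based serializer, so the plain pickle size below is only an approximation of what actually crosses the process boundary.

```python
import pickle
import numpy as np

# One repetition's result, as returned by each task in the p.map approach.
result = np.zeros(shape=(3, 200), dtype=float)

print(result.nbytes)            # 4800 = 600 values * 8 bytes each
payload = pickle.dumps(result)  # approximately what is sent back to the parent per task
print(len(payload))             # a bit more than 4800: the array bytes plus pickle metadata
```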

edited Nov 16, 2020 at 20:21

answered Nov 12, 2020 at 6:39

Melih Elibol


4 Comments

OnurA Over a year ago

Indeed, there should be a ThreadPool in Python; however, it is not documented yet.

OnurA Over a year ago

But what is the correct way of doing it with a (Process)Pool?

juanpa.arrivillaga Over a year ago

@OnurA here is a thread pool: docs.python.org/3/library/…

Melih Elibol Over a year ago

I've updated my response explaining a couple of solutions with p.map.
