I am trying to run my simulations in a thread pool and store the results of each repetition in a global NumPy array. However, I am running into problems and observing some really interesting behavior with the following (simplified) code (Python 3.7):
import random

import numpy as np
from multiprocessing import Pool, Lock

log_mutex = Lock()
repetition_count = 5
thread_count = 4  # assumed value; not defined in the original snippet
data_array = np.zeros(shape=(repetition_count, 3, 200), dtype=float)


def record_results(repetition_index, data_array, log_mutex):
    log_mutex.acquire()
    print("Start record {}".format(repetition_index))
    # Do some stuff and modify data_array, e.g.:
    data_array[repetition_index, 0, 53] = 12.34
    print("Finish record {}".format(repetition_index))
    log_mutex.release()


def run(repetition_index):
    global log_mutex
    global data_array
    # do some simulation
    record_results(repetition_index, data_array, log_mutex)


if __name__ == "__main__":
    random.seed()
    with Pool(thread_count) as p:
        print(p.map(run, range(repetition_count)))
The issue is: I get the correct "Start record" and "Finish record" outputs, e.g. Start record 1 ... Finish record 1. However, the slices of the NumPy array that each thread modifies are not kept in the global variable. In other words, the elements modified by thread 1 are still zero, and thread 4 overwrites different parts of the array.
One additional remark: the address of the global array, which I retrieve with print(hex(id(data_array))), is the same for all threads when printed between their log_mutex.acquire() and log_mutex.release() calls.
Am I missing something? Are there multiple copies of the global data_array, one stored for each thread? That is what the behavior looks like, but it should not be the case when I use the global keyword, or am I wrong?
id is only unique within a process.
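To illustrate the comment above: multiprocessing.Pool starts worker processes, not threads, so each worker modifies its own copy of data_array, and equal id values across workers do not mean the array is shared. If threads are what was intended, one possible sketch is to use multiprocessing.pool.ThreadPool, whose workers run inside a single process and therefore see the same array. The value of thread_count and the helper run_all are assumptions made for this example and are not part of the original code.

```python
import numpy as np
from multiprocessing.pool import ThreadPool
from threading import Lock

repetition_count = 5
thread_count = 4  # assumed value; the original snippet does not define it
data_array = np.zeros(shape=(repetition_count, 3, 200), dtype=float)
log_mutex = Lock()


def run(repetition_index):
    # Threads live in one process, so this write lands in the same
    # module-level array that the main thread reads afterwards.
    with log_mutex:
        data_array[repetition_index, 0, 53] = 12.34
    return repetition_index


def run_all():
    # Hypothetical helper: runs every repetition on a pool of threads.
    with ThreadPool(thread_count) as pool:
        return pool.map(run, range(repetition_count))


if __name__ == "__main__":
    print(run_all())
    print(data_array[:, 0, 53])
```

Note that threads share the interpreter lock, so a CPU-bound simulation written in pure Python may not run faster this way. If separate processes are needed, the usual alternative is to return each repetition's result from the worker and write it into the array in the parent process.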