Shared numpy array with multithreading

Question

I have a numpy array(matrix), which I want to fill with calculated values in asynchronously. As a result, I want to have matrix distances with calculated values, but at the end I receive matrix filled with default(-1) value. I understand, that something wrong with sharing distances between threads, but I can't figure out what's exactly wrong.

import numpy as np
import concurrent.futures

data = range(1, 10)
amount = len(data)
default = -1
distances = np.full((amount, amount), default, dtype=np.float32)


def calculate_distance(i, j):
    global distances
    if i == j:
        distances[i][j] = 0
    else:
        calculated = data[i] + data[j] #doesn't matter how is this calculated
        distances[i][j] = calculated
        distances[j][i] = calculated


with concurrent.futures.ProcessPoolExecutor() as executor:
    for i in range(0, amount):
        for j in range(i, amount):
            future = executor.submit(calculate_distance, i, j)
            result = future.result()

executor.shutdown(True)
print(distances)

You're using processes, not threads. I take it that you're using an environment that supports forking. This means that you get processes that have a copy of distances (actually of the memory in general). When these processes modify distances, they modify their own copy. This is a simplification, as copy-on-write may be involved, but the end result is the same. — Ilja Everilä
– Ilja Everilä, Commented Aug 30, 2016 at 10:24
You might also be interested in this Q/A regarding read-only shared numpy arrays for use between processes: stackoverflow.com/questions/17785275/…. — Ilja Everilä
– Ilja Everilä, Commented Aug 30, 2016 at 10:26
First, when using the ProcessPoolExecutor, you need to include a if __name__ == "__main__" guard, otherwise your code won't work on all platforms. Second, you wait for each task to finish before submitting the next one, so the tasks are executed seuqentially. And third, using threads generally won't allow you to get more processor time for Python code due to the Global Interpreter Lock. — Sven Marnach
– Sven Marnach, Commented Aug 30, 2016 at 10:30
@SvenMarnach what should I use to execute tasks in a parallel way? — alex
– alex, Commented Aug 30, 2016 at 10:38

donkopotamus · Accepted Answer · 2016-08-30 10:24:16Z

1

You are using a ProcessPoolExecutor. This will fork new processes for performing work. These processes will not share memory, each instead getting a copy of the distances matrix.

Thus any changes to their copy will certainly not be reflected in the original process.

Try using a ThreadPoolExecutor instead.

NOTE: Globals are generally viewed with distaste ... pass the array into the function instead.

answered Aug 30, 2016 at 10:24

donkopotamus

23.4k3 gold badges58 silver badges61 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Shared numpy array with multithreading

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related