I have a matrix of matrices that I need to do computations on (i.e. a x by y by z by z matrix, I need to perform a computation on each of the z by z matrices, so x*y total operations). Currently, I am looping over the first two dimensions of the matrix and performing the computation, but they are entirely independent. Is there a way I can compute these in parallel, without knowing in advance how many such matrices there will be (i.e. x and y unknown).
-
Yes, python can do this.Mark M– Mark M2021-09-15 01:02:29 +00:00Commented Sep 15, 2021 at 1:02
-
A full example including mathematical operations and realistic inputs (could be random numbers, but with a realistic shape) on what you have tried is always beneficial to get a good answer. ie. it is not known if you have a list of 2d-arrays or a simple 3d-array. Generally there are several ways (vectorized numpy solution, Numba, Cython,etc.) to efficiently implement such an algorithm.max9111– max91112021-09-15 08:36:56 +00:00Commented Sep 15, 2021 at 8:36
3 Answers
Yes; see the multiprocessing module. Here's an example (tweaked from the one in the docs to suit your use case). (I assume z = 1 here for simplicity, so that f takes a scalar.)
from multiprocessing import Pool
# x = 2, y = 3, z = 1 - needn't be known in advance
matrix = [[1, 2, 3], [4, 5, 6]]
def f(x):
# Your computation for each z-by-z submatrix goes here.
return x**2
with Pool() as p:
flat_results = p.map(f, [x for row in matrix for x in row])
# If you don't need to preserve the x*y shape of the input matrix,
# you can use `flat_results` and skip the rest.
x = len(matrix)
y = len(matrix[0])
results = [flat_results[i*y:(i+1)*y] for i in range(x)]
# Now `results` contains `[[1, 4, 9], [16, 25, 36]]`
This will divide up the x * y computations across several processes (one per CPU core; this can be tweaked by providing an argument to Pool()).
Depending on what you're doing, consider trying vectorized operations (as opposed to for loops) with numpy first; you might find that it's fast enough to make multiprocessing unnecessary. (If matrix were a numpy array in this example, the code would just be results = matrix**2.)
Comments
The way I approach parallel processing in python is to define a function to do what I want, then apply it in parallel using multiprocessing.Pool.starmap. It's hard to suggest any code for your problem without knowing what you are computing and how.
import multiprocessing as mp
def my_function(matrices_to_compare, matrix_of_matrices):
m1 = matrices_to_compare[0]
m2 = matrices_to_compare[1]
result = m1 - m2 # or whatever you want to do
return result
matrices_x = <list of x matrices>
matrices_y = <list of y matrices>
matrices_to_compare = list(zip(matrices_x,matrices_y))
with mp.Pool(mp.cpu_count()) as pool:
results = pool.starmap(my_function,
[(x, matrix_of_matrices) for x in matrices_to_compare],
chunksize=1)
Comments
An alternative to the multiprocessing pool approach proposed by other answers - if you have a GPU available possibly the most straightforward approach would be to use a tensor algebra package taking advantage of it, such as cupy or torch.
You could also get some more speedup by jit-compiling your code (for cpu) with cython or numba (and then for gpu programming there's also numba.cuda which however requires some background to use).