
I have a Python script that generates a system matrix. This happens serially: one processor, one process, nothing parallelized. I also have a solver, which runs on many processors using MPI.

Currently, the Python script creates the matrix and writes it to a file, then calls the solver via subprocess.call(["mpirun ....."]). The solver reads the matrix from the file, solves, and writes the result back to a file; finally, the Python script reads the result back from that file.

Now I'm looking for something more efficient that avoids the file reads/writes. One idea is to start the MPI process in the background and then transfer data and commands between Python and the solver with some sort of interprocess communication.

How can I do interprocess communication in Python? Or are there better alternatives?

What I want to avoid is running the Python script itself under MPI (MPI4Py), both for debuggability reasons and because parallelizing that part makes no sense.

2 Comments

  • Are you certain that file I/O is the rate-limiting step here? Commented May 16, 2017 at 9:06
  • mpiexec typically redirects its standard input to the standard input of rank 0 and does the opposite for the standard output of all ranks. Simply open a pipe to the mpiexec command, send the matrix, then read the result. Just make sure that no rank other than 0 writes to standard output. Or use os.mkfifo() to create a separate FIFO. Commented May 16, 2017 at 12:19
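The pipe approach described in the second comment can be sketched as follows. This is a minimal, self-contained illustration: the child process here is a stand-in Python one-liner that just doubles every entry, standing in for the real command (something like ["mpirun", "-n", "4", "./solver"], which is an assumption about how the solver would be launched). The framing protocol (a length prefix followed by raw doubles) is also just an illustrative choice.

```python
import struct
import subprocess
import sys

# Stand-in for the real MPI solver. In practice the command would be e.g.
# ["mpirun", "-n", "4", "./solver"], with rank 0 reading the matrix from
# stdin and writing the result to stdout (and no other rank printing).
CHILD = r"""
import struct, sys
n = struct.unpack("<i", sys.stdin.buffer.read(4))[0]
data = struct.unpack(f"<{n}d", sys.stdin.buffer.read(8 * n))
# "Solve": the stand-in just doubles every entry.
sys.stdout.buffer.write(struct.pack(f"<{n}d", *(2.0 * x for x in data)))
"""

def solve_via_pipe(matrix):
    """Send a flat list of doubles to the solver's stdin, read the result back."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],  # placeholder for the mpirun command
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    # Simple framing: 4-byte little-endian length, then the raw doubles.
    payload = struct.pack("<i", len(matrix)) + struct.pack(f"<{len(matrix)}d", *matrix)
    out, _ = proc.communicate(payload)  # write all input, then read all output
    return list(struct.unpack(f"<{len(matrix)}d", out))

print(solve_via_pipe([1.0, 2.0, 3.0]))  # → [2.0, 4.0, 6.0]
```

communicate() avoids deadlocks from full pipe buffers by handling both ends; for an interactive back-and-forth (multiple solves per solver launch) you would instead write to proc.stdin and read from proc.stdout in a loop.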

1 Answer


The simplest way would be to use /dev/shm or some other RAM-backed temporary file system. Given that you are working in Python anyway, this will likely give very reasonable performance. I would resort to more complicated methods only if measurements show specifically that this is a bottleneck and that there is potential for performance improvement.
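A sketch of that idea, keeping the existing file-based protocol but pointing it at a tmpfs: the code below writes the matrix into /dev/shm (falling back to the default temporary directory on systems without it) and reads the result back. The mpirun invocation is left as a comment because the solver binary and its argument convention are hypothetical; a trivial stand-in step keeps the sketch runnable.

```python
import os
import tempfile
from array import array

# Use the RAM-backed /dev/shm when available, otherwise fall back to the
# default temporary directory (e.g. on non-Linux systems).
SHM_DIR = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()

def solve_via_shm(matrix):
    """Exchange data with the solver through files on a RAM-backed tmpfs."""
    fd, in_path = tempfile.mkstemp(suffix=".bin", dir=SHM_DIR)
    out_path = in_path + ".out"
    try:
        with os.fdopen(fd, "wb") as f:
            array("d", matrix).tofile(f)  # same bytes as before, but no disk I/O
        # Here the real code would launch the MPI solver, e.g. (hypothetical
        # binary and argument convention):
        # subprocess.run(["mpirun", "-n", "4", "./solver", in_path, out_path],
        #                check=True)
        # Stand-in for the solver so this sketch is self-contained:
        os.rename(in_path, out_path)
        result = array("d")
        with open(out_path, "rb") as f:
            result.fromfile(f, len(matrix))
        return list(result)
    finally:
        for p in (in_path, out_path):
            if os.path.exists(p):
                os.remove(p)
```

The only change relative to the original workflow is the directory the files live in, which is why this is usually the cheapest optimization to try first.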

That of course assumes that at least some of the MPI ranks run on the same node as the Python script. If not all ranks run on that node, have rank 0 read the file and broadcast/scatter the data within the MPI solver.

Alternatively, you could use MPI's facilities for dynamically establishing communication (MPI_Comm_connect etc.), or even dynamic process management, e.g. MPI_Comm_spawn instead of launching mpirun from Python. I would argue that this introduces much more complexity for likely no significant performance gain over a RAM-backed file, and it may also not be well supported on HPC systems.
