
Currently I have a serial Python program that calls a parallel (MPI) C executable through subprocess.run. This is a terribly clunky implementation, as it means I have to pass some very large arrays back and forth between the Python and C programs through the file system. I would like to pass the arrays directly from Python to C and back. I think ctypes is what I should use. As I understand it, I would build my C code as a shared library (dll) instead of an executable to be able to use it with Python.

However, to use MPI you need to launch the program using mpirun/mpiexec. This is not possible if I am simply using the C functions from a dll, correct?

Is there a good way to enable MPI for the function called from the dll? The two possibilities I've found are:

  • launch the Python program in parallel using mpi4py, then pass MPI_COMM_WORLD to the C function (per this post: How to pass MPI information to ctypes in python)

  • somehow initialize and spawn processes inside the function without using mpirun. I'm not sure if this is possible.
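For the first option, here is a minimal sketch of the Python side. Everything here is an assumption for illustration: `libsolver.so` and `solve` are hypothetical names for your compiled shared library and its entry point, and the handle-reading trick relies on mpich representing MPI_Comm as a plain C int (Open MPI uses a pointer, which would need `ctypes.c_void_p` instead):

```python
import ctypes

def handle_from_address(addr):
    # mpich represents MPI_Comm as a C int; read the raw handle from the
    # address that mpi4py exposes via MPI._addressof(comm).
    # (Open MPI uses a pointer, so ctypes.c_void_p would be needed there.)
    return ctypes.c_int.from_address(addr).value

# Launched as `mpirun -n 4 python driver.py`, the MPI-dependent part
# would look like this (libsolver.so and solve are hypothetical names):
#
#   from mpi4py import MPI
#   lib = ctypes.CDLL("./libsolver.so")
#   lib.solve.argtypes = [ctypes.c_int,                     # raw MPI_Comm handle
#                         ctypes.POINTER(ctypes.c_double),  # array data
#                         ctypes.c_int]                     # array length
#   handle = handle_from_address(MPI._addressof(MPI.COMM_WORLD))
#   data = (ctypes.c_double * 3)(1.0, 2.0, 3.0)
#   lib.solve(handle, data, len(data))
```

Because mpi4py already calls MPI_Init, the library function must not initialize MPI itself; it just uses the communicator handle it receives.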

  • do you want to run the MPI program on the same node, or on many nodes? Does Python execute the MPI program only once, or several times? What is the relative duration of the MPI program compared to the Python one? Commented Feb 26, 2018 at 14:03
  • @GillesGouaillardet The MPI program runs on multiple nodes, typically 3-10 nodes of 16 processors each. The Python calls the MPI program many times, typically 10,000+ over the lifetime of a run (this is an MCMC process). Much more time is spent in the MPI C program than in the Python wrapper. Thank you! Commented Feb 28, 2018 at 2:19
  • do you use a resource manager (e.g. slurm, PBS, LSF or other)? Which MPI library are you using (e.g. Open MPI, mpich or a derivative)? Commented Feb 28, 2018 at 2:39
  • @GillesGouaillardet I use slurm and mpich 3.2. Thanks! Commented Feb 28, 2018 at 17:29

2 Answers


One possibility, if you are OK with passing everything through rank 0 of the C program, is to use subprocess.Popen() with stdin=subprocess.PIPE and the communicate() function on the Python side, and fread() on the C side.

This is obviously fragile, but it does keep everything in memory. Also, since your data size is large (which you said it was), you may have to write the data to the child process in chunks. Another option is to use exe.stdin.write(x) rather than exe.communicate(x).

I created a small example program

C code (program named child):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]){
    MPI_Init(&argc, &argv);

    int size, rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double ans;
    if(rank == 0){
        /* only rank 0 reads from the pipe connected to the parent */
        if(fread(&ans, sizeof(ans), 1, stdin) != 1){
            fprintf(stderr, "failed to read a double from stdin\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
    }

    /* share the value with every rank */
    MPI_Bcast(&ans, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    printf("rank %d of %d received %f\n", rank, size, ans);
    MPI_Finalize();
    return 0;
}

Python code (named driver.py):

#!/usr/bin/env python

import ctypes as ct
import subprocess as sp

# a ctypes double exposes the buffer protocol, so its raw 8 bytes can be
# written directly to the child's stdin by communicate()
x = ct.c_double(3.141592)

exe = sp.Popen(['mpirun', '-n', '4', './child'], stdin=sp.PIPE)
exe.communicate(x)

x = ct.c_double(101.1)

exe = sp.Popen(['mpirun', '-n', '4', './child'], stdin=sp.PIPE)
exe.communicate(x)

results:

> python ./driver.py
rank 0 of 4 received 3.141592
rank 1 of 4 received 3.141592
rank 2 of 4 received 3.141592
rank 3 of 4 received 3.141592
rank 0 of 4 received 101.100000
rank 2 of 4 received 101.100000
rank 3 of 4 received 101.100000
rank 1 of 4 received 101.100000
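The same approach extends from one double to a whole array: a ctypes array also exposes the buffer protocol, so its raw bytes can go straight down the pipe and be read on the C side with fread(buf, sizeof(double), n, stdin). A quick round-trip check in plain Python:

```python
import ctypes

n = 4
arr = (ctypes.c_double * n)(1.0, 2.0, 3.0, 4.0)

raw = bytes(arr)                 # n * 8 bytes, native endianness
assert len(raw) == n * ctypes.sizeof(ctypes.c_double)

# rebuild the array from the bytes, as the C side effectively does
back = (ctypes.c_double * n).from_buffer_copy(raw)
print(list(back))                # [1.0, 2.0, 3.0, 4.0]
```

Note that this relies on both ends sharing the same endianness and double representation, which holds when the parent and child run on the same machines.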

I tried using MPI_Comm_connect() and MPI_Comm_accept() through mpi4py, but I couldn't get that working on the Python side.



Since most of the time is spent in the C subroutine, which is invoked many times, and you are running under a resource manager, I would suggest the following approach:

Start all the MPI tasks at once via the following command (assuming you have allocated n+1 slots):

mpirun -np 1 python wrapper.py : -np <n> a.out

You likely want to start with MPI_Comm_split() in order to create a communicator containing only the n tasks running the C program. Then you will define a "protocol" so the Python wrapper can pass parameters to the C tasks and wait for the result, or direct the C program to call MPI_Finalize().
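One way to realize such a protocol is a fixed header (command code, payload length) followed by the payload. The framing below uses struct purely for illustration; over MPI the wrapper would send the header and payload as messages rather than raw bytes, and the command codes here are invented:

```python
import struct

HEADER = struct.Struct("<ii")            # (command, number of doubles)
CMD_FINALIZE, CMD_COMPUTE = 0, 1         # illustrative command codes

def encode(command, values=()):
    # header followed by the payload of doubles
    return HEADER.pack(command, len(values)) + struct.pack(
        "<%dd" % len(values), *values)

def decode(buf):
    command, n = HEADER.unpack_from(buf)
    values = struct.unpack_from("<%dd" % n, buf, HEADER.size)
    return command, list(values)

msg = encode(CMD_COMPUTE, [3.14, 2.71])
print(decode(msg))                       # (1, [3.14, 2.71])
```

The key design point is that the C tasks sit in a loop receiving headers: a compute command carries work, and the finalize command tells them to exit the loop and call MPI_Finalize().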

You might also consider using an intercommunicator (first group for Python, second group for C), but that is really up to you. Intercommunicator semantics can be non-intuitive, so make sure you understand how they work before going in that direction.

