
I am trying to parallelize a small part of my Python code in Fortran90. As a start, I am trying to understand how the spawning function works.

Firstly, I tried to spawn a child process in Python from a Python parent, using the example for dynamic process management from the mpi4py tutorial. Everything worked fine. In this case, from what I understand, only the inter-communicator between the parent process and the child process is used.
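For reference, that python-to-python case can be reduced to a single self-contained script. This is my own sketch following the same pattern as the tutorial, not the tutorial code itself (the file name spawn_demo.py is hypothetical):

```python
# spawn_demo.py -- minimal python-to-python spawn sketch (hypothetical file name)
import sys
import numpy
from mpi4py import MPI

parent = MPI.Comm.Get_parent()

if parent == MPI.COMM_NULL:
    # We are the original parent: spawn one child running this same script.
    # sub_comm is an inter-communicator; dest below is a rank in the remote (child) group.
    sub_comm = MPI.COMM_SELF.Spawn(sys.executable, args=[__file__], maxprocs=1)
    data = numpy.array([42], dtype='int32')
    sub_comm.Send([data, MPI.INT], dest=0, tag=0)
    sub_comm.Disconnect()
else:
    # We are the spawned child: receive over the inter-communicator and disconnect.
    data = numpy.empty(1, dtype='int32')
    parent.Recv([data, MPI.INT], source=0, tag=0)
    print('python child received:', data[0])
    parent.Disconnect()
```

Run it as python spawn_demo.py (or mpirun -np 1 python spawn_demo.py); note that only the inter-communicator is used here, with no Merge.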

Then, I moved on to an example that spawns a Fortran90 child process from a Python parent. For this, I used an example from a previous Stack Overflow post. The Python code (master.py) that spawns the Fortran child is as follows:

from mpi4py import MPI
import numpy

'''
slavef90 is an executable built starting from slave.f90
'''
# Spawning a process running an executable
# sub_comm is an MPI inter-communicator
sub_comm = MPI.COMM_SELF.Spawn('slavef90', args=[], maxprocs=1)
# common_comm is an intra-communicator across the python process and the spawned process.
# All kinds of collective communication (Bcast...) are now possible between the python process and the spawned process
common_comm = sub_comm.Merge(False)
print('parent in common_comm ', common_comm.Get_rank(), ' of  ', common_comm.Get_size())
data = numpy.arange(1, dtype='int32')
data[0] = 42
print("Python sending message to fortran: {}".format(data))
common_comm.Send([data, MPI.INT], dest=1, tag=0)

print("Python over")
# disconnecting the shared communicators is required to finalize the spawned process.
sub_comm.Disconnect()
common_comm.Disconnect()

The corresponding Fortran90 code (slave.f90), where the child process gets spawned, is as follows:

  program test
  !
  implicit none
  !
  include 'mpif.h'
  !
  integer :: ierr,s(1),stat(MPI_STATUS_SIZE)
  integer :: parentcomm,intracomm
  !
  call MPI_INIT(ierr)
  call MPI_COMM_GET_PARENT(parentcomm, ierr)
  call MPI_INTERCOMM_MERGE(parentcomm, .true., intracomm, ierr)
  call MPI_RECV(s, 1, MPI_INTEGER, 0, 0, intracomm, stat, ierr)
  print*, 'fortran program received: ', s
  call MPI_COMM_DISCONNECT(intracomm, ierr)
  call MPI_COMM_DISCONNECT(parentcomm, ierr)
  call MPI_FINALIZE(ierr)
  endprogram test

I compiled the Fortran90 code using mpif90 slave.f90 -o slavef90 -Wall and ran the Python code normally using python master.py. I am able to get the desired output, but the spawned processes won't disconnect: any statements after the Disconnect commands (call MPI_COMM_DISCONNECT(intracomm, ierr) and call MPI_COMM_DISCONNECT(parentcomm, ierr)) are not executed in the Fortran code (and hence any statements after the Disconnect commands in the Python code are not executed either), and the code won't terminate in the terminal.

In this case, to my understanding, the inter-communicator is merged into an intra-communicator so that the child process and the parent process are no longer two different groups, and there seems to be some problem when disconnecting them. But I am not able to figure out a solution. I tried reproducing the example with the child process spawned from C++ and from Python as well, and faced the same problem. Any help is appreciated. Thanks.

  • Please use the more general language tags fortran and python. You can always add a version tag if necessary. See the tag description when adding one. It is recommended to use the MPI module (use mpi) instead of including the mpif.h file; then the compiler can check many things and warn you or stop you from doing wrong stuff. Commented Apr 25, 2020 at 12:02
  • What if you MPI_Comm_free() the intra (aka merged) communicator and MPI_Comm_disconnect() only the inter-communicator? Commented Apr 25, 2020 at 12:27
  • @GillesGouaillardet Yes, I tried freeing the intra communicators instead of disconnecting them. Unfortunately, the problem still exists. Commented Apr 25, 2020 at 12:34
  • Did you try writing the spawner in C? (to make sure the error does not come from the MPI library) Commented Apr 25, 2020 at 14:48
  • Also, can you try running mpirun -np 1 python master.py Commented Apr 25, 2020 at 14:57

1 Answer

Note your Python script first disconnects the inter-communicator and then the intra-communicator, but your Fortran program first disconnects the intra-communicator and then the inter-communicator.

I am able to run this test on Mac (Open MPI and mpi4py installed via brew) after fixing the order and freeing the intra-communicator instead of disconnecting it.

Here is my master.py

#!/usr/local/Cellar/python@3.8/3.8.2/bin/python3

from mpi4py import MPI
import numpy

'''
slavef90 is an executable built starting from slave.f90
'''
# Spawning a process running an executable
# sub_comm is an MPI inter-communicator
sub_comm = MPI.COMM_SELF.Spawn('slavef90', args=[], maxprocs=1)
# common_comm is an intra-communicator across the python process and the spawned process.
# All kinds of collective communication (Bcast...) are now possible between the python process and the spawned process
common_comm = sub_comm.Merge(False)
print('parent in common_comm ', common_comm.Get_rank(), ' of  ', common_comm.Get_size())
data = numpy.arange(1, dtype='int32')
data[0] = 42
print("Python sending message to fortran: {}".format(data))
common_comm.Send([data, MPI.INT], dest=1, tag=0)

print("Python over")
# free the (merged) intra communicator
common_comm.Free()
# disconnecting the inter-communicator is required to finalize the spawned process.
sub_comm.Disconnect()

and my slave.f90

  program test
  !
  implicit none
  !
  include 'mpif.h'
  !
  integer :: ierr,s(1),stat(MPI_STATUS_SIZE)
  integer :: parentcomm,intracomm
  integer :: rank, size
  !
  call MPI_INIT(ierr)
  call MPI_COMM_GET_PARENT(parentcomm, ierr)
  call MPI_INTERCOMM_MERGE(parentcomm, .true., intracomm, ierr)
  call MPI_COMM_RANK(intracomm, rank, ierr)
  call MPI_COMM_SIZE(intracomm, size, ierr)
  call MPI_RECV(s, 1, MPI_INTEGER, 0, 0, intracomm, stat, ierr)
  print*, 'fortran program', rank, ' / ', size, ' received: ', s
  print*, 'Slave frees intracomm'
  call MPI_COMM_FREE(intracomm, ierr)
  print*, 'Slave disconnect intercomm'
  call MPI_COMM_DISCONNECT(parentcomm, ierr)
  print*, 'Slave finalize'
  call MPI_FINALIZE(ierr)
  endprogram test

5 Comments

  • Thank you very much! This makes life a lot easier for me now. I can't believe I made the stupid mistake of swapping the Disconnect() commands. Anyway, thank you once again!
  • For closure, could you tell me why the code freezes if we disconnect the intracomm instead of freeing it? I understood (which could be incorrect) from the MPI reference guide that Disconnect() is preferable over Free().
  • I do not know for sure, but I've heard MPI_Comm_disconnect() might not be correctly implemented in Open MPI, and that could explain the freeze.
  • It works for me. Did you update the master so it sends a message to all the children? If not, children other than the first one will be stuck on MPI_Recv(). Also, note Open MPI 2.1.1 is no longer supported; you should consider upgrading to a more recent version such as 4.0.3.
  • Yes, got it! You're right, I am upgrading my Open MPI as I comment. Thank you for the support!
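On the point about multiple children raised in the comments: with maxprocs greater than 1, the master has to send to every child, since after Merge(False) the parent is rank 0 and the children occupy ranks 1 through size-1 of the merged communicator. A sketch of that loop (my own, assuming the same slavef90 executable as above and maxprocs=3):

```python
from mpi4py import MPI
import numpy

# Spawn several children; sub_comm is an inter-communicator
sub_comm = MPI.COMM_SELF.Spawn('slavef90', args=[], maxprocs=3)
# Merge with high=False: the parent becomes rank 0, the children ranks 1..size-1
common_comm = sub_comm.Merge(False)

data = numpy.array([42], dtype='int32')
# One Send per child, so no child is left stuck in its MPI_RECV
for child in range(1, common_comm.Get_size()):
    common_comm.Send([data, MPI.INT], dest=child, tag=0)

# free the merged intra-communicator, then disconnect the inter-communicator
common_comm.Free()
sub_comm.Disconnect()
```

An alternative would be a collective such as common_comm.Bcast with root 0, but then the Fortran side has to call MPI_BCAST instead of MPI_RECV.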
