
I am learning to use Boost.MPI to parallelize a large amount of computation; below is a simple test to check that I have the MPI logic right. However, I cannot get it to work. I used world.size() = 10, there are 50 elements in total in the data array, and each process does 5 iterations. I would like to update the data array by having each process send its updated data array to the root process, and then have the root process receive the updated arrays and print them out. But only a few elements end up updated.

Thanks for helping me.

#include <boost/mpi.hpp>
#include <iostream>
#include <cstdlib>

namespace mpi = boost::mpi;
using namespace std;

#define max_rows 100
int data[max_rows];

int modifyArr(const int index, const int arr[]) {
  return arr[index]*2+1;
}

int main(int argc, char* argv[])
{
  mpi::environment env(argc, argv);
  mpi::communicator world;

  int num_rows = 50;
  int my_number;

  if (world.rank() == 0) {
    for ( int i = 0; i < num_rows; i++)
        data[i] = i + 1;
  }

  broadcast(world, data, 0);

  for (int i = world.rank(); i < num_rows; i += world.size()) {
    my_number = modifyArr(i, data);
    data[i]   = my_number;

    world.send(0, 1, data);

    //cout << "i=" << i << " my_number=" << my_number << endl;

    if (world.rank() == 0)
      for (int j = 1; j < world.size(); j++) 
        mpi::status s = world.recv(boost::mpi::any_source, 1, data);
  }

  if (world.rank() == 0) {
    for ( int i = 0; i < num_rows; i++)
      cout << "i=" << i << " results = " << data[i] << endl;
  }

  return 0;
}

1 Answer


Your problem is probably here:

mpi::status s = world.recv(boost::mpi::any_source, 1, data);

This is the only way data can get back to the master node.

However, you do not tell the master node where in data to store the answers it receives. Since data is just the address of the array, each incoming message is written starting at element zero rather than at the slot it belongs to.
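For illustration, here is a minimal sketch of one way to fix that while keeping the interleaved layout (not the poster's exact code): send each update as an {index, value} pair so the root knows where to store it. It assumes the same world, data, num_rows, and modifyArr as in the question and would replace the middle of main():

// Sketch: every rank (including 0) updates its interleaved elements;
// non-root ranks ship each update as {index, value} so the root can
// store it at the correct offset instead of overwriting data[0].
for (int i = world.rank(); i < num_rows; i += world.size()) {
  data[i] = modifyArr(i, data);
  if (world.rank() != 0) {
    int msg[2] = { i, data[i] };
    world.send(0, 1, msg, 2);             // pointer + count overload
  }
}

if (world.rank() == 0) {
  // One message arrives for every element owned by a non-root rank.
  for (int i = 0; i < num_rows; i++) {
    if (i % world.size() == 0) continue;  // rank 0's own elements
    int msg[2];
    world.recv(mpi::any_source, 1, msg, 2);
    data[msg[0]] = msg[1];                // write to the right index
  }
}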

Interleaving which elements of the array you are processing on each node is a pretty bad idea. You should assign blocks of the array to each node so that you can send entire chunks of the array at once. That will reduce communication overhead significantly.
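To make that concrete, here is a minimal sketch of the block approach (an illustration, not the poster's program), assuming num_rows divides evenly by world.size(), as it does in the question's 50-element / 10-process case:

#include <boost/mpi.hpp>
#include <iostream>
#include <vector>

namespace mpi = boost::mpi;

int main(int argc, char* argv[]) {
  mpi::environment env(argc, argv);
  mpi::communicator world;

  const int num_rows = 50;
  const int chunk = num_rows / world.size();  // assumes an even split
  std::vector<int> data(num_rows);

  if (world.rank() == 0)
    for (int i = 0; i < num_rows; i++)
      data[i] = i + 1;

  // Everyone needs the initial values (same idea as the question's broadcast).
  mpi::broadcast(world, data.data(), num_rows, 0);

  // Each rank transforms only its own contiguous block.
  const int begin = world.rank() * chunk;
  std::vector<int> my_block(chunk);
  for (int k = 0; k < chunk; k++)
    my_block[k] = data[begin + k] * 2 + 1;

  // One collective call gathers the blocks, in rank order, onto rank 0.
  mpi::gather(world, my_block.data(), chunk, data.data(), 0);

  if (world.rank() == 0)
    for (int i = 0; i < num_rows; i++)
      std::cout << "i=" << i << " results = " << data[i] << std::endl;

  return 0;
}

A single gather replaces all of the per-element sends and receives, which is where the communication savings come from.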

Also, if your issue is simply speeding up for loops, you should consider OpenMP, which can do things like this:

#pragma omp parallel for
for (int i = 0; i < 100; i++)
  data[i] *= 4;

Bam! I just split that for loop up between all of my threads with no further work needed.
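For reference, a self-contained version of that snippet might look like the following (-fopenmp is the usual GCC/Clang flag; other compilers spell it differently):

#include <cstdio>
#include <omp.h>

int main() {
  int data[100];
  for (int i = 0; i < 100; i++)
    data[i] = i;

  // The iterations below are divided among the available threads.
  #pragma omp parallel for
  for (int i = 0; i < 100; i++)
    data[i] *= 4;

  std::printf("data[10] = %d, max threads = %d\n",
              data[10], omp_get_max_threads());
  return 0;
}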


1 Comment

@Richard Thank you. In my case OpenMP only sped up my for loop (which is more complicated than this snippet) a little, so I am pursuing Open MPI or the Parallel Boost Graph Library. I will take your advice about assigning blocks of the array to each node instead of interleaving, so entire chunks can be sent at once, and rewrite the code.
