Parallelize function using OpenMP

Question

I'm trying to run code in parallel, but I'm confused by the private/shared clauses and related OpenMP concepts. I'm using C++ (MSVC 12 or GCC) with OpenMP.

The code runs a loop whose body consists of a block that should run in parallel, followed by a block that should run only once all the parallel work is done. The order in which the parallel work is processed doesn't matter. The code looks like this:

// some X, M, N, Y, Z are some constant values
const int processes = 4;
std::vector<double> vct(X);
std::vector<std::vector<double> > stackVct(processes, std::vector<double>(Y));
std::vector<std::vector<std::string> > files(processes, std::vector<std::string>(M));
for(int i=0; i < N; ++i)
{
  // parallel stuff
  for(int process = 0; process < processes; ++process)
  {
    std::vector<double> &otherVct = stackVct[process];
    const std::vector<std::string> &my_files = files[process];

    for(int file = 0; file < my_files.size(); ++file)
    { 
      // vct is read-only here, the value is not modified
      doSomeOtherStuff(otherVct, vct);

      // my_files[file] is read-only
      std::vector<double> thirdVct(Y);
      doSomeOtherStuff(my_files[file], thirdVct);

      // thirdVct and vct are read-only
      doSomeOtherStuff2(thirdVct, otherVct, vct);
    }
  }
  // when all the parallel stuff is done, do this job
  // single thread stuff
  // stackVct is read-only, vct is modified
  doSingleThreadStuff(vct, stackVct);
}

If it is better for performance, "doSingleThreadStuff(...)" can be moved into the parallel loop, but it needs to be processed by a single thread. The order of the functions in the innermost loop cannot be changed.

How should I declare the #pragma omp directives to make this work? Thanks!

SirGuy · Accepted Answer · 2013-09-14 19:17:53Z

1

To run a for loop in parallel, place #pragma omp parallel for directly above the for statement. By default, variables declared outside the loop are shared by all threads, and variables declared inside the loop are private to each thread.

Note that if you are doing file IO in parallel you may not see much speedup (next to none if all you are doing is file IO) unless at least some of the files reside on different physical hard drives.
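
The shared/private rule described above can be sketched as follows. This is a minimal illustration, not the asker's code: squareAll is a hypothetical helper, and the sketch assumes each iteration writes a distinct element of the shared output vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the question): squares every element in parallel.
std::vector<double> squareAll(const std::vector<double>& input)
{
    // Declared outside the loop: shared by all threads by default.
    std::vector<double> output(input.size());
    const int n = static_cast<int>(input.size());

    #pragma omp parallel for
    for (int i = 0; i < n; ++i)   // the loop variable itself is private
    {
        // Declared inside the loop: every thread works on its own copy.
        const double x = input[i];
        // Each iteration writes a different element, so no synchronization is needed.
        output[i] = x * x;
    }
    return output;
}
```

With GCC the directive only takes effect when compiling with -fopenmp (MSVC: /openmp); without the flag the pragma is ignored and the loop runs serially with the same result.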

answered Sep 14, 2013 at 19:17

SirGuy


2 Comments

morph Over a year ago

I'm doing some IO stuff in "doSomeOtherStuff(...)" but I can pre-load everything into the memory

morph Over a year ago

And thanks. If it is that simple, it implies that the crash is caused not by my code but by one of the libraries I'm using...

Y.H. · Accepted Answer · 2013-09-14 20:45:43Z

1

Maybe something like this (mind you this is just a sketch, I did not verify it but you can get the idea):

// some X, M, N, Y, Z are some constant values
const int processes = 4;
std::vector<double> vct(X);
std::vector<std::vector<double> > stackVct(processes, std::vector<double>(Y));
std::vector<std::vector<std::string> > files(processes, std::vector<std::string>(M));
for(int i=0; i < N; ++i)
{
    // parallel stuff
    #pragma omp parallel firstprivate(vct, files) shared(stackVct)
    {
        #pragma omp for
        for(int process = 0; process < processes; ++process)
        {
            std::vector<double> &otherVct = stackVct[process];
            const std::vector<std::string> &my_files = files[process];

            for(int file = 0; file < my_files.size(); ++file)
            {
                // vct is read-only here, the value is not modified
                doSomeOtherStuff(otherVct, vct);

                // my_files[file] is read-only
                std::vector<double> thirdVct(Y);
                doSomeOtherStuff(my_files[file], thirdVct);

                // thirdVct and vct are read-only
                doSomeOtherStuff2(thirdVct, otherVct, vct);
            }
        }
        // when all the parallel stuff is done, do this job
        // single thread stuff
        // stackVct is read-only, vct is modified
        #pragma omp single nowait
        doSingleThreadStuff(vct, stackVct);
    }
}

I marked vct and files as firstprivate because they are read-only in the parallel part and I assumed they should not be modified there, so each thread gets its own copy of these variables.
stackVct is marked as shared among all threads because they modify it.
Finally, only one thread executes the doSingleThreadStuff function, without forcing the other threads to wait.
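
An alternative sketch, under the assumption that doSingleThreadStuff must update the one vct that the next outer iteration reads: a firstprivate copy modified inside the region is discarded when the region ends, so this variant keeps vct shared (it is only read in the parallel part) and runs the serial step after the parallel loop. The function bodies are hypothetical stand-ins, since the real ones are not shown in the question, and runIterations is an invented wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-process work: adds vct into otherVct.
void accumulateInto(std::vector<double>& otherVct, const std::vector<double>& vct)
{
    assert(otherVct.size() == vct.size());
    for (std::size_t k = 0; k < vct.size(); ++k)
        otherVct[k] += vct[k];
}

// Hypothetical stand-in for the serial step: vct[k] becomes the sum of column k.
void doSingleThreadStuff(std::vector<double>& vct,
                         const std::vector<std::vector<double> >& stackVct)
{
    for (std::size_t k = 0; k < vct.size(); ++k)
    {
        double sum = 0.0;
        for (std::size_t p = 0; p < stackVct.size(); ++p)
            sum += stackVct[p][k];
        vct[k] = sum;
    }
}

std::vector<double> runIterations(int N, int processes, int Y)
{
    std::vector<double> vct(Y, 1.0);
    std::vector<std::vector<double> > stackVct(processes, std::vector<double>(Y, 0.0));

    for (int i = 0; i < N; ++i)
    {
        // vct and stackVct are declared outside the loop, so they are shared.
        // Each thread touches only its own row stackVct[process]; vct is only read.
        #pragma omp parallel for
        for (int process = 0; process < processes; ++process)
        {
            accumulateInto(stackVct[process], vct);
        }
        // The parallel region ends with an implicit barrier, so all rows are
        // complete before the serial step modifies vct.
        doSingleThreadStuff(vct, stackVct);
    }
    return vct;
}
```

The trade-off is that a thread team is started for every outer iteration; most runtimes reuse their threads, so this is often cheap, but that is worth measuring rather than assuming.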

answered Sep 14, 2013 at 20:45

Y.H.


5 Comments

morph Over a year ago

"without forcing other threads to wait." Hold on, I'm not sure I understand what you mean. "doSingleThread" should be processed when all the parallel stuff is done... But the next parallel stuff (i.e. in the next iteration) should continue only after the singleThread stuff is done. Will it work like that?

morph Over a year ago

btw using this: "#pragma omp parallel firstprivate(vct, files) shared(stackVct)" seems to be weird... if I use it, it runs many more processes...

Y.H. Over a year ago

There is an implicit barrier at the end of the for directive, meaning that all working threads are guaranteed to finish their work before leaving the inner for-loop. Afterwards only one thread will execute doSingleThreadStuff, the nowait causes other threads to not wait for the executing thread and proceed to consume some of the work of a new iteration of the outer for-loop. If you really want them to wait until the single thread finishes you can remove the nowait option.
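
The barrier behavior described here can be sketched as follows, assuming (unlike the code in the answer) that the parallel region is hoisted around the outer loop, which is the case where nowait on single actually changes what the other threads do. The arithmetic is a hypothetical stand-in for the real functions, and runInOneRegion is an invented name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<double> runInOneRegion(int N, int processes, int Y)
{
    std::vector<double> vct(Y, 1.0);
    std::vector<std::vector<double> > stackVct(processes, std::vector<double>(Y, 0.0));

    #pragma omp parallel
    {
        // Every thread runs the outer loop; i is declared inside the region,
        // so each thread has its own copy.
        for (int i = 0; i < N; ++i)
        {
            #pragma omp for
            for (int process = 0; process < processes; ++process)
            {
                // Stand-in for the parallel work: each thread updates its own row.
                for (int k = 0; k < Y; ++k)
                    stackVct[process][k] += vct[k];
            }
            // Implicit barrier at the end of "omp for": all rows are complete.

            #pragma omp single
            {
                // Stand-in for the serial step: exactly one thread updates vct.
                for (int k = 0; k < Y; ++k)
                {
                    double sum = 0.0;
                    for (int p = 0; p < processes; ++p)
                        sum += stackVct[p][k];
                    vct[k] = sum;
                }
            }
            // Implicit barrier at the end of "single" (no nowait): the other
            // threads wait here, so the next iteration reads the updated vct.
        }
    }
    return vct;
}
```

Adding nowait to the single construct in this layout would let the other threads start the next iteration while vct is still being written, which is a data race if the parallel part reads vct.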

morph Over a year ago

Ok, thank you so much. Could you please explain also why openmp runs the parallel loop many more times if I use firstprivate?

Y.H. Over a year ago

I have no idea. The purpose of firstprivate is to make a private copy of the variable for each thread, where each copy is initialized with the value of the original variable. It should not affect the size of the thread team. You can check the number of running threads at any time with the omp_get_num_threads function, and you can use omp_get_thread_num to get the ID of the calling thread. Use those for debugging and verifying this behavior.
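
A small debugging sketch using the two functions mentioned above. reportTeam is a hypothetical helper; the #ifdef _OPENMP guard is an addition so that the code also builds when OpenMP is disabled, in which case it behaves like a team of one thread.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: returns one counter per thread in the team,
// incremented once by the thread with that ID.
std::vector<int> reportTeam()
{
    std::vector<int> seen;

    #pragma omp parallel
    {
#ifdef _OPENMP
        const int id = omp_get_thread_num();     // ID of the calling thread
        const int team = omp_get_num_threads();  // size of the current team
#else
        const int id = 0;
        const int team = 1;
#endif
        // One thread sizes the shared vector; the implicit barrier at the end
        // of "single" ensures it is ready before any thread writes to it.
        #pragma omp single
        seen.assign(team, 0);

        assert(id >= 0 && id < team);
        seen[id] += 1;  // each thread writes a distinct element
    }
    return seen;
}
```

The size of the returned vector is the team size, so comparing it with the expected thread count with and without the firstprivate clause is one way to check whether the clause really changes the number of threads.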
