I'm trying to figure out whether I am using the OpenMP 4 constructs correctly.

It would be nice if someone could give me some tips.

class XY {
 #pragma omp declare target
  static void function_XY(){
    #pragma omp for
      loop{}
  }
 #pragma omp end declare target
};


main() {
  var declaration
  some sequential stuff

  #pragma omp target map(some variables)
  {
  #pragma omp parallel
  {

  #pragma omp for
     loop1{}

  function_XY();

  #pragma omp for
     loop2{}

  }
  }

  some more sequential stuff
}

My overall code works and gets faster with more threads, but I'm wondering whether it is actually executed on the target device (Xeon Phi). Also, if I remove all the OpenMP directives and run the program sequentially, it runs faster than with multiple threads (any number). Maybe this is due to OpenMP initialisation?

What I want is parallel execution of loop1, function_XY, and loop2 on the target device.

1 Answer

"I'm wondering if the code is correctly executed on the target device (Xeon Phi)"

Well, if you are compiling the code with the -mmic flag, it will generate a binary that runs only on the MIC.

To run the code (in native mode) on the MIC, copy the executable to it (via scp), copy the needed libraries, SSH into the MIC, and execute it.

Don't forget to export LD_LIBRARY_PATH so that it points to the libraries on the MIC.
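If the code is built for offload (the `#pragma omp target` style in the question) rather than natively, one way to check where a target region ran is to ask the OpenMP runtime from inside the region. The following is only a minimal sketch: the helper names `visible_device_count` and `target_region_ran_on_host` are made up for illustration, and it assumes an OpenMP 4.0 runtime that provides `omp_get_num_devices()` and `omp_is_initial_device()`. When built without OpenMP, the pragma is ignored and the helpers simply report host execution.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: number of offload devices the OpenMP runtime
// can see. Reports 0 when the program is built without OpenMP.
int visible_device_count() {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

// Hypothetical helper: true if the body of the target region executed
// on the host, false if it was offloaded to a device.
bool target_region_ran_on_host() {
    int on_host = 1;  // stays 1 if the pragma is ignored (no OpenMP)
#pragma omp target map(tofrom: on_host)
    {
#ifdef _OPENMP
        // Nonzero on the host, zero on a target device.
        on_host = omp_is_initial_device();
#endif
    }
    return on_host != 0;
}
```

Calling `target_region_ran_on_host()` once at startup and printing the result should show whether the runtime silently fell back to the host, which happens when no device is visible.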

Now, assuming that you do run the code on the coprocessor, better performance with threading disabled indicates that there is a bottleneck somewhere in the code. Analyzing it requires more information.
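To start narrowing such a bottleneck down, it can help to time the same loop with and without the OpenMP directive on a small, isolated workload. The sketch below is an assumption-laden illustration, not the asker's code: `sum_squares_serial`, `sum_squares_parallel`, and `seconds_taken` are hypothetical names, and the sum-of-squares loop merely stands in for loop1 or loop2.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Stand-in workload (hypothetical): plain sequential loop.
double sum_squares_serial(const std::vector<double>& v) {
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i] * v[i];
    return s;
}

// Same loop with an OpenMP reduction. Built without OpenMP, the
// pragma is ignored and this runs sequentially.
double sum_squares_parallel(const std::vector<double>& v) {
    const double* p = v.data();
    const long n = static_cast<long>(v.size());
    double s = 0.0;
#pragma omp parallel for reduction(+ : s)
    for (long i = 0; i < n; ++i) s += p[i] * p[i];
    return s;
}

// Wall-clock time in seconds spent in an arbitrary callable.
template <class F>
double seconds_taken(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Timing both variants for several input sizes, for example with `seconds_taken([&] { sum_squares_parallel(data); })`, gives a rough idea of whether thread startup and synchronization cost more than the loop itself. The first parallel call usually also pays for creating the thread team, so it is worth timing a second call separately.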
