I'm trying to figure out whether I am using the OpenMP 4 constructs correctly,
so it would be nice if someone could give me some tips.
class XY {
    #pragma omp declare target
    static void function_XY() {
        // orphaned worksharing loop, called from inside the parallel region in main
        #pragma omp for
        loop{}
    }
    #pragma omp end declare target
};

main() {
    var declaration
    some sequential stuff
    #pragma omp target map(some variables)
    {
        #pragma omp parallel
        {
            #pragma omp for
            loop1{}
            function_XY();
            #pragma omp for
            loop2{}
        }
    }
    some more sequential stuff
}
My overall code works and gets faster with more threads, but I'm wondering whether it is actually executed on the target device (Xeon Phi). Also, if I remove all the OpenMP directives and run the program sequentially, it is faster than the multithreaded version with any number of threads. Could that be due to OpenMP initialisation overhead?
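One way to check where a target region actually runs is to ask the runtime from inside the region. The following is only a minimal sketch, assuming an OpenMP 4.0 compiler; the function names `target_region_ran_on_host` and `visible_devices` are made up for illustration and are not part of my real code. It relies on `omp_is_initial_device()` and `omp_get_num_devices()` from `omp.h`, and falls back to host-only answers when OpenMP is not enabled.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: true if the target region fell back to the host,
// false if it ran on an offload device. Without OpenMP it is always true.
bool target_region_ran_on_host() {
    int on_host = 1;
#ifdef _OPENMP
    #pragma omp target map(from: on_host)
    {
        // Evaluated inside the target region, so it reports where the
        // region itself is executing.
        on_host = omp_is_initial_device();
    }
#endif
    return on_host != 0;
}

// Hypothetical helper: number of offload devices the runtime can see.
int visible_devices() {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}
```

If `visible_devices()` returns 0, or `target_region_ran_on_host()` returns true, the target region is presumably running on the host rather than on the coprocessor.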
What I want is the parallel execution of loop1, function_XY and loop2 on the target device.
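To make the structure above concrete, here is a small compilable sketch of the same pattern. It is an assumption-laden stand-in, not my real code: `scale`, `run`, the array `a` and the loop bodies are hypothetical, and the function is a free function rather than a static class member. The `omp for` inside `scale` is an orphaned worksharing loop, so it is shared among the threads of the parallel region that calls it.

```cpp
#include <vector>

#pragma omp declare target
// Hypothetical stand-in for function_XY: the orphaned "omp for" is split
// across the threads of the caller's parallel region.
void scale(double* a, int n, double f) {
    #pragma omp for
    for (int i = 0; i < n; ++i) a[i] *= f;
}
#pragma omp end declare target

// Hypothetical driver: runs loop1, the function and loop2 in one
// parallel region inside one target region.
std::vector<double> run(int n) {
    std::vector<double> v(n);
    double* a = v.data();
    #pragma omp target map(tofrom: a[0:n])
    {
        #pragma omp parallel
        {
            #pragma omp for
            for (int i = 0; i < n; ++i) a[i] = i;      // loop1
            scale(a, n, 2.0);                          // function_XY
            #pragma omp for
            for (int i = 0; i < n; ++i) a[i] += 1.0;   // loop2
        }
    }
    return v;
}
```

Each `omp for` ends with an implicit barrier, so the three steps run one after another while the iterations within each step are distributed over the threads. With this sketch, element `i` of the result should be `2*i + 1`.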