I'm using the following code, which contains an OpenMP parallel for loop nested in another for-loop. Somehow the performance of this code is 4 Times slower than the sequential version (omitting #pragma omp parallel for).
Is it possible that OpenMp has to create Threads every time the method is called? In my test it is called 10000 times directly after each other.
I heard that sometimes OpenMP will keep the threads spinning. I also tried setting OMP_WAIT_POLICY=active and GOMP_SPINCOUNT=INFINITE. When I remove the openMP pragmas, the code is about 10 times faster. Note that the method containing this code will be called 10000 times.
for (round k = 1; k < processor.max; ++k) {
initialise_round(k);
for (std::vector<int> bucket : color_buckets) {
#pragma omp parallel for schedule (dynamic)
for (int i = 0; i < bucket.size(); ++i) {
if (processor.mark.is_marked_item(bucket[i])) {
processor.process(k, bucket[i]);
}
}
processor.finish_round(k);
}
}
color_bucketsandbucket?color_bucketselements. If not doconst std::vector<int> &bucketor useconst auto &bucket.