I am trying to do a reduction over multiple variables (an array) using OpenMP, but I am not sure how to implement it. See the code below.

#pragma omp parallel for reduction( ??? )
for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
                [ compute value ... ]

                y[j] += value;
        }
}

I thought I could do something like this with the atomic directive, but realised this would prevent two threads from updating y at the same time even if they are updating different elements.

#pragma omp parallel for
for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
                [ compute value ... ]

                #pragma omp atomic
                y[j] += value;
        }
}

Does OpenMP have any functionality for something like this? Otherwise, how would I achieve this efficiently without OpenMP's reduction clause?

  • You could declare only the j loop to be omp parallel. If that's inefficient, for instance because the loop is too short, then try to exchange the two loops. Commented Mar 6, 2022 at 22:21
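
The loop exchange suggested in this comment could look roughly like the sketch below. It assumes the computed value depends only on i and j, so the loops can be swapped safely; compute_value is a hypothetical stand-in for the elided "[ compute value ... ]" step, and accumulate_by_column is an illustrative name, not something from the question.

```c
/* Hypothetical stand-in for the elided "[ compute value ... ]" step. */
static double compute_value(int i, int j) {
    return (double)(i + 1) * (double)(j + 1);
}

/* Parallelize over j so that every y[j] is written by exactly one iteration.
 * No reduction or atomic is needed because no two threads share an element. */
void accumulate_by_column(double *y, int n, int m) {
    #pragma omp parallel for
    for (int j = 0; j < m; j++) {
        double sum = 0.0;                 /* private to this iteration */
        for (int i = 0; i < n; i++) {
            sum += compute_value(i, j);
        }
        y[j] += sum;                      /* single write per element */
    }
}
```

Whether this pays off depends on m being large enough to keep all threads busy, and neighbouring y[j] writes can still share a cache line.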

1 Answer


There is an array reduction available in OpenMP since version 4.5:

#pragma omp parallel for reduction(+:y[:m])

where m is the number of elements in the array. The main limitation is that each thread's private copy of the array is typically allocated on the stack, so this approach can overflow the stack for large arrays.
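
Applied to the loop from the question, a minimal sketch could look like this. It assumes a compiler with OpenMP 4.5 support and that y points to at least m elements; compute_value is a hypothetical stand-in for the elided "[ compute value ... ]" step, and accumulate_with_reduction is an illustrative name.

```c
/* Hypothetical stand-in for the elided "[ compute value ... ]" step. */
static double compute_value(int i, int j) {
    return (double)(i + 1) * (double)(j + 1);
}

/* Array-section reduction: each thread gets a private, zero-initialized
 * copy of y[0..m-1]; the copies are summed into y when the loop ends. */
void accumulate_with_reduction(double *y, int n, int m) {
    #pragma omp parallel for reduction(+:y[:m])
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            y[j] += compute_value(i, j);
        }
    }
}
```

The array section y[:m] also works when y is a pointer to dynamically allocated memory, as long as m is known when the loop starts.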

The atomic operation you mentioned should work fine, but it may be less efficient than a reduction. Of course, this depends on the actual circumstances (e.g. the values of n and m, the time needed to compute value, false sharing, etc.).

#pragma omp atomic
y[j] += value;
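
For completeness, here is a sketch of the atomic variant as a full function. compute_value is again a hypothetical stand-in for the elided computation and accumulate_with_atomic is an illustrative name. The value is computed before the atomic update, so only the addition itself is done atomically.

```c
/* Hypothetical stand-in for the elided "[ compute value ... ]" step. */
static double compute_value(int i, int j) {
    return (double)(i + 1) * (double)(j + 1);
}

/* Every thread updates the shared array directly; the atomic directive
 * makes each individual y[j] update indivisible. */
void accumulate_with_atomic(double *y, int n, int m) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            double value = compute_value(i, j);   /* computed outside the atomic */
            #pragma omp atomic
            y[j] += value;
        }
    }
}
```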

5 Comments

Ah... in my particular case y is dynamically allocated with its size determined at runtime. As you have suggested, the atomic operation does work, but from my understanding it would hurt performance unnecessarily: two threads could in theory update y[i] and y[j] for different i and j, but the atomic operation would not allow them to. Is this correct?
An atomic operation always gives correct results, and it does allow different elements y[i] and y[j] to be updated at the same time. The only performance-related problem is that if they are in the same cache line, each memory write invalidates that cache line for the other threads. This is called 'false sharing'. If the array y is expected to be big, the best option is to do the reduction manually.
Please read this if you wish to implement manual array reduction in OpenMP.
Is the computation of value slow or fast? If it is fast, is it possible to swap the for loops (do for(int j=...) first)?
I think in your comment you confused #pragma omp critical and #pragma omp atomic. #pragma omp critical does not allow more than one thread at a time to execute the protected block.
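
The manual array reduction mentioned in these comments could be sketched as follows. This is one possible approach under stated assumptions, not the method from the linked post: each thread accumulates into its own heap-allocated buffer, then merges it into y. compute_value is a hypothetical stand-in for the elided computation and accumulate_manual is an illustrative name.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the elided "[ compute value ... ]" step. */
static double compute_value(int i, int j) {
    return (double)(i + 1) * (double)(j + 1);
}

/* Manual reduction: per-thread partial sums live on the heap, so the
 * array size is not limited by the thread stack size. */
void accumulate_manual(double *y, int n, int m) {
    #pragma omp parallel
    {
        /* Zero-initialized private buffer; NULL if allocation fails. */
        double *local = calloc((size_t)m, sizeof *local);

        #pragma omp for nowait
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                double value = compute_value(i, j);
                if (local != NULL) {
                    local[j] += value;        /* no synchronization needed */
                } else {
                    #pragma omp atomic        /* fallback: update y directly */
                    y[j] += value;
                }
            }
        }

        /* Merge this thread's partial sums; atomic keeps the merge safe
         * against other threads that are merging or using the fallback. */
        if (local != NULL) {
            for (int j = 0; j < m; j++) {
                #pragma omp atomic
                y[j] += local[j];
            }
        }
        free(local);
    }
}
```

Here each thread performs only m atomic updates in total, rather than one per (i, j) pair, which is usually the point of doing the reduction by hand.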
