0

I wanted to use OpenMP around a for-loop that basically sums up the array elements. For comparison, I also sum up the array elements in a serial fashion. For large array-sizes, array_size>=5795, the sums are no longer equal.

This is the OpenMP-parallelized loop:

#pragma omp parallel for reduction(+:parallel_sum)
    for (i = 0; i < array_size; i++) {
        parallel_sum += array[i];
    }

And this is the full code:

#include<stdio.h>
#include<stdlib.h>
#include<omp.h>

int main() {
    float          *array, serial_sum, parallel_sum;
    size_t          array_size, i;

    serial_sum   = 0.;
    parallel_sum = 0.;

#pragma omp parallel
    printf( "number of threads used in parallized part: %d\n", omp_get_num_threads());

    printf("enter size of array\n");
    scanf("%zu", &array_size);

    // memory allocation
    array = (float *) malloc(sizeof(float) * array_size);

    // initialize array
    for (i = 0; i < array_size; i++)
        array[i] = i;

    // calculate the array sums, parallel first, then serial
#pragma omp parallel for reduction(+:parallel_sum)
    for (i = 0; i < array_size; i++) {
        parallel_sum += array[i];
    }

    for (i = 0; i < array_size; i++)
        serial_sum = serial_sum + array[i];

    if (serial_sum == parallel_sum)
        printf("\nserial and parallel sums are equal\n");
    else
        printf("\nserial and Parallel sums are NOT equal\n");

    // free memory
    free(array);

    return 0;
}

Since the sum can be easily calculated by hand using the sum-formula by Gauss, sum = (n^2+n)/2, where n corresponds to array_size, both sums are wrong... I guess I am having some memory issues then...?

4
  • 1
    Typical float types only have ~6 decimal digits of precision. Floating-point addition is not associative if some of the operations are inexact. Commented Sep 21, 2016 at 8:58
  • @EOF ohhhh, of course! That is slightly embarrassing... thanks :) Type that as an answer and I mark it as solution. Commented Sep 21, 2016 at 9:01
  • Depending on your compile options, your serial_sum may involve parallel batched sums, with different partitioning from your parallel reduction. Commented Sep 21, 2016 at 10:28
  • @tim18 I was trying to keep the compiler options short, basically just gcc -fopenmp -Wall Commented Sep 21, 2016 at 10:43

1 Answer 1

1

Since floating-point types have limited precision, addition is not associative (see Are floating point operations in C associative? and Is Floating point addition and multiplication associative?).

So a + (b + c) can evaluate unequal to (a + b) + c.

When parallelizing/vectorizing, the order operations can indeed be different than adding the numbers from lowest to highest sequentially.

If you printed out the computed values, you would likely find them quite close in this case, just not exactly equal.

Sign up to request clarification or add additional context in comments.

2 Comments

and an easy fix for the above mentioned code would be double serial_sum, parallel_sum;
@Alf: Well, you'd end up running into the same problem for higher numbers. Around ~pow(10,8) or so.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.