1

I have problems parallelizing this code, I think I have to use the critical clause but I don't know how...

#include <stdio.h>
#include <sys/time.h>

#define N4 5000
#define N5 5000
#define PIXMAX 10
#define NUM_THREADS 4

int i, j, k;
int histo[PIXMAX], image[N4][N5];

void calculate_histo(int *array, int matrix[N4][N5]) {

for(i=0; i<PIXMAX; i++) array[i] = 0;

    #pragma omp parallel
    {
        int array_private[PIXMAX];
        for(i=0; i<PIXMAX; i++) array_private[i] = 0;

        #pragma omp for
        for(i=0; i<N4; i++)
            for(j=0; j<N5; j++) {
                array_private[matrix[i][j]]++;
                                }
        #pragma omp critical
        {
            for(i=0; i<PIXMAX; i++) {
                array[i] += array_private[i];
            }
        }
    }
}
main ()
{
omp_set_num_threads(NUM_THREADS);

for(i=0; i<N4; i++)
   for(j=0; j<N5; j++)
   {
     if(i%3) image[i][j] = (i+j) % PIXMAX;
     else    image[i][j] = (i+i*j) % PIXMAX;
   }
calculate_histo(histo,image);

for (k=0; k<PIXMAX; k++) printf("%9d", histo[k]);
}

I get different results each time I run it, the outputs in 5 executions:

1.- 3424378  1765911  2356499  1767451  2354765  2123619  2355686  1767270  2355937  1762464
2.- 3359050  1728213  2310171  1727858  2309947  2094584  2309402  1727705  2310021  1726228
3.- 3479377  1782549  2373773  1783920  2372319  2153420  2374614  1785481  2375290  1781468
4.- 3459613  1781119  2362956  1783067  2362662  2154083  2360726  1781994  2362982  1779394
5.- 3434711  1751408  2349619  1750327  2348681  2104916  2348510  1750427  2350599  1747760

Problems solved, all working fine, thanks for the help! the final code I use is this:

See the comments for more information, like not using global variables or using matrix[i* 5000 + j] instead of matrix[i][j]

#include<stdio.h>
#include<sys/time.h>
#include<omp.h>

#define N4 5000
#define N5 5000
#define PIXMAX 10
#define NUM_THREADS 4

int histo[PIXMAX], image[N4][N5];
int i,j,k;
void calculate_histo(int *array, int matrix[N4][N5]) {

for(i=0; i<PIXMAX; i++) array[i] = 0;

#pragma omp parallel private(i,j)
  {
    int array_private[PIXMAX];

    for(i=0; i<PIXMAX; i++)
      array_private[i] = 0;

#pragma omp for
    for(i=0; i<N4; i++)
      for( j=0; j<N5; j++) {
    array_private[matrix[i][j]]++;
      }

#pragma omp critical
    {
      for( i=0; i<PIXMAX; i++) {
    array[i] += array_private[i];
      }
    }
  }
}

int main () {

  omp_set_num_threads(NUM_THREADS);
  for( i=0; i<N4; i++)
    for( j=0; j<N5; j++) {
      if(i%3) 
        image[i][j] = (i+j) % PIXMAX;
      else
        image[i][j] = (i+i*j) % PIXMAX;
    }

  for ( k=0; k<PIXMAX; k++) 
    printf("%9d", histo[k]);
  printf("\n");

  calculate_histo(histo,image);

  for ( k=0; k<PIXMAX; k++) 
    printf("%9d", histo[k]);
  printf("\n");
  return 0;
}
7
  • I tried with #pragma omp parallel for private(i,j) in the outer loop, and #pragma omp parallel for private(j) in the inner loop. If I write #pragma omp critical inside the inner loop the programm doesn't end Commented Jun 7, 2013 at 18:20
  • 2
    That code you posted is much clearer. You should add which compiler you're using and what OS. Also, which compile options. I compiled with "g++ foo.cpp -fopenmp -O3 -Wall". I had to add "#include <omp.h>". I get the same result every time "3919000 1999500 2666500 1999500 2666500 2417000 2666500 1999500 2666500 1999500". Commented Jun 9, 2013 at 21:36
  • that output is the correct! I use mac OSX, gcc -fopenmp -o foo foo.c Commented Jun 9, 2013 at 21:39
  • Hmmm...I don't know why it's not working for you. Personally, I never (almost never) use global variables. If you look at my code notice that I defined them in main as "int array[10]; int *matrix = new int[5000*5000];" The 5000x5000 matrix is large (95Mb) so I think you should allocated it dynamically (on the heap). In C use malloc. It's a bit of pain to do this with 2D arrays which is one of the reasons I do matrix[i*5000+j] instead of matrix[i][j]. Commented Jun 10, 2013 at 7:06
  • Also, @Massimiliano is right about the race condition on i. I always declare them in the loop decleration so it's not a problem. This is simple to fix. Just define int i in the parallel block for example just before you define array_private. That will make it private. Commented Jun 10, 2013 at 7:15

2 Answers 2

1

You could use atomic to do this but it won't be efficient. A better way is to use a private array for each thread, fill them in parallel, and then fill the shared array in a critical section. See the code below. It's also possible to do this without a critical section but it's a bit more complicated Fill histograms (array reduction) in parallel with OpenMP without using a critical section

Here is the function I recommend (I use matrix[i*5000 + j] instead of matrix[i][j] because Fortran and C do the indexing opposite of each other and I can never remember which is which).

void foo_omp_v2(int *array, int *matrix) {
    for(int i=0; i<10; i++) array[i] = 0;

    #pragma omp parallel
    {
        int array_private[10];
        for(int i=0; i<10; i++) array_private[i] = 0;

        #pragma omp for
        for(int i=0; i<5000; i++) {
            for(int j=0; j<5000; j++) {
                array_private[matrix[i*5000 + j]]++;
            }
        }
        #pragma omp critical 
        {
            for(int i=0; i<10; i++) {
                array[i] += array_private[i];
            }
        }
    }
}

Here is the full code I used showing atomic being worse

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

void foo(int *array, int *matrix) {
    for(int i=0; i<10; i++) array[i] = 0;

    for(int i=0; i<5000; i++) {
        for(int j=0; j<5000; j++) {
          array[matrix[i*5000 + j]]++;
        }
    }

    for(int i=0; i<10; i++) {
        printf("%d ", array[i]);
    } printf("\n");
}

void foo_omp_v1(int *array, int *matrix) {
    for(int i=0; i<10; i++) array[i] = 0;

    #pragma omp parallel for
    for(int i=0; i<5000; i++) {
        for(int j=0; j<5000; j++) {
            #pragma omp atomic
            array[matrix[i*5000 + j]]++;
        }
    }

    for(int i=0; i<10; i++) {
        printf("%d ", array[i]);
    } printf("\n");
}

void foo_omp_v2(int *array, int *matrix) {
    for(int i=0; i<10; i++) array[i] = 0;

    #pragma omp parallel
    {
        int array_private[10];
        for(int i=0; i<10; i++) array_private[i] = 0;

        #pragma omp for
        for(int i=0; i<5000; i++) {
            for(int j=0; j<5000; j++) {
                array_private[matrix[i*5000 + j]]++;
            }
        }
        #pragma omp critical 
        {
            for(int i=0; i<10; i++) {
                array[i] += array_private[i];
            }
        }
    }

    for(int i=0; i<10; i++) {
        printf("%d ", array[i]);
    } printf("\n");
}

int main() {
    int array[10];
    int *matrix = new int[5000*5000];
    for(int i=0; i<(5000*5000); i++) {
        matrix[i]=rand()%10;
    }

    double dtime;

    dtime = omp_get_wtime();
    foo(array, matrix);
    dtime = omp_get_wtime() - dtime;
    printf("time %f\n", dtime);

    dtime = omp_get_wtime();
    foo_omp_v1(array, matrix);
    dtime = omp_get_wtime() - dtime;
    printf("time %f\n", dtime);

    dtime = omp_get_wtime();
    foo_omp_v2(array, matrix);
    dtime = omp_get_wtime() - dtime;
    printf("time %f\n", dtime);

}

Here is the version of your code that works for me in GCC and Visual Studio

#include <stdio.h>
#include <omp.h>
//#include <sys/time.h>

#define N4 5000
#define N5 5000
#define PIXMAX 10
#define NUM_THREADS 4

int histo[PIXMAX], image[N4][N5];
void calculate_histo(int *array, int matrix[N4][N5]) {

    int i;
    for(i=0; i<PIXMAX; i++) array[i] = 0;

    #pragma omp parallel
    {
        int i,j;
        int array_private[PIXMAX];
        for(i=0; i<PIXMAX; i++) array_private[i] = 0;

        #pragma omp for
        for(i=0; i<N4; i++)
            for(j=0; j<N5; j++) {
                array_private[matrix[i][j]]++;
                                }
        #pragma omp critical
        {
            for(i=0; i<PIXMAX; i++) {
                array[i] += array_private[i];
            }
        }
    }
}
int main () {
    omp_set_num_threads(NUM_THREADS);

    int i,j;
    for(i=0; i<N4; i++)
       for(j=0; j<N5; j++)
       {
         if(i%3) image[i][j] = (i+j) % PIXMAX;
         else    image[i][j] = (i+i*j) % PIXMAX;
       }
    calculate_histo(histo,image);

    for (i=0; i<PIXMAX; i++) 
        printf("%9d", histo[i]);
        printf("\n");
}
Sign up to request clarification or add additional context in comments.

2 Comments

thanks for helping raxman! I have different results each time I run it :S this is my code pastebin.com/S5Y7tc4X
That's quite short code and actually it's quite clear. Why don't you update your question with that code (in case others want to look at it)? I compiled it and ran it a few times with a different number of threads and I get the same result each time? What do you mean you're getting different results every time?
1

Your program has two major issues:

  1. there is a race condition on the loop variable i and j
  2. omp_set_num_threads is implicitly declared

Here is a fixed copy of your source with the corrections highlighted:

#include<stdio.h>
#include<sys/time.h>
#include<omp.h> // Problem # 2

#define N4 5000
#define N5 5000
#define PIXMAX 10
#define NUM_THREADS 4

int histo[PIXMAX], image[N4][N5];

void calculate_histo(int *array, int matrix[N4][N5]) {

for(int i=0; i<PIXMAX; i++) array[i] = 0;

#pragma omp parallel
  {
    int array_private[PIXMAX];

    for(int i=0; i<PIXMAX; i++) // # Problem # 1
      array_private[i] = 0;

#pragma omp for
    for(int i=0; i<N4; i++)
      for(int j=0; j<N5; j++) { // # Problem # 1
    array_private[matrix[i][j]]++;
      }

#pragma omp critical
    {
      for(int i=0; i<PIXMAX; i++) {
    array[i] += array_private[i];
      }
    }
  }
}

int main () {

  omp_set_num_threads(NUM_THREADS);
  for(int i=0; i<N4; i++)
    for(int j=0; j<N5; j++) {
      if(i%3) 
        image[i][j] = (i+j) % PIXMAX;
      else
        image[i][j] = (i+i*j) % PIXMAX;
    }

  for (int k=0; k<PIXMAX; k++) 
    printf("%9d", histo[k]);
  printf("\n");

  calculate_histo(histo,image);

  for (int k=0; k<PIXMAX; k++) 
    printf("%9d", histo[k]);
  printf("\n");
  return 0;
}

For the first point I would suggest using C99 standard, which permits to declare variables within the body of a function (enhancing thus the locality of their use).

Regarding the implicit declaration: if you don't declare a function in C, then its prototype is assumed to return an int and take an undefined number of parameters. The function omp_set_num_threads is therefore implicitly declared as:

int omp_set_num_threads()

instead of:

 void omp_set_num_threads(int );

As not declaring a function is not strictly an error, compilers usually don't raise the issue if they are not explicitly told to do so. Therefore if you compile with:

gcc foo.c -fopenmp -o foo

this will go unnoticed. To avoid this kind of pitfalls it is thus generally recommended to use the maximum warning level available from the compiler:

gcc foo.c -fopenmp -Wall -Werror -pedantic -o foo

6 Comments

good! If I compile it in cpp and using C99 standard all goes fine, finally I compile it in c using private(i,j) after the #pragma omp parallel (I couldn't use C99 in c, a lot of errors)
if I use for(int i=0...) in C I get errors, I use the declaration int i,j,k; and then use the private clause
For the program logic your solution is completely fine. The only thing is that you have global variables which are used as local indexes (which is not considered to be a good programming practice). To activate C99 you must use the flag -std=c99. The full compilation command is gcc foo.c -fopenmp -std=c99 -Wall -Werror -pedantic -o foo
Good call on the race condition on i. The code did not work for me in Visual Studio until I made j private as well.
@HristoIliev, claims it's unnecessary to make i and j private but this code only works if you do so I don't understand.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.