I have this code:

for (i = 1; i <= imax-1; i++) {
    for (j = 1; j <= jmax; j++) {
        /* only if both adjacent cells are fluid cells */
        if ((flag[i][j] & C_F) && (flag[i+1][j] & C_F)) {
            du2dx = ((u[i][j]+u[i+1][j])*(u[i][j]+u[i+1][j])+
                gamma*fabs(u[i][j]+u[i+1][j])*(u[i][j]-u[i+1][j])-
                (u[i-1][j]+u[i][j])*(u[i-1][j]+u[i][j])-
                gamma*fabs(u[i-1][j]+u[i][j])*(u[i-1][j]-u[i][j]))
                /(4.0*delx);
            duvdy = ((v[i][j]+v[i+1][j])*(u[i][j]+u[i][j+1])+
                gamma*fabs(v[i][j]+v[i+1][j])*(u[i][j]-u[i][j+1])-
                (v[i][j-1]+v[i+1][j-1])*(u[i][j-1]+u[i][j])-
                gamma*fabs(v[i][j-1]+v[i+1][j-1])*(u[i][j-1]-u[i][j]))
                /(4.0*dely);
            laplu = (u[i+1][j]-2.0*u[i][j]+u[i-1][j])/delx/delx+
                (u[i][j+1]-2.0*u[i][j]+u[i][j-1])/dely/dely;

            f[i][j] = u[i][j]+del_t*(laplu/Re-du2dx-duvdy);
        }
        else {
            f[i][j] = u[i][j];
        }
    }
}

and I want to create 4 threads: one each to compute du2dx, duvdy, laplu and f[i][j]. Is there a way to do that with OpenMP?

  • You should really set a variable equal to u[i][j]; it would save a lot of ink. Also, shouldn't it be j < jmax? You are accessing index j+1 in places. Commented Mar 20, 2021 at 4:36
  • It should not be j < jmax as the array has (jmax+2) columns Commented Mar 20, 2021 at 4:40
  • OK, it's not very intuitive, but you have done the checks. Commented Mar 20, 2021 at 4:41

2 Answers


You are right, @Andreas. By default in most implementations, omp parallel will create as many threads as are required to exploit the available parallelism in the machine. omp for will then distribute entire loop iterations to the threads (exactly how will depend on the schedule chosen). So they will not split individual statements from inside the loop and execute each as a separate parallel task.

But, OTOH, "Why do you want to limit yourself to four threads?" Is this code so unimportant that

  1. No one else will ever use it?
  2. You will throw it away before you get a better machine?

Note that what you say you want to do (execute each statement in a separate thread) makes no sense, since there is a dependency between the statements. The final statement

f[i][j] = u[i][j]+del_t*(laplu/Re-du2dx-duvdy);

depends on all of the preceding ones.

So your best bet is to go with a simple OpenMP parallelisation; something like this will probably work best:

#pragma omp parallel for collapse(2) \
    schedule(nonmonotonic:dynamic) \
    shared(flag, u, v, f, gamma, Re, imax, jmax, delx, dely, del_t) \
    private(i, j, du2dx, duvdy, laplu)
for (i = 1; i <= imax-1; i++) {
    for (j = 1; j <= jmax; j++) {
        /* only if both adjacent cells are fluid cells */
        if ((flag[i][j] & C_F) && (flag[i+1][j] & C_F)) {
            du2dx = ((u[i][j]+u[i+1][j])*(u[i][j]+u[i+1][j])+
                gamma*fabs(u[i][j]+u[i+1][j])*(u[i][j]-u[i+1][j])-
                (u[i-1][j]+u[i][j])*(u[i-1][j]+u[i][j])-
                gamma*fabs(u[i-1][j]+u[i][j])*(u[i-1][j]-u[i][j]))
                /(4.0*delx);
            duvdy = ((v[i][j]+v[i+1][j])*(u[i][j]+u[i][j+1])+
                gamma*fabs(v[i][j]+v[i+1][j])*(u[i][j]-u[i][j+1])-
                (v[i][j-1]+v[i+1][j-1])*(u[i][j-1]+u[i][j])-
                gamma*fabs(v[i][j-1]+v[i+1][j-1])*(u[i][j-1]-u[i][j]))
                /(4.0*dely);
            laplu = (u[i+1][j]-2.0*u[i][j]+u[i-1][j])/delx/delx+
                (u[i][j+1]-2.0*u[i][j]+u[i][j-1])/dely/dely;

            f[i][j] = u[i][j]+del_t*(laplu/Re-du2dx-duvdy);
        }
        else {
            f[i][j] = u[i][j];
        }
    }
}

2 Comments

Sorry, I did not get it. Why are you using "nonmonotonic:dynamic" there?
dynamic because it is the iterations which are being distributed, but many of them will only do a single assignment (the else in the if), so you want a dynamic distribution to balance the work. nonmonotonic:dynamic because you don't need monotonicity, and it has a lower overhead implementation than a monotonic:dynamic schedule. (Yes, I know that in OpenMP 5.x and later dynamic may be implemented as nonmonotonic:dynamic, but many implementations won't because it might break old code. It's therefore better to be explicit.) A chunk size might help too.

You can use

#pragma omp parallel for private(i)

Right before your initial for loop, as described in this tutorial.

NB: there are other very cool features you might want to check out.

5 Comments

isn't "pragma omp for" going to create threads equal to the number of cores, N, and then divide (imax-1) over the number of cores. Then each thread is going to execute the for loop (imax-1)/N times? Please correct me if I am wrong.
No, the threads will be assigned to compute different variables inside your loop.
I may have confused "pragma omp parallel" with "pragma omp for". Thanks for the help.
@AntreasPetrou shouldn't you use #pragma parallel omp for private(i)? Otherwise this code will run sequentially.
There is no #pragma parallel omp for directive. I think you're looking for #pragma omp parallel for. (All OpenMP directives start with #pragma omp, so this shouldn't be too hard!)
