I have this code:

for (i = 1; i <= imax-1; i++) {
    for (j = 1; j <= jmax; j++) {
        /* only if both adjacent cells are fluid cells */
        if ((flag[i][j] & C_F) && (flag[i+1][j] & C_F)) {
            du2dx = ((u[i][j]+u[i+1][j])*(u[i][j]+u[i+1][j])+
                gamma*fabs(u[i][j]+u[i+1][j])*(u[i][j]-u[i+1][j])-
                (u[i-1][j]+u[i][j])*(u[i-1][j]+u[i][j])-
                gamma*fabs(u[i-1][j]+u[i][j])*(u[i-1][j]-u[i][j]))
                /(4.0*delx);
            duvdy = ((v[i][j]+v[i+1][j])*(u[i][j]+u[i][j+1])+
                gamma*fabs(v[i][j]+v[i+1][j])*(u[i][j]-u[i][j+1])-
                (v[i][j-1]+v[i+1][j-1])*(u[i][j-1]+u[i][j])-
                gamma*fabs(v[i][j-1]+v[i+1][j-1])*(u[i][j-1]-u[i][j]))
                /(4.0*dely);
            laplu = (u[i+1][j]-2.0*u[i][j]+u[i-1][j])/delx/delx+
                (u[i][j+1]-2.0*u[i][j]+u[i][j-1])/dely/dely;

            f[i][j] = u[i][j]+del_t*(laplu/Re-du2dx-duvdy);
        }
        else {
            f[i][j] = u[i][j];
        }
    }
}

and I want to create 4 threads: one each to compute du2dx, duvdy, laplu and f[i][j]. Is there a way to do that with OpenMP?

  • You should really set a variable equal to u[i][j]; it would save a lot of ink. Also, shouldn't it be j < jmax? You are accessing index j+1 in places. Commented Mar 20, 2021 at 4:36
  • It should not be j < jmax as the array has (jmax+2) columns Commented Mar 20, 2021 at 4:40
  • OK, it's not very intuitive, but you have done the checks. Commented Mar 20, 2021 at 4:41

2 Answers


You are right, @Andreas. By default in most implementations, omp parallel will create as many threads as are required to exploit the available parallelism in the machine. omp for will then distribute entire loop iterations to the threads (exactly how will depend on the schedule chosen). So they will not split individual statements from inside the loop and execute each as a separate parallel task.

But, OTOH, "Why do you want to limit yourself to four threads?" Is this code so unimportant that

  1. No one else will ever use it?
  2. You will throw it away before you get a better machine?

Note that what you say you want to do (execute each statement in a separate thread) makes no sense, since there is a dependency between the statements. The final statement

f[i][j] = u[i][j]+del_t*(laplu/Re-du2dx-duvdy);

depends on all of the preceding ones.

So your best bet is to go with a simple OpenMP parallelisation; something like this will probably work best:

#pragma omp parallel for collapse(2) \
    schedule(nonmonotonic:dynamic) \
    shared(flag, u, v, f, gamma, Re, imax, jmax, delx, dely, del_t) \
    private(i, j, du2dx, duvdy, laplu)
for (i = 1; i <= imax-1; i++) {
    for (j = 1; j <= jmax; j++) {
        /* only if both adjacent cells are fluid cells */
        if ((flag[i][j] & C_F) && (flag[i+1][j] & C_F)) {
            du2dx = ((u[i][j]+u[i+1][j])*(u[i][j]+u[i+1][j])+
                gamma*fabs(u[i][j]+u[i+1][j])*(u[i][j]-u[i+1][j])-
                (u[i-1][j]+u[i][j])*(u[i-1][j]+u[i][j])-
                gamma*fabs(u[i-1][j]+u[i][j])*(u[i-1][j]-u[i][j]))
                /(4.0*delx);
            duvdy = ((v[i][j]+v[i+1][j])*(u[i][j]+u[i][j+1])+
                gamma*fabs(v[i][j]+v[i+1][j])*(u[i][j]-u[i][j+1])-
                (v[i][j-1]+v[i+1][j-1])*(u[i][j-1]+u[i][j])-
                gamma*fabs(v[i][j-1]+v[i+1][j-1])*(u[i][j-1]-u[i][j]))
                /(4.0*dely);
            laplu = (u[i+1][j]-2.0*u[i][j]+u[i-1][j])/delx/delx+
                (u[i][j+1]-2.0*u[i][j]+u[i][j-1])/dely/dely;

            f[i][j] = u[i][j]+del_t*(laplu/Re-du2dx-duvdy);
        }
        else {
            f[i][j] = u[i][j];
        }
    }
}

2 Comments

Sorry, I did not get it. Why are you using "nonmonotonic:dynamic" there?
dynamic because it is the iterations which are being distributed, but many of them will only do a single assignment (the else in the if), so you want a dynamic distribution to balance the work. nonmonotonic:dynamic because you don't need monotonicity, and it has a lower overhead implementation than a monotonic:dynamic schedule. (Yes, I know that in OpenMP 5.x and later dynamic may be implemented as nonmonotonic:dynamic, but many implementations won't because it might break old code. It's therefore better to be explicit.) A chunk size might help too.

You can use

#pragma omp parallel for private(i)

Right before your initial for loop, as described in this tutorial.

NB: there are other very cool features you might want to check out.

5 Comments

isn't "pragma omp for" going to create threads equal to the number of cores, N, and then divide (imax-1) over the number of cores. Then each thread is going to execute the for loop (imax-1)/N times? Please correct me if I am wrong.
No, the threads will be assigned to compute different variables inside your loop.
I may have confused "pragma omp parallel" with "pragma omp for". Thanks for the help.
@AntreasPetrou shouldn't you use #pragma parallel omp for private(i)? Otherwise this code will run sequentially.
There is no #pragma parallel omp for directive. I think you're looking for #pragma omp parallel for. (All OpenMP directives start with #pragma omp, so this shouldn't be too hard!)
