I have a nested do loop in an OpenMP Fortran 77 code that I am unable to parallelize: the code segfaults when it is run. A very similar nested do loop in a different subroutine of the same code runs in parallel with no issues. Here is the loop that I am having problems with:

      do n=1,num_p
C$OMP  PARALLEL DO DEFAULT(SHARED), PRIVATE(l,i1,i2,j1,j2,k1,k2
C$OMP& ,i,j,k,i_t,j_t,i_ddf,j_ddf,ddf_dum)
        do l=1,n_l(n)
          call del_fn(l,n)
          i1=p_iw(l,n)
          i2=p_ie(l,n)
          j1=p_js(l,n)
          j2=p_jn(l,n)
          k1=p_kb(l,n)
          k2=p_kt(l,n)
          do i=i1,i2
            i_ddf=i-i1+1
            if(i .lt. 1) then
              i_t=nx+i
            elseif (i .gt. nx) then
              i_t=i-nx
            else
              i_t=i
            endif
            do j=j1,j2
              j_ddf=j-j1+1
              if(j .lt.1) then
                j_t=ny+j
              elseif(j .gt. ny) then
                j_t=j-ny
              else
                j_t=j
              endif
              do k=k1,k2
                ddf(l,n,i_ddf,j_ddf,k-k1+1) = ddf_dum(i_t,j_t,k)
              enddo
            enddo
          enddo
        enddo
C$OMP END PARALLEL DO
      enddo

I have narrowed the problem down to ddf_dum(i_t,j_t,k). When this term is turned off (say I replace it by 0.d0), the code runs fine.

By contrast, the following very similar nested do loop runs in parallel with no issues. Can anyone please identify what I am missing here?

      do n=1,1
C$OMP  PARALLEL DO DEFAULT(SHARED), PRIVATE(l,i1,i2,j1,j2,k1,k2
C$OMP& ,i,j,k,i_f,j_f,i_ddf,j_ddf)
        do l=1,n_l(n)
          i1=p_iw(l,n)
          i2=p_ie(l,n)
          j1=p_js(l,n)
          j2=p_jn(l,n)
          k1=p_kb(l,n)
          k2=p_kt(l,n)
          u_forcing(l,n)= (u_p(l,n)-up_tilde(l,n))/dt
          v_forcing(l,n)= (v_p(l,n)-vp_tilde(l,n))/dt
          w_forcing(l,n)= (w_p(l,n)-wp_tilde(l,n))/dt
          do i=i1,i2
            i_ddf=i-i1+1
            if(i .lt. 1) then
              i_f=nx+i
            elseif (i .gt. nx) then
              i_f=i-nx
            else
              i_f=i
            endif
            do j=j1,j2
              j_ddf=j-j1+1
              if(j .lt.1) then
                j_f=ny+j
              elseif(j .gt. ny) then
                j_f=j-ny
              else
                j_f=j
              endif
              do k=k1,k2
                forcing_x(i_f,j_f,k)=forcing_x(i_f,j_f,k)+u_forcing(l,n)
     &                        *ddf_n(l,n,i_ddf,j_ddf,k-k1+1)*dv_l(l,n)
                forcing_y(i_f,j_f,k)=forcing_y(i_f,j_f,k)+v_forcing(l,n)
     &                        *ddf_n(l,n,i_ddf,j_ddf,k-k1+1)*dv_l(l,n)
                forcing_z(i_f,j_f,k)=forcing_z(i_f,j_f,k)+w_forcing(l,n)
     &                        *ddf_n(l,n,i_ddf,j_ddf,k-k1+1)*dv_l(l,n)
              enddo
            enddo
          enddo
        enddo
C$OMP END PARALLEL DO
      enddo

1 Answer

As you noted, your problem is ddf_dum. It should be shared, not private, because the loop only reads it and never writes to it. Listing it as private gives every thread its own uninitialized copy, so the threads never see the values you expect, and if ddf_dum is a large array those per-thread copies are the most likely source of the segfault.

A good rule of thumb that would have caught this: any variable that appears only on the right-hand side of assignments within the parallel region, and is not modified by anything called from it, should be shared.
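As a minimal sketch of that rule, here is a stripped-down loop in the same fixed-form style. The names src, dst, nx and nl are made up for illustration and stand in for ddf_dum, ddf and the real bounds. It assumes nothing called inside the loop writes to src.

```fortran
      program shared_read
      implicit none
      integer nx, nl
      parameter (nx=8, nl=4)
      double precision src(nx), dst(nl,nx)
      integer l, i
c     Fill the read-only source array once, before the parallel region.
      do i=1,nx
        src(i)=dble(i)
      enddo
c     src is only read inside the loop, so it stays shared.
c     Only the loop indices are listed as private.
C$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(l,i)
      do l=1,nl
        do i=1,nx
c         Each thread writes to its own rows of dst, so no two
c         threads touch the same element.
          dst(l,i)=src(i)*dble(l)
        enddo
      enddo
C$OMP END PARALLEL DO
      print *, dst(nl,nx)
      end
```

If a routine called from the loop does modify the array, as turns out to be the case for del_fn in the comments, the array is no longer read-only and this rule does not apply as stated.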

4 Comments

Reading from uninitialised variables is not technically a problem, unless those are unallocated. Having large arrays end up on the stack instead of on the heap as a result of making them private is.
I have narrowed the problem down. I believe it is coming from calling the function del_fn in parallel (the common variable ddf_dum is updated in the function del_fn). I now call the function del_fn separately to update ddf_dum, and then run the large nested do loop to update ddf. If del_fn is not run in parallel, everything works out. So the question becomes how to run the function del_fn in parallel?
Step 1: Remove all common variables. They're poor programming form and they make it nearly impossible to parallelize code properly. Rewrite them as normal variables and send them to functions as parameters. Step 2: is each iteration independent from others? Without seeing del_fn, we can't answer that question.
Here is the del_fn subroutine (del_fn). Is there a way to keep the subroutine call within the nested do loop, or do I have to write it inline? I ask because I have removed the common variables as you suggested and am currently trying to write it inline. It is lengthy, however, and I am afraid it will take quite some time to debug. It would be much easier if I could keep the call to del_fn within the nested do loops.
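One way the subroutine call could stay inside the loop, sketched under the assumption that del_fn can be rewritten to write only to an array passed as an argument and to touch no COMMON storage. Here fill_work is a hypothetical stand-in for del_fn, and work, out, nl and nw are illustrative names rather than anything from the original code.

```fortran
      program private_work
      implicit none
      integer nl, nw
      parameter (nl=4, nw=3)
      double precision work(nw), out(nl,nw)
      integer l, k
c     work is scratch space written by the subroutine, so each
c     thread needs its own copy: it goes in the PRIVATE list.
C$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(l,k,work)
      do l=1,nl
c       Each thread fills its own copy of work ...
        call fill_work(l, nw, work)
c       ... and copies it into the rows of out that it owns.
        do k=1,nw
          out(l,k)=work(k)
        enddo
      enddo
C$OMP END PARALLEL DO
      print *, out(nl,nw)
      end

c     Hypothetical stand-in for del_fn. It writes only to its
c     argument, so concurrent calls do not interfere.
      subroutine fill_work(l, nw, work)
      implicit none
      integer l, nw, k
      double precision work(nw)
      do k=1,nw
        work(k)=dble(l*k)
      enddo
      return
      end
```

Because a private array gets one copy per thread, typically on each thread's stack, a large scratch array may need a bigger thread stack (for example via the OMP_STACKSIZE environment variable). If the COMMON block has to stay, the THREADPRIVATE directive applied to the block is the other standard option for giving each thread its own copy.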