
Below is a slightly altered code snippet from a project I am working on. I am having a strange parallel problem with the test1, test2, and test3 routines, in which the numbers are sometimes wrong:

program vectorize_test
  implicit none

  integer, parameter :: N=6
  integer, parameter :: chunk_size=3
  integer, dimension(1:N) :: a,b,c

contains

  subroutine array_setup
    implicit none
    integer :: i

    do i=1,N
       a(i)=2*i
       b(i)=i*i
       c(i)=i*i-i+2
    end do

    return
  end subroutine array_setup

  subroutine test1
    implicit none
    integer :: i

    !$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
    do i=2,N
       a(i-1)=b(i)
       c(i)=a(i)
    end do
    !$OMP end parallel do

    return
  end subroutine test1

  subroutine test2
    implicit none
    integer :: i

    !$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
    do i=2,N
       a(i-1)=b(i)
       a(i)=c(i)
    end do
    !$OMP end parallel do

    return
  end subroutine test2

  subroutine test3
    implicit none
    integer :: i

    !$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
    do i=2,N
       b(i)=a(i-1)
       a(i)=c(i)
    end do
    !$OMP end parallel do

    return
  end subroutine test3

end program vectorize_test

Below is a sample of the output when OMP_NUM_THREADS=1, which is correct:

 after setup
           1           2           1           2
           2           4           4           4
           3           6           9           8
           4           8          16          14
           5          10          25          22
           6          12          36          32
 after test1
           1           4           1           2
           2           9           4           4
           3          16           9           6
           4          25          16           8
           5          36          25          10
           6          12          36          12
 after test2
           1           4           1           2
           2           9           4           4
           3          16           9           8
           4          25          16          14
           5          36          25          22
           6          32          36          32
 after test3
           1           2           1           2
           2           4           2           4
           3           8           4           8
           4          14           8          14
           5          22          14          22
           6          32          22          32

However, when I increase the thread count above 1, strange numbers appear in each of the columns, making the output incorrect. Where am I going wrong with this, and what can I do to fix it?

  • Welcome. Be sure to take the tour and read How to Ask. Please use fortran for all Fortran questions. You can add a specific version tag to a version-specific question, but usually you do not want to limit yourself to the old and obsolete Fortran 90 in the year 2017. Your code is not Fortran version specific at all. Commented Dec 5, 2017 at 21:45

1 Answer


When you do

!$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
do i=2,N
   a(i-1)=b(i)
   c(i)=a(i)
end do
!$OMP end parallel do

one thread can read a value of a(i) that has not been computed yet, because that iteration is scheduled on some other thread. Each iteration depends on the previous one, so you cannot parallelize the loop this way. One thread can also read the same a(i) location that another thread is just writing; that is also an error (a race condition).
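If the goal is just to reproduce the serial result of this loop, one option (a sketch, not part of the original answer) is to split it into two loops whose iterations are each independent, reading a before overwriting it:

```fortran
! Copy the old a(i) values into c first; in the serial loop, c(i) always
! reads a(i) before iteration i+1 overwrites it, so this preserves semantics.
!$OMP parallel do private(i) shared(a,c) schedule(static,chunk_size)
do i=2,N
   c(i)=a(i)
end do
!$OMP end parallel do

! Now overwrite a; each iteration writes a distinct element and reads only b.
!$OMP parallel do private(i) shared(a,b) schedule(static,chunk_size)
do i=2,N
   a(i-1)=b(i)
end do
!$OMP end parallel do
```

This matches the "after test1" output shown in the question: c picks up the old values of a, and a(1:N-1) becomes b(2:N).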

In the loop

!$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
do i=2,N
   a(i-1)=b(i)
   a(i)=c(i)
end do
!$OMP end parallel do

the iterations are also not independent. Notice that the value a(i)=c(i) written in iteration i is immediately overwritten in iteration i+1 by a(i)=b(i+1), so in the serial run only a(N)=c(N) survives; every other element ends up as a(i)=b(i+1). Two threads may clash in the order in which these two writes are done. You can safely rewrite this as

!$OMP parallel do private(i) shared(a,b) schedule(static,chunk_size)
do i=2,N
   a(i-1)=b(i)
end do
!$OMP end parallel do
a(N)=c(N)

The third loop

!$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
do i=2,N
   b(i)=a(i-1)
   a(i)=c(i)
end do
!$OMP end parallel do

has the same problem as the first loop: b(i) reads a(i-1), which the previous iteration has just written, so each iteration depends on the previous iteration's result. This cannot be parallelized as written. You must find a way to rewrite the algorithm so that the iterations do not depend on each other.
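For this particular recurrence the dependence can be eliminated by substitution: by the time iteration i (for i > 2) reads a(i-1), the serial loop has already replaced it with c(i-1); only i = 2 sees the value a(1) had on entry. A sketch along those lines (not part of the original answer):

```fortran
! i=2 reads the entry value of a(1), which this loop never modifies
b(2)=a(1)

! for i>2, a(i-1) equals c(i-1) by the time it is read; substitute it in,
! making the iterations independent
!$OMP parallel do private(i) shared(b,c) schedule(static,chunk_size)
do i=3,N
   b(i)=c(i-1)
end do
!$OMP end parallel do

! the writes to a touch distinct elements and read only c
!$OMP parallel do private(i) shared(a,c) schedule(static,chunk_size)
do i=2,N
   a(i)=c(i)
end do
!$OMP end parallel do
```

This reproduces the "after test3" column values in the question's serial output, but such a substitution only works when the recurrence can be resolved in closed form like this.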

Note that there is no need for the return at the end of each subroutine. You also don't need implicit none in each subroutine if you have it in the parent scope.
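Both points can be shown with a minimal (hypothetical) example; the contained subroutine inherits implicit none from its host through host association, and falling off the end of a subroutine returns by itself:

```fortran
program scope_demo
  implicit none          ! also covers the contained procedure via host association
  integer :: x
  x = 1
  call bump
  print *, x             ! prints 2
contains
  subroutine bump        ! no separate implicit none needed here
    x = x + 1            ! x comes from the host program
  end subroutine bump    ! reaching the end returns; no return statement needed
end program scope_demo
```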
