
I'm exploring OpenMP in Fortran, using the Mandelbrot algorithm as an example:

!$omp parallel do reduction(+:iters)
do iy=0,30
    do ix=0,70
        c = cmplx(xmin+stepx*ix, ymin+stepy*iy, qp)
        z = 0
        escaped = .false.
        do k=1,max_iters
            z = z**2 + c
            escaped = abs(z) > 2
            if (escaped) exit
        end do
        iters(iy, ix) = k
    end do
end do
!$omp end parallel do

Each iteration is fully independent - iters(iy, ix) is written exactly once, for a single (iy, ix) pair - so this looks like a perfectly parallel case.

I've experimented a bit with the !$omp parallel do directive, and reduction(+:iters) is doing the trick.

But to be honest, I don't fully understand whether that's the best way to do it, since I'm not actually summing anything with the (+:iters).

In this case, is reduction(+:iters) the correct clause to use?
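
My mental model of what reduction(+:iters) does - each thread fills a zero-initialized private copy of the whole array, and the copies are summed when the threads join - is roughly this (a sketch of the semantics only, not of any actual implementation; iters_local is a hypothetical name):

! Per-thread view under reduction(+:iters); iters_local stands in for the
! private copy the runtime creates for each thread.
integer :: iters_local(0:30, 0:70)
iters_local = 0                  ! the '+' reduction initializes copies to 0
! ... the thread stores k into iters_local(iy, ix) for its iterations ...
!$omp critical
iters = iters + iters_local      ! the copies are combined at the join
!$omp end critical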

2 Comments
  • The variables c, z, and escaped should be declared private, and iters can be shared (as noted by other commenters).
  • For better memory access and cache management, the iters array should be transposed (i.e. iters(ix,iy) rather than iters(iy,ix)) - see the sketch below.
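
To illustrate the transposition suggested in the second comment (a sketch; the bounds are taken from the loops in the question):

integer :: iters(0:70, 0:30)   ! ix is now the first index
! ... inside the loops, the assignment becomes:
iters(ix, iy) = k              ! the inner ix loop now walks memory
                               ! contiguously, since Fortran is column-major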

3 Answers


I can reproduce your behaviour once I write a little driver program - if I remove the reduction I get different answers when comparing a serial run and a threaded one. To resolve it I'm not going to mess around with the default scoping rules, which have caused confusion in the comments and other answers - I'm going to use default( none ) and do it properly. The code below works with no reduction - please use default( none ) in the future. Also note I have swapped your x and y loops around in the parallel version; while of small import in this case, this gives the efficient order of array accesses for a column-major language like Fortran.


ijb@ijb-Latitude-5410:~/work/stack$ cat mandb.f90
Program mandb

  Use, Intrinsic :: iso_fortran_env, Only : wp => real64

  !$ Use omp_lib, Only : omp_get_max_threads
  
  Implicit None( Type, External )

  Integer, Parameter :: nx = 70
  Integer, Parameter :: ny = 30

  Integer, Dimension( 0:ny, 0:nx ) :: iters_ser
  Integer, Dimension( 0:ny, 0:nx ) :: iters_par

  Integer, Parameter :: max_iters = 1000

  Real( wp ), Parameter :: xmin = -2.00_wp
  Real( wp ), Parameter :: xmax = +0.25_wp
  Real( wp ), Parameter :: stepx = ( xmax - xmin ) / nx

  Real( wp ), Parameter :: ymin = -1.2_wp
  Real( wp ), Parameter :: ymax = +1.2_wp
  Real( wp ), Parameter :: stepy = ( ymax - ymin ) / ny
  
  Complex( wp ) :: c
  Complex( wp ) :: z

  Integer :: ix, iy
  Integer :: k

  Logical :: escaped

  iters_ser = 0
  do iy=0,30
     do ix=0,70
        c = cmplx(xmin+stepx*ix, ymin+stepy*iy, wp)
        z = 0
        escaped = .false.
        do k=1,max_iters
           z = z**2 + c
           escaped = abs(z) > 2
           if (escaped) exit
        end do
        iters_ser(iy, ix) = k      
     end do
  end do

  !$ Write( *, * ) 'On ', omp_get_max_threads(), ' threads'
  iters_par = 0
  !$omp parallel default( none ) &
  !$omp          shared ( iters_par ) &
  !$omp          private( ix, iy, c, z, k, escaped )
  !$omp do
  do ix=0,70
     do iy=0,30
        c = cmplx(xmin+stepx*ix, ymin+stepy*iy, wp)
        z = 0
        escaped = .false.
        do k=1,max_iters
           z = z**2 + c
           escaped = abs(z) > 2
           if (escaped) exit
        end do
        iters_par(iy, ix) = k      
     end do
  end do
  !$omp end do
  !$omp end parallel

  Write( *, * ) 'Max error: ', Maxval( iters_par - iters_ser )

End Program mandb
ijb@ijb-Latitude-5410:~/work/stack$ gfortran -Wall -Wextra -pedantic -std=f2018 -O -g -fopenmp mandb.f90
ijb@ijb-Latitude-5410:~/work/stack$ export OMP_NUM_THREADS=4
ijb@ijb-Latitude-5410:~/work/stack$ ./a.out
 On            4  threads
 Max error:            0
ijb@ijb-Latitude-5410:~/work/stack$ ./a.out
 On            4  threads
 Max error:            0
ijb@ijb-Latitude-5410:~/work/stack$ ./a.out
 On            4  threads
 Max error:            0

(and identically for a dozen further runs)

That said, I wouldn't code it like the above:

ijb@ijb-Latitude-5410:~/work/stack$ cat mandb2.f90
Program mandb

  Use, Intrinsic :: iso_fortran_env, Only : wp => real64

  Implicit None( Type, External )

  Integer, Parameter :: nx = 70
  Integer, Parameter :: ny = 30

  Integer, Dimension( 0:ny, 0:nx ) :: iters_ser
  Integer, Dimension( 0:ny, 0:nx ) :: iters_par

  Real( wp ), Parameter :: xmin = -2.00_wp
  Real( wp ), Parameter :: xmax = +0.25_wp

  Real( wp ), Parameter :: ymin = -1.2_wp
  Real( wp ), Parameter :: ymax = +1.2_wp

  Call do_mandelbrot( xmin, xmax, ymin, ymax, iters_ser )

  !$omp parallel default( none ) shared( iters_par )
  Call do_mandelbrot( xmin, xmax, ymin, ymax, iters_par )
  !$omp end parallel

  Write( *, * ) 'Max error: ', Maxval( iters_par - iters_ser )

Contains

  Subroutine do_mandelbrot( xmin, xmax, ymin, ymax, iters )

    Use, Intrinsic :: iso_fortran_env, Only : wp => real64

    !$ Use omp_lib, Only : omp_get_thread_num
    !$ Use omp_lib, Only : omp_get_num_threads
  
    Implicit None( Type, External )

    Real( wp )                  , Intent( In    ) :: xmin, xmax
    Real( wp )                  , Intent( In    ) :: ymin, ymax
    Integer, Dimension( 0:, 0: ), Intent(   Out ) :: iters

    Integer, Parameter :: max_iters = 1000

    Complex( wp ) :: c, z
    
    Real( wp ) :: stepx, stepy
    
    Integer :: nx, ny
    Integer :: ix, iy
    Integer :: k
    
    Logical :: escaped

    ny = Ubound( iters, Dim = 1 )
    nx = Ubound( iters, Dim = 2 )

    stepy = ( ymax - ymin ) / ny
    stepx = ( xmax - xmin ) / nx

    !$ If( omp_get_thread_num() == 0 ) Then
    !$    Write( *, * ) 'On ', omp_get_num_threads(), ' threads'
    !$ End If
    
    !$omp do collapse( 2 ) schedule( dynamic )
    do ix=0,nx
       do iy=0,ny
          c = cmplx(xmin+stepx*ix, ymin+stepy*iy, wp)
          z = 0
          escaped = .false.
          iters( iy, ix ) = 0
          do k=1,max_iters
             z = z**2 + c
             escaped = abs(z) > 2
             if (escaped) exit
          end do
          iters(iy, ix) = k      
       end do
    end do
    !$omp end do

  End Subroutine do_mandelbrot
    
End Program mandb
ijb@ijb-Latitude-5410:~/work/stack$ gfortran -Wall -Wextra -pedantic -std=f2018 -O -g -fopenmp mandb2.f90
ijb@ijb-Latitude-5410:~/work/stack$ ./a.out
 On            1  threads
 On            4  threads
 Max error:            0
ijb@ijb-Latitude-5410:~/work/stack$ ./a.out
 On            1  threads
 On            4  threads
 Max error:            0
ijb@ijb-Latitude-5410:~/work/stack$ ./a.out
 On            1  threads
 On            4  threads
 Max error:            0

(and identically for several further runs)


2 Comments

What is the cause of the problem in the original code?
@VladimirFГероямслава Not using default( none ). Don't care beyond that.

Mandelbrot is an example of a problem with a huge variation in compute cost across the problem domain. With a static schedule you will get a very bad load balance and therefore bad scalability. To improve the load balance, a dynamic schedule (schedule(dynamic)) and probably collapse(2) will help.

In your code, each element of iters is only touched once during the whole parallel region, so there is no chance of a data race on it and no need to aggregate private copies of the array. The scalars c, z, and escaped do need to be private, though, since every thread works with its own values:

!$omp parallel do shared(iters) private(c, z, escaped) schedule(dynamic) collapse(2)
do iy=0,30
    do ix=0,70
        c = cmplx(xmin+stepx*ix, ymin+stepy*iy, qp)
        z = 0
        escaped = .false.
        do k=1,max_iters
            z = z**2 + c
            escaped = abs(z) > 2
            if (escaped) exit
        end do
        iters(iy, ix) = k
    end do
end do
!$omp end parallel do

Similar to the common practice of using implicit none in Fortran declarations, I'd consider using default(none) on OpenMP directives good practice. This way you are forced to think about the appropriate scoping of every variable.
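
Applied to the loop from the question, that might look as follows (a sketch; it assumes xmin, ymin, stepx, stepy, and max_iters are ordinary variables - named constants have predetermined sharing and must not appear in the clauses):

!$omp parallel do default(none) schedule(dynamic) collapse(2) &
!$omp    shared(iters, xmin, ymin, stepx, stepy, max_iters) &
!$omp    private(ix, iy, k, c, z, escaped)
do iy=0,30
    do ix=0,70
        c = cmplx(xmin+stepx*ix, ymin+stepy*iy, qp)
        z = 0
        escaped = .false.
        do k=1,max_iters
            z = z**2 + c
            escaped = abs(z) > 2
            if (escaped) exit
        end do
        iters(iy, ix) = k
    end do
end do
!$omp end parallel do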

3 Comments

@francescalus, yes thanks. I fixed my answer
Another solution is to use OpenMP tasks, but a dynamic schedule might be a bit more efficient, depending on the target implementation (see the sketch after these comments).
It probably will be more efficient, but I would still go for larger chunks or for guided.
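
For illustration, a task-per-row variant of that suggestion could look like this (a sketch only, reusing the declarations from the question; the data-sharing clauses are spelled out because task defaults differ from those of parallel do):

!$omp parallel
!$omp single
do iy=0,30
   ! one task per row; firstprivate captures the current iy for the task
   !$omp task firstprivate(iy) private(ix, k, c, z, escaped) shared(iters)
   do ix=0,70
      c = cmplx(xmin+stepx*ix, ymin+stepy*iy, qp)
      z = 0
      escaped = .false.
      do k=1,max_iters
         z = z**2 + c
         escaped = abs(z) > 2
         if (escaped) exit
      end do
      iters(iy, ix) = k
   end do
   !$omp end task
end do
!$omp end single
!$omp end parallel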

Your only parallel loop is the outer one, over iy. You're not reducing over that variable: each iy iteration writes to its own iters(iy,...) locations. However, you have a problem with the ix loop: every thread accesses the same ix variable, so your results will likely be incorrect. Add a private(ix) clause to the parallel loop directive.

To summarize the comments: the above would be true in C/C++, but not in Fortran, which is the language here. In Fortran, all loop iteration variables inside the parallel construct are private by default.
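
For the loop in the question, that means only the scalars need explicit treatment; a minimal corrected directive would be (a sketch):

! iy, ix and k are loop iteration variables inside the parallel construct,
! hence private by default in Fortran; c, z and escaped are not, and would
! otherwise be shared across threads (a data race):
!$omp parallel do private(c, z, escaped)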

10 Comments

While I agree scoping is important, my reading of openmp.org/spec-html/5.2/openmpsu33.html (on my phone) is that by default ix should be private. But it's complicated and I could be wrong. Just another reason you should always use default(none).
It's confusing for C/C++ programmers, but there is a special rule for Fortran: "Loop iteration variables inside parallel or task generating constructs are private in the innermost such construct that encloses the loop." So @IanBush is right: ix and even k (that's where I actually had to revisit the spec text) are indeed private by default. The reasoning is that in Fortran you cannot make them private by restricting their scope to the parallel region, since you have to declare them in the declaration block at the beginning of the procedure.
@IanBush Are you referring to "Loop iteration variables inside parallel, teams, or task generating constructs are private in the innermost such construct that encloses the loop." I guess the interpretation is that the inner loop over "ix" is not enclosed by an OMP construct, therefore the loop variable is not private-as-loop-variable, but shared-as-OMP-default. Also, scoping is harder in Fortran, unless you start using the Block statement that not many people are aware of yet.
@Joachim Our comments overlapped. Yes, you can restrict a scope in Fortran. And I'll have to investigate whether those inner variables are indeed private-by-exception.
Is limiting the scope of variables possible with Fortran 90?
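
(For reference, the Block construct mentioned above is Fortran 2008, not Fortran 90. A sketch of using it here, reusing the question's declarations and qp kind: variables declared inside the block are local to it, so each thread gets its own copies without any data-sharing clauses.)

!$omp parallel do
do iy=0,30
   do ix=0,70
      block                      ! Fortran 2008: declarations in here are
         complex(qp) :: c, z     ! scoped to the block, hence private to
         logical :: escaped      ! each thread automatically
         integer :: k
         c = cmplx(xmin+stepx*ix, ymin+stepy*iy, qp)
         z = 0
         escaped = .false.
         do k=1,max_iters
            z = z**2 + c
            escaped = abs(z) > 2
            if (escaped) exit
         end do
         iters(iy, ix) = k
      end block
   end do
end do
!$omp end parallel do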
