1

Recent OpenMP standards introduced task multi-dependencies. This feature was originally proposed in OmpSs if I'm not mistaken. I found an example of applying multi-dependencies to the assembly step of a finite element method.

For educational purposes I'm trying to replicate the assembly example in 1-d using dense storage for simplicity:

  integer, parameter :: N = 10
  real(dp), parameter :: L = 1.0_dp

  real(dp), dimension(N+1) :: nodes
  real(dp), dimension(2,2) :: Ke, Me, A_local
  real(dp), dimension(N+1,N+1) :: A
  real(dp) :: h

  h = L / real(N, dp)

  Ke = D / h * reshape([1.0_dp, -1.0_dp, -1.0_dp, 1.0_dp], [2,2])
  Me = sigma * h / 6.0_dp * reshape([2.0_dp, 1.0_dp, 1.0_dp, 2.0_dp], [2,2])
  A = 0.0_dp

  ! Loop over elements
  do i = 1, N
    A_local = Ke + Me
    A(i:i+1, i:i+1) = A(i:i+1, i:i+1) + A_local
  end do

The conventional approach to parallelise this might use atomic updates or a critical section to avoid data races, e.g.

  ! Loop over elements
  !$omp parallel do default(private) shared(A,N,Ke,Me)
  do i = 1, N
    A_local = Ke + Me
    !$omp critical
    A(i:i+1, i:i+1) = A(i:i+1, i:i+1) + A_local
    !$omp end critical
  end do

I would now like to try achieve the same thing using multi-dependencies instead of critical. I started with the following task loop:

  !$omp parallel default(shared) private(A_local)
  !$omp single
  do i = 1, N
    !$omp task depend(mutexinoutset: A(i:i+1,i:i+1)) firstprivate(i)
    A_local = Ke + Me
    A(i:i+1, i:i+1) = A(i:i+1, i:i+1) + A_local
    !$omp end task
  end do
  !$omp end single
  !$omp end parallel

This does not work. As explained here, the storage locations in the depend clause are not allowed to be disjoint.

My next idea was to use an iterator,

!$omp task depend(iterator(it=i:i+1), mutexinoutset: A(it,it)) firstprivate(i)

which captures dependencies on the diagonal matrix elements. This happens to be correct in case of 1-d finite elements, but I don't think would be valid in 2-d or 3-d, where the connectivity pattern is more complex.

My third idea was to use a trivial index list to enforce the dependencies:

  integer, dimension(N+1) :: adj

  do i = 1, N
    adj(i) = i
  end do

combined with the clause:

!$omp task depend(iterator(it=i:i+1), mutexinoutset: adj(it)) firstprivate(i)

Again, this happens to work because the neighbour dependencies are simple (each finite element) depends on the neighbouring two nodes. In practice this solution appears to work as expected

The last idea I had was to use a full-blown neighbour/adjacency list, i.e.

integer, dimension(2,N) :: nel

do i = 1, n
   nel(:,i) = [i,i+1]
end do

in combination with the clause,

    !$omp task depend(mutexinoutset: nel(:,i))

again this happens to work because the dependency graph is very simple.

In all cases the body of the task remains the same.

Which of the above-mentioned approaches is the recommended one? Am I using the multidependency feature "correctly"?

2
  • For a 2d rectangle dependence you can write !$omp task depend(iterator(it1=i:i+1,it2=i:i+1), mutexinoutset: A(it1,it2)) firstprivate(i). I don't think that you will gain an performance advantage over using atomic updates of the array elements. OpenMP tasks introduce a heavy overhead that needs significant work (>10**5 instructions) per task to amortize. Commented Apr 25 at 22:17
  • Thanks. It's good to know that iterators can be combined. I'm aware that more work is needed. In one of the PoP case studies, they indeed explain that "chunking" is used, meaning they first compute a coarse partition, and then use multi-dependencies on that. Commented Apr 26 at 0:49

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.