The work $W(n)$ is the total number of nodes in your computation
graph and the span $D(n)$ is the number of nodes on the longest path
of that graph. $T_P(n)$ is the runtime of the algorithm
using $P$ processors; $T_1(n) = W(n)$ and $T_\infty(n) = D(n)$.
Suppose your code has the following shape:

    parfor i in range(n):
        parfor j in range(n):
            f()
where $f()$ is some constant-time function (with side effects). The keyword
"parfor" means that every loop iteration is independent. There are
$n^2$ independent iterations so in theory $W(n)\in O(n^2)$ and
$T_\infty(n) = D(n)\in O(1)$.
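To make the shape above concrete, here is a minimal runnable sketch in Python. It is not a real "parfor" (Python threads are scheduled by a pool, and the GIL limits actual parallelism); the names `f` and `nested_parfor` are just illustrative. It only demonstrates that the $n^2$ iterations are independent tasks that can be submitted all at once:

```python
from concurrent.futures import ThreadPoolExecutor

def f(i, j):
    # Stand-in for the O(1) loop body; returns a marker so we can
    # count how many iterations actually ran.
    return 1

def nested_parfor(n):
    # Submit all n*n independent iterations to a thread pool; the
    # pool plays the role of the "parfor" scheduler.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(f, i, j)
                   for i in range(n) for j in range(n)]
        return sum(fut.result() for fut in futures)

print(nested_parfor(4))  # 16 independent iterations in total
```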
However, the work of starting each $f()$ thread and waiting for its
result must be accounted for. That is, we must implement the "parfor"
keyword. Here is one way:
    parfor_impl(f, lo, hi):
        if hi - lo <= 1:
            f()
        else:
            mid = (lo + hi) // 2
            spawn parfor_impl(f, lo, mid)
            spawn parfor_impl(f, mid, hi)
            wait
where the keywords "spawn" and "wait" launch a thread and wait for all
previously launched threads to complete, respectively. This is a
classical recursive scheme with $W_{parfor} \in O(n)$ and $D_{parfor}
\in O(\log n)$. To get the total runtime (and span) of the two nested
loops we add, rather than multiply, the spans: the inner loops all run
in parallel with each other, so only the deepest one lies on the
critical path, after the outer loop's spawn tree. That gives
$O(\log n) + O(\log n) = O(\log n)$.
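The divide-and-conquer scheme can be sketched in Python with real threads standing in for "spawn" and "wait". This is only an illustration (one OS thread per spawn is far too heavyweight for a production scheduler, and `body`, `results`, and `lock` are names I made up here), but it shows the halving recursion and the join points:

```python
import threading

def parfor_impl(f, lo, hi):
    # Divide-and-conquer "parfor": split [lo, hi) in half, spawn a
    # thread per half, then wait for both. Recursion depth is
    # O(log(hi - lo)), matching the D_parfor bound in the text.
    if hi - lo == 1:
        f(lo)
    elif hi - lo > 1:
        mid = (lo + hi) // 2
        left = threading.Thread(target=parfor_impl, args=(f, lo, mid))
        right = threading.Thread(target=parfor_impl, args=(f, mid, hi))
        left.start()   # "spawn"
        right.start()  # "spawn"
        left.join()    # "wait"
        right.join()

results = []
lock = threading.Lock()

def body(i):
    # f() has side effects, so shared state must be guarded.
    with lock:
        results.append(i)

parfor_impl(body, 0, 8)
print(sorted(results))  # every index 0..7 ran exactly once
```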
I can highly recommend the "Intro to the Work-Span Model" chapter of
Udacity's High Performance Computing course, which delves deeply into
this topic.