
I've been playing around with OpenMP, and am trying to see if I can get a speedup in a particular bit of C++ code.

    #pragma omp parallel for
    for (Index j=alignedSize; j<size; ++j)
    {
      res[j] = cj.pmadd(lhs0(j), pfirst(ptmp0), res[j]);
      res[j] = cj.pmadd(lhs1(j), pfirst(ptmp1), res[j]);
      res[j] = cj.pmadd(lhs2(j), pfirst(ptmp2), res[j]);
      res[j] = cj.pmadd(lhs3(j), pfirst(ptmp3), res[j]);
    }

I'm a complete newbie with OpenMP so be gentle with me, but could someone shed some light on why this code ends up doubling the execution time rather than speeding it up?

I'm running with 4 cores, just in case that matters.

  • How did you measure time? What are your specific results? Can you provide the code in form of a minimal reproducible example? What is the specific processor model and memory setup of the system? Commented Dec 17, 2016 at 20:13

2 Answers


What is the size of a res entry? If it's less than the size of a cache line, then it's likely false sharing.


1 Comment

A res entry is 8 bytes long, so assuming a 64-byte cache line, it looks like I would want to assign 8 iterations per thread? Something like #pragma omp parallel for schedule(static,8) ?

A bare minimum for a typical CPU would be chunks of 128 bytes, and then you would need a unified last-level cache.

