Parallelize inner loop using OpenMP

Question

I have three nested loops, but only the innermost one is parallelizable. The stop conditions of the outer and middle loops depend on the results computed by the innermost loop, so I cannot change the loop order.

I have put an OpenMP pragma directive just before the innermost loop, but with two threads the performance is worse than with one. I guess this is because the threads are being created in every iteration of the outer loops.

Is there any way to create the threads outside the outer loops but use them only in the innermost loop?
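A minimal sketch of the loop structure described above, since the question does not include its code. The computation is hypothetical: the innermost loop halves every element and records the largest one, the middle loop repeats until that maximum drops below tol, and the outer loop halves tol and continues while the maximum is still at least min_tol. The function name decay_nested and its parameters are made up for illustration; only the placement of the pragma mirrors the question.

```c
/* Hypothetical stand-in for the asker's computation (not from the question).
   Both loop conditions depend on maxval, which only the inner loop computes,
   so the inner loop is the only one that can be parallelized.
   Assumes min_tol > 0 so the loops terminate. */
long decay_nested(double *x, long n, double tol, double min_tol)
{
    long sweeps = 0;
    double maxval;

    do {                                   /* outer loop */
        do {                               /* middle loop */
            maxval = 0.0;
            /* A new parallel region (dispatch + join) on every sweep. */
            #pragma omp parallel for reduction(max:maxval)
            for (long i = 0; i < n; ++i) { /* inner loop */
                x[i] *= 0.5;
                if (x[i] > maxval) maxval = x[i];
            }
            ++sweeps;
        } while (maxval >= tol);           /* depends on inner result */
        tol *= 0.5;
    } while (maxval >= min_tol);           /* depends on inner result */

    return sweeps;
}
```

Without OpenMP enabled the pragma is ignored and the function runs serially with the same result.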

Thanks in advance

Please show us the code, or even better, a simplified example that shows the problem. – user180326, Feb 5, 2011 at 13:14

ltjax · Accepted Answer · 2011-02-05 13:09:17Z

5

OpenMP should be using a thread pool, so you won't be recreating threads every time you execute your loop. Strictly speaking, however, that might depend on the OpenMP implementation you are using (I know the GNU compiler uses a pool). I suggest you look for other common problems, such as false sharing.
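A minimal sketch of the false-sharing problem mentioned above, assuming for illustration that the inner loop accumulates a sum (the question does not say what it computes). In sum_false_sharing each thread adds into its own slot of partial, but neighboring slots share a cache line, so that line keeps moving between cores even though no two threads touch the same element. sum_reduction lets OpenMP keep a private accumulator per thread and combine them once at the end. Both function names are hypothetical; the #ifdef _OPENMP guards only keep the sketch compilable without OpenMP, in which case it runs serially.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Prone to false sharing: partial[t] is written on every iteration, and
   adjacent slots usually live in the same cache line. */
double sum_false_sharing(const double *a, long n)
{
    int max_threads = 1;
#ifdef _OPENMP
    max_threads = omp_get_max_threads();  /* upper bound on team size */
#endif
    double partial[max_threads];          /* C99 VLA, shared by the team */
    for (int t = 0; t < max_threads; ++t)
        partial[t] = 0.0;

    #pragma omp parallel
    {
        int t = 0;
#ifdef _OPENMP
        t = omp_get_thread_num();
#endif
        #pragma omp for
        for (long i = 0; i < n; ++i)
            partial[t] += a[i];           /* each thread hits its own slot */
    }

    double sum = 0.0;
    for (int t = 0; t < max_threads; ++t)
        sum += partial[t];
    return sum;
}

/* Usual fix: a reduction keeps each thread's running sum private
   and combines the copies once, after the loop. */
double sum_reduction(const double *a, long n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}
```

Padding each slot to a full cache line would also avoid the contention, but the reduction clause is shorter and does not depend on the cache-line size.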

answered Feb 5, 2011 at 13:09

ltjax



1 Comment

Hernan Over a year ago

Thanks for all the comments. I will look into my code again. Can anyone suggest a good free profiler/code analyzer for multithreaded code?

minjang · Accepted Answer · 2011-02-05 23:05:38Z

Unfortunately, current multicore computer systems are not well suited to such fine-grained inner-loop parallelism. It's not because of thread creation/forking. As ltjax pointed out, virtually all OpenMP implementations use thread pools, i.e., they create a number of threads up front and park them while idle. So there is essentially no thread-creation overhead.

However, parallelizing such inner loops incurs the following two kinds of overhead:

Dispatching jobs/tasks to threads: even if we don't need to physically create threads, we must at least assign jobs (i.e., create logical tasks) to threads, which mostly requires synchronization.
Joining threads: after all threads in a team have finished their share of the work, they must be joined (unless the nowait clause is used). This is typically implemented as a barrier, which is also a very expensive synchronization.
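One way to do what the question literally asks (fork the team once, outside the outer loops) is to open a single parallel region around all three loops, let every thread run the outer and middle loops redundantly, and work-share only the inner loop with omp for. A minimal sketch under that assumption, with a hypothetical workload (halve each element, track the maximum) and a made-up function name decay_nested_hoisted. Note that it only removes the fork/join of the parallel region itself: every sweep still pays for the barriers of the single and for constructs plus one explicit barrier, which is exactly the synchronization cost described above, so whether it beats a plain parallel for has to be measured.

```c
/* Hypothetical workload. The team is created once; every thread executes
   the outer and middle loops and must evaluate their conditions on the same
   values, otherwise threads would reach different barriers and deadlock.
   Assumes min_tol > 0 so the loops terminate. */
long decay_nested_hoisted(double *x, long n, double tol, double min_tol)
{
    long sweeps = 0;       /* shared: updated by one thread per sweep */
    double maxval = 0.0;   /* shared: reduction target of the inner loop */

    #pragma omp parallel
    {
        double my_tol = tol;   /* private: each thread halves its own copy */
        double local_max;      /* private snapshot of the shared result */

        do {                                    /* outer loop */
            do {                                /* middle loop */
                #pragma omp single
                {
                    maxval = 0.0;               /* reset by one thread */
                    ++sweeps;
                }   /* implicit barrier: everyone sees the reset */

                #pragma omp for reduction(max:maxval)
                for (long i = 0; i < n; ++i) {  /* inner loop, work-shared */
                    x[i] *= 0.5;
                    if (x[i] > maxval) maxval = x[i];
                }   /* implicit barrier: maxval now holds the combined max */

                local_max = maxval;
                /* Stop a fast thread from resetting maxval in the next
                   single before every thread has copied it. */
                #pragma omp barrier
            } while (local_max >= my_tol);
            my_tol *= 0.5;
        } while (local_max >= min_tol);
    }
    return sweeps;
}
```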

Hence, one should minimize the number of times work is dispatched to threads and joined. You can reduce this overhead by increasing the amount of work the inner loop does per invocation, for example through code changes such as loop unrolling.
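Two OpenMP clauses are often used alongside this advice; they are a different technique from enlarging the loop body and are named here plainly. if(...) runs the loop serially when it is too short for the dispatch/join cost to pay off, and schedule(static) lets each thread derive its iteration range from its thread number, so implementations typically need no shared work counter to hand out chunks. A minimal sketch with a hypothetical loop body (scale_and_max) and a hypothetical threshold PAR_THRESHOLD that would have to be tuned on the target machine.

```c
/* Hypothetical threshold: below this many iterations the loop runs
   serially. The right value depends on the machine and the loop body. */
#define PAR_THRESHOLD 10000L

/* Hypothetical inner-loop body: scale every element and return the
   largest one (0.0 if no element is positive). */
double scale_and_max(double *x, long n, double factor)
{
    double maxval = 0.0;
    /* if(...): skip the fork/join entirely for short loops.
       schedule(static): contiguous chunks fixed by thread number. */
    #pragma omp parallel for schedule(static) reduction(max:maxval) if(n >= PAR_THRESHOLD)
    for (long i = 0; i < n; ++i) {
        x[i] *= factor;
        if (x[i] > maxval) maxval = x[i];
    }
    return maxval;
}
```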
