I'm learning OpenMP and have run into a problem: my parallel program is slower than the serial one, and I'm confused (1 thread vs. 2 threads). My code:

#include <iostream>
#include <omp.h>
using namespace std;

int main()
{       
    int threadsNumber=1;
    int S=0;

    cout << "Enter number of threads:\n";
    cin >> threadsNumber;

    double start, end, calculationTime;
    omp_set_num_threads(threadsNumber);
    start = omp_get_wtime();

    #pragma omp parallel for reduction(+: S)
    for(int i=1;i<1000;i++) {
        S+= 10;
    }
    #pragma omp end parallel

    end = omp_get_wtime();

    calculationTime = end - start;

    cout << "Execution time: " << calculationTime << "\n";
    cout<<"S = "<< S <<"\n";

    return 0;
}

Results: 1 thread: 2.59876e-05 s; 2 threads: 0.000102043 s

Where is my mistake? Thank you!

  • I'm not 100% sure, but the reduction at the end may cost more time than you save by using 2 threads. Perhaps a more complex calculation would give you better results. Commented Nov 23, 2013 at 20:53
  • Try 1000000000 instead of 1000. It should take about 2 seconds with 1 thread and about 1 second with 2 threads. If compiler optimizations are enabled, use volatile variables to prevent the loop from being optimized out. Commented Nov 23, 2013 at 20:57
  • One minor thing to add: there's no "#pragma omp end parallel". Since you're writing C++ code, the end of the parallel region is determined automatically by the end of the structured block. Commented Nov 24, 2013 at 19:44

1 Answer

As J.F. Sebastian pointed out in a comment, you don't get much benefit from parallelization because a loop with 1000 iterations finishes very quickly. The overhead of creating the second thread is larger than the time you save by splitting the work. When you increase the number of loop iterations and give the threads more to do, the benefit of multi-threading becomes more apparent.
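
To see the effect of the iteration count, here is a minimal sketch that times the same reduction for a chosen loop length and thread count. The helper names `parallel_sum` and `time_parallel_sum` are made up for this illustration and are not part of the original post. Two assumptions differ from the question's code: the accumulator is 64-bit, because 1000000000 iterations of `+= 10` would overflow a 32-bit `int`, and the thread count is passed through the `num_threads` clause (it must be positive) rather than `omp_set_num_threads`.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not from the original post): adds `step` once per
// iteration, splitting the iterations across up to `threads` OpenMP threads.
// The 64-bit accumulator avoids overflow for large iteration counts.
std::int64_t parallel_sum(std::int64_t iterations, int threads,
                          std::int64_t step = 10)
{
    std::int64_t sum = 0;
    // Each thread gets a private copy of `sum`; the copies are combined
    // when the loop ends. Without OpenMP enabled at compile time the
    // pragma is ignored and the loop simply runs serially.
    #pragma omp parallel for reduction(+: sum) num_threads(threads)
    for (std::int64_t i = 0; i < iterations; ++i) {
        sum += step;
    }
    return sum;
}

// Hypothetical helper: wall-clock seconds spent in one call to parallel_sum.
double time_parallel_sum(std::int64_t iterations, int threads)
{
    auto start = std::chrono::steady_clock::now();
    // Store the result in a volatile so the call itself is not discarded.
    volatile std::int64_t result = parallel_sum(iterations, threads);
    (void)result;
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Comparing `time_parallel_sum(1000, 1)` with `time_parallel_sum(1000, 2)`, and then repeating both with a much larger count, should show the thread start-up cost dominating only in the small case. Build with OpenMP enabled (for example `g++ -fopenmp`). With optimization turned on, the compiler may still collapse this loop into a multiplication, in which case the timings say little about threading.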

2 Comments

I don't know which C++ compiler the OP uses, but if it is the Intel C++ Compiler or GCC with optimisation level -O2 or higher, the loop is replaced with S = max(high-low, 0) * 10; (where low and high are the bounds of the iteration range), which only makes the OpenMP version even slower in comparison.
@HristoIliev: as I've mentioned, a volatile variable would prevent the optimization.
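
One possible reading of the volatile suggestion is sketched below. The comments do not say which variable should be volatile, so this is an assumption: the increment is read from a volatile object on every iteration, which the compiler may not fold into a single multiplication. The function name `parallel_sum_volatile` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper (an interpretation of the comment, not the OP's code):
// the same reduction, but the increment comes from a volatile object, so
// every iteration has to perform a real read and the loop cannot be
// replaced by `iterations * 10` at compile time.
std::int64_t parallel_sum_volatile(std::int64_t iterations, int threads)
{
    volatile std::int64_t step = 10;  // shared and only read inside the loop
    std::int64_t sum = 0;
    // `threads` must be positive; the pragma is ignored without OpenMP.
    #pragma omp parallel for reduction(+: sum) num_threads(threads)
    for (std::int64_t i = 0; i < iterations; ++i) {
        sum += step;
    }
    return sum;
}
```

The volatile read makes each iteration slower than a plain addition, so the absolute times are not comparable with an unoptimized build; the point is only to keep the loop present when optimizations are on.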
