0

On some Online Judge platforms, codes are compile and run in such way:

g++ -O3 -std=c++17 a.cpp
./a.out

I want to use OpenMp to parallelize my code, so is it possible to enable OpenMP without -fopenmp flag?

#include <iostream>

using namespace std;

int main() {
#pragma omp parallel for
    for (int i = 0; i < 100; i++) {
        std::cout << i << std::endl;
    }
}

I've tried added this to source code, but it doesn't work:

#pragma GCC optimize("openmp")

update: The code above is a terrible example, here's another program that can be accelerated by parallelization:

#include <iostream>
#include <cmath>
#include <chrono>

int main() {
    auto start = std::chrono::steady_clock::now();;
    double sum = 0;
    int n = 2e9;
    //std::cin >> n;
#pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += sqrt(i);
    }
    auto end = std::chrono::steady_clock::now();
    std::cout << sum << std::endl;
    std::cout << n << std::endl;
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << std::endl;
}
6
  • 3
    xy problem. Why do you think you need this? Commented Oct 15, 2020 at 15:25
  • Thanks for the comment, I‘ve updated the question. Commented Oct 15, 2020 at 15:39
  • What do you expect to hear, other than: You can't? Forget about OpenMP, use the parallel algorithms which are part of STL since C++17. Commented Oct 15, 2020 at 16:08
  • Thanks for your advice! Commented Oct 15, 2020 at 16:30
  • cout<< is a pretty terrible example of something you'd want to parallelize! To preserve the semantics of the program (produce the same output), every cout << operation has to happen in source order. (And will have to do locking to control access to the I/O buffer for that stream anyway, which a purely single-threaded process might have avoided). GCC wouldn't auto-parallelize this even if you did use -fopenmp. Commented Oct 16, 2020 at 4:11

1 Answer 1

6

Why are you parallelizing your code? Online Judge problems are supposed to be solved sequentially. The challenge of the problem is to reduce the run time complexity. Online Judge should not allow this cause simply that defeats the purpose of competitive programming.

Furthermore, if you think a parallel algorithm will run faster than a sequential one for N=100, you are displaying a remarkable ignorance about parallelization overhead. Parallelizing algorithms come with huge synchronization costs that will halt performance.

Stream I/O like this can't parallelize; it has to run in source order to produce the same output. And cout::operator<< is thread-safe so it has to do locking anyway to protect the stream buffer.

One way to run faster your loop is to turn off synchronization of iostream with C stdio buffers, and don't flush cout after every line:

std::ios_base::sync_with_stdio(false);
for (int i = 0; i < 100; i++) {
    std::cout << i << '\n';
}

One more thing, std::endl is equivalent to outputting a '\n and an explicit flush of the stream (making a system call). The online judge program presumably runs it with output redirected to a file or pipe, so cout will be full-buffered, not automatically flushing on every newline. A large fraction of the cost of running this program (other than dynamic linker startup) is the system calls to do I/O, so forcing a flush on every line could make it 100x slower.

Even faster might be to prepare a single long string in a buffer to avoid the overhead of locking / unlocking cout twice for every line, but that's minor. Normally the cost of I/O isn't a major factor in competitive programming.

Sign up to request clarification or add additional context in comments.

5 Comments

Thanks for your answer!
Does this answer your question?
And equally / more importantly, use '\n' instead of std::endl. Forcing an actual flush (write system call) on every line is deadly for performance vs. full-buffered output (which will be the case for stdout when writing to a file or pipe). Most of the cost of the whole program will be the system calls, so (not counting dynamic linker startup overhead) it's 100x worse. Choosing the name std::endl for that functionality (newline and flush) was a really bad design decision.
Yes, I should consider time complexity first, but I’m curious about if there’s a way to optimize a brute force solution with parallelization.
@PeterCordes Yes, if you want to, please go ahead edit my answer and add this. Thank you.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.