How to enable OpenMP without -fopenmp flag?

Question

On some Online Judge platforms, codes are compile and run in such way:

g++ -O3 -std=c++17 a.cpp
./a.out

I want to use OpenMp to parallelize my code, so is it possible to enable OpenMP without -fopenmp flag?

#include <iostream>

using namespace std;

int main() {
#pragma omp parallel for
    for (int i = 0; i < 100; i++) {
        std::cout << i << std::endl;
    }
}

I've tried added this to source code, but it doesn't work:

#pragma GCC optimize("openmp")

update: The code above is a terrible example, here's another program that can be accelerated by parallelization:

#include <iostream>
#include <cmath>
#include <chrono>

int main() {
    auto start = std::chrono::steady_clock::now();;
    double sum = 0;
    int n = 2e9;
    //std::cin >> n;
#pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += sqrt(i);
    }
    auto end = std::chrono::steady_clock::now();
    std::cout << sum << std::endl;
    std::cout << n << std::endl;
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << std::endl;
}

What do you expect to hear, other than: You can't? Forget about OpenMP, use the parallel algorithms which are part of STL since C++17. — Ext3h
– Ext3h, Commented Oct 15, 2020 at 16:08
cout<< is a pretty terrible example of something you'd want to parallelize! To preserve the semantics of the program (produce the same output), every cout << operation has to happen in source order. (And will have to do locking to control access to the I/O buffer for that stream anyway, which a purely single-threaded process might have avoided). GCC wouldn't auto-parallelize this even if you did use -fopenmp. — Peter Cordes
– Peter Cordes, Commented Oct 16, 2020 at 4:11

Oussama Ben Ghorbel · Accepted Answer · 2020-10-16 09:12:44Z

6

Why are you parallelizing your code? Online Judge problems are supposed to be solved sequentially. The challenge of the problem is to reduce the run time complexity. Online Judge should not allow this cause simply that defeats the purpose of competitive programming.

Furthermore, if you think a parallel algorithm will run faster than a sequential one for N=100, you are displaying a remarkable ignorance about parallelization overhead. Parallelizing algorithms come with huge synchronization costs that will halt performance.

Stream I/O like this can't parallelize; it has to run in source order to produce the same output. And cout::operator<< is thread-safe so it has to do locking anyway to protect the stream buffer.

One way to run faster your loop is to turn off synchronization of iostream with C stdio buffers, and don't flush cout after every line:

std::ios_base::sync_with_stdio(false);
for (int i = 0; i < 100; i++) {
    std::cout << i << '\n';
}

One more thing, std::endl is equivalent to outputting a '\n and an explicit flush of the stream (making a system call). The online judge program presumably runs it with output redirected to a file or pipe, so cout will be full-buffered, not automatically flushing on every newline. A large fraction of the cost of running this program (other than dynamic linker startup) is the system calls to do I/O, so forcing a flush on every line could make it 100x slower.

Even faster might be to prepare a single long string in a buffer to avoid the overhead of locking / unlocking cout twice for every line, but that's minor. Normally the cost of I/O isn't a major factor in competitive programming.

edited Oct 16, 2020 at 9:12

answered Oct 15, 2020 at 16:45

Oussama Ben Ghorbel

2,1494 gold badges20 silver badges38 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

Apple Over a year ago

Thanks for your answer!

Oussama Ben Ghorbel Over a year ago

Does this answer your question?

Peter Cordes Over a year ago

And equally / more importantly, use '\n' instead of std::endl. Forcing an actual flush (write system call) on every line is deadly for performance vs. full-buffered output (which will be the case for stdout when writing to a file or pipe). Most of the cost of the whole program will be the system calls, so (not counting dynamic linker startup overhead) it's 100x worse. Choosing the name std::endl for that functionality (newline and flush) was a really bad design decision.

Apple Over a year ago

Yes, I should consider time complexity first, but I’m curious about if there’s a way to optimize a brute force solution with parallelization.

Oussama Ben Ghorbel Over a year ago

@PeterCordes Yes, if you want to, please go ahead edit my answer and add this. Thank you.

Collectives™ on Stack Overflow

How to enable OpenMP without -fopenmp flag?

1 Answer 1

5 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

5 Comments

Your Answer

Sign up or log in

Post as a guest

Related