
I've been doing some of the LeetCode problems, and I notice that the C solutions are a couple of times faster than the exact same thing in C++. For example:

Updated with a couple of simpler examples:

Given a sorted array and a target value, return the index if the target is found. If not, return the index where it would be if it were inserted in order. You may assume no duplicates in the array. (Link to question on LeetCode)

My solution in C, runs in 3 ms:

int searchInsert(int A[], int n, int target) {
    int left = 0;
    int right = n;
    int mid = 0;
    while (left<right) {
        mid = (left + right) / 2;
        if (A[mid]<target) {
            left = mid + 1;
        }
        else if (A[mid]>target) {
            right = mid;
        }
        else {
            return mid;
        }
    }
    return left;
}

My C++ solution, exactly the same but as a member function of the Solution class, runs in 13 ms:

class Solution {
public:
    int searchInsert(int A[], int n, int target) {
        int left = 0;
        int right = n;
        int mid = 0;
        while (left<right) {
            mid = (left + right) / 2;
            if (A[mid]<target) {
                left = mid + 1;
            }
            else if (A[mid]>target) {
                right = mid;
            }
            else {
                return mid;
            }
        }
        return left;
    }
};

Even simpler example:

Reverse the digits of an integer. Return 0 if the result will overflow. (Link to question on LeetCode)

The C version runs in 6 ms:

int reverse(int x) {
    long rev = x % 10;
    x /= 10;
    while (x != 0) {
        rev *= 10L;
        rev += x % 10;
        x /= 10;
        if (rev>(-1U >> 1) || rev < (1 << 31)) {
            return 0;
        }
    }
    return rev;
}

And the C++ version, exactly the same but as a member function of the Solution class, runs in 19 ms:

class Solution {
public:
    int reverse(int x) {
        long rev = x % 10;
        x /= 10;
        while (x != 0) {
            rev *= 10L;
            rev += x % 10;
            x /= 10;
            if (rev>(-1U >> 1) || rev < (1 << 31)) {
                return 0;
            }
        }
        return rev;
    }
};

I can see how there would be considerable overhead from using a vector of vectors as a 2D array in the original example if the LeetCode testing system doesn't compile the code with optimisation enabled. But the simpler examples above shouldn't suffer from that issue, because the data structures are pretty raw, especially in the second case where all you have is long and integer arithmetic. The C++ version is still slower by a factor of three.

I'm starting to think that there might be something odd happening with the way LeetCode does the benchmarking in general, because even in the C version of the integer-reversing problem you get a huge bump in running time just from replacing the line if (rev>(-1U >> 1) || rev < (1 << 31)) { with if (rev>INT_MAX || rev < INT_MIN) {

Now, I suppose having to #include <limits.h> might have something to do with that, but it seems a bit extreme that this simple change bumps the execution time from 6 ms to 19 ms.
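For the record, here is what that limits.h variant looks like written out in full. This assumes a platform where int is 32 bits and long is 64 bits (an assumption, since LeetCode doesn't document its targets); under that assumption it behaves the same as the (-1U >> 1) / (1 << 31) version while avoiding the undefined behaviour of shifting into the sign bit:

```c
#include <limits.h>

/* Same algorithm as above, but with the overflow check spelled out
 * using the named constants from limits.h. Assumes 32-bit int and
 * 64-bit long, so rev can hold every intermediate value exactly. */
int reverse_limits(int x) {
    long rev = x % 10;
    x /= 10;
    while (x != 0) {
        rev *= 10L;
        rev += x % 10;
        x /= 10;
        if (rev > INT_MAX || rev < INT_MIN) {
            return 0;
        }
    }
    return (int)rev;
}
```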

  • Because you're using a vector of vectors, each of your columns (or rows) is a separate allocation which may or may not be in a cache-friendly location. Commented Apr 8, 2015 at 18:10
  • Are those times from single runs, or averages over multiple runs? Commented Apr 8, 2015 at 18:10
  • Are your optimisations on? Some C++ compilers enable range-checking on vectors when optimisation is off, as in debug builds. Commented Apr 8, 2015 at 18:13
  • What compiler options are you using? Why does the C++ solution get wrapped in a class? Are you including the class initialisation in your timer? Speaking of which, how are you timing the execution? Commented Apr 8, 2015 at 18:14
  • Are the allocation schemes different? A single block in C vs. multiple blocks (vectors) in C++. Either way, you should prefer a single block of memory (no vector of vectors). Commented Apr 8, 2015 at 18:15

2 Answers


Lately I've been seeing the vector<vector<int>> suggestion a lot for doing 2D arrays in C++, and I've been pointing out to people why it really isn't a good idea. It's a handy trick to know when slapping together temporary code, but there's (almost) never a reason to use it in real code. The right thing to do is to use a class that wraps a contiguous block of memory.
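To make that concrete, here's a minimal sketch of such a wrapper (the class and its names are my own illustration, not from any particular library):

```cpp
#include <cstddef>
#include <vector>

// One contiguous allocation for the whole 2D grid. Rows are just an
// indexing convention on top of a single buffer, so row-by-row
// traversal is cache friendly, unlike vector<vector<int>> where each
// row is a separate heap allocation.
class Grid {
public:
    Grid(std::size_t rows, std::size_t cols)
        : cols_(cols), data_(rows * cols) {}

    int& operator()(std::size_t r, std::size_t c) {
        return data_[r * cols_ + c];
    }
    int operator()(std::size_t r, std::size_t c) const {
        return data_[r * cols_ + c];
    }

    std::size_t rows() const { return data_.size() / cols_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t cols_;
    std::vector<int> data_;  // single block of memory
};
```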

So my first reaction might be to point to this as a possible source for the disparity. However you're also using int** in the C version, which is generally a sign of the exact same problem as vector<vector<int>>.

So instead I decided to just compare the two solutions.

http://coliru.stacked-crooked.com/a/fa8441cc5baa0391

6468424
6588511

That's the time taken by the 'C version' vs the 'C++ version' in nanoseconds.

My results don't show anything like the disparity you describe. Then it occurred to me to check a common mistake people make when benchmarking:

http://coliru.stacked-crooked.com/a/e57d791876b9252b

18386695
42400612

Notice that the -O3 flag from the first example has become -O0, which disables optimization.

Conclusion: you're probably comparing unoptimized executables.
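If you want to reproduce the effect yourself, a harness along these lines (the workload here is made up purely for illustration) times the same code under whatever flags you compile with, in the same nanosecond units as the numbers quoted above:

```cpp
#include <chrono>
#include <cstdio>

// A stand-in workload; substitute whatever code you want to measure.
static long work(long n) {
    long sum = 0;
    for (long i = 0; i < n; ++i) sum += i % 7;
    return sum;
}

// Time one call to the workload in nanoseconds. The volatile result
// keeps the optimizer from deleting the loop entirely at -O3.
long long time_ns(long n) {
    auto start = std::chrono::steady_clock::now();
    volatile long result = work(n);
    (void)result;
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}
```

Build it twice, once with -O0 and once with -O3, and compare the printed times; the gap is the "mistake" being described.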

C++ supports building rich abstractions that don't require overhead, but eliminating that overhead does require certain code transformations that play havoc with the 'debuggability' of code.

That means debug builds avoid those transformations and therefore C++ debug builds are often slower than debug builds of C style code because C style code just doesn't use much abstraction. Seeing a 130% slowdown such as the above is not at all surprising when timing, for example, machine code that uses function calls in place of simple store instructions.


Some code really needs optimizations in order to have reasonable performance even for debugging, so compilers often offer a mode that applies some optimizations which don't cause too much trouble for debuggers. Clang and gcc use -O1 for this, and you can see that even this level of optimization essentially eliminates the gap in this program between C style code and the more C++ style code:

http://coliru.stacked-crooked.com/a/13967ebcfcfa4073

8389992
8196935


Update:

In those later examples optimization shouldn't make a difference, since the C++ version uses no abstraction beyond what the C version does. I'm guessing the explanation is that the examples are being compiled with different compilers or with some other difference in compiler options. Without knowing how the compilation is done, it makes no sense to compare these runtime numbers; LeetCode is clearly not producing an apples-to-apples comparison.


10 Comments

I've often seen std::vector<T> being optimized down to "raw" arrays, but, as you correctly say, you need to turn optimizations on. With C you're already using that kind of arrays, so unoptimized code will be faster.
@bames53 Actually, it's not always the case that vector<vector<T>> is worse than a custom struct. This depends a lot on the application. (E.g. adding another column is an extremely costly thing for the struct, but not for the nested vector). As always, one should profile the actual use case.
On a side note, gcc use -Og for debugger-friendly optimizations.
@black: You've seen vector optimized into a raw array? That shouldn't be possible, unless the optimizer is able to turn a heap allocation into a stack-based one, which is far beyond the optimization abilities of any compiler I've ever seen.
Pointers to a heap allocation are often referred to as raw arrays, because array and pointer syntax are interchangeable in C. Never heard that phrase to mean stack arrays specifically.

You are using a vector of vectors in your C++ code snippet. Vectors are sequence containers in C++ that are like arrays but can change in size. Instead of vector<vector<int>>, statically allocated arrays would be better. You could also use your own Array class with operator[] overloaded, but vector has more overhead because it dynamically resizes when you add more elements than its original capacity. In C++ you can also use call by reference to further reduce the running time relative to C. Well-written C++ should run even faster than C.
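For what it's worth, the call-by-reference point can be illustrated like this (a sketch with made-up function names, not the asker's code):

```cpp
#include <numeric>
#include <vector>

// Passing by value copies the whole buffer on every call; passing by
// const reference does not. This is the only difference between the
// two functions below.
long sum_by_value(std::vector<int> v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}

long sum_by_ref(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}
```

Both return the same result; only the copying cost differs, and as the comments below note, LeetCode's signatures already pass the vector by reference.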

5 Comments

I don't see how this answers the question. Is the C++ code changing the size of the vectors? If not, then why should that contribute to the running time? The vector is already being passed by reference. Although you've made true statements, I don't quite see how they're relevant to the question at hand.
There is absolutely no reason why C++ should be faster (or slower) than C, given this simple data structure.
Adding to @DieterLücking: C++'s original "compiler" output was simply C code. All it did was transform (then) "C++" into C code to be fed to a C compiler. No reason why C++ would be slower. Its original name (I think) was "C with classes".
@RobKennedy Do you know why this question was downvoted? Robinson's comment and bames53's answer said essentially the same thing.
I don't know why you're asking me, @Nick. Robinson's comment suggests the problem is cache locality. Bames's answer says the problem is optimization. Those aren't the same things.
