
I've been doing some of the LeetCode problems, and I notice that the C solutions are a couple of times faster than the exact same thing in C++. For example:

Updated with a couple of simpler examples:

Given a sorted array and a target value, return the index if the target is found. If not, return the index where it would be if it were inserted in order. You may assume no duplicates in the array. (Link to question on LeetCode)

My solution in C, runs in 3 ms:

int searchInsert(int A[], int n, int target) {
    int left = 0;
    int right = n;
    int mid = 0;
    while (left<right) {
        mid = (left + right) / 2;
        if (A[mid]<target) {
            left = mid + 1;
        }
        else if (A[mid]>target) {
            right = mid;
        }
        else {
            return mid;
        }
    }
    return left;
}

My C++ solution, exactly the same but as a member function of the Solution class, runs in 13 ms:

class Solution {
public:
    int searchInsert(int A[], int n, int target) {
        int left = 0;
        int right = n;
        int mid = 0;
        while (left<right) {
            mid = (left + right) / 2;
            if (A[mid]<target) {
                left = mid + 1;
            }
            else if (A[mid]>target) {
                right = mid;
            }
            else {
                return mid;
            }
        }
        return left;
    }
};

Even simpler example:

Reverse the digits of an integer. Return 0 if the result will overflow. (Link to question on LeetCode)

The C version runs in 6 ms:

int reverse(int x) {
    long rev = x % 10;
    x /= 10;
    while (x != 0) {
        rev *= 10L;
        rev += x % 10;
        x /= 10;
        if (rev>(-1U >> 1) || rev < (1 << 31)) {
            return 0;
        }
    }
    return rev;
}

And the C++ version, exactly the same but as a member function of the Solution class, runs in 19 ms:

class Solution {
public:
    int reverse(int x) {
        long rev = x % 10;
        x /= 10;
        while (x != 0) {
            rev *= 10L;
            rev += x % 10;
            x /= 10;
            if (rev>(-1U >> 1) || rev < (1 << 31)) {
                return 0;
            }
        }
        return rev;
    }
};

I can see how there would be considerable overhead from using a vector of vectors as a 2D array in the original example if the LeetCode testing system doesn't compile the code with optimisation enabled. But the simpler examples above shouldn't suffer from that issue, because the data structures are pretty raw, especially in the second case where all you have is long and integer arithmetic. The C++ version is still slower by a factor of three.

I'm starting to think that there might be something odd happening with the way LeetCode does the benchmarking in general, because even in the C version of the integer-reversing problem you get a huge bump in running time just from replacing the line if (rev>(-1U >> 1) || rev < (1 << 31)) { with if (rev>INT_MAX || rev < INT_MIN) {

Now, I suppose having to #include <limits.h> might have something to do with that, but it seems a bit extreme that this simple change bumps the execution time from 6 ms to 19 ms.
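For the record, here is what that limits.h variant looks like written out in full. This assumes a platform where int is 32 bits and long is 64 bits (an assumption, since LeetCode doesn't document its targets); under that assumption it behaves the same as the (-1U >> 1) / (1 << 31) version while avoiding the undefined behaviour of shifting into the sign bit:

```c
#include <limits.h>

/* Same algorithm as above, but with the overflow check spelled out
 * using the named constants from limits.h. Assumes 32-bit int and
 * 64-bit long, so rev can hold every intermediate value exactly. */
int reverse_limits(int x) {
    long rev = x % 10;
    x /= 10;
    while (x != 0) {
        rev *= 10L;
        rev += x % 10;
        x /= 10;
        if (rev > INT_MAX || rev < INT_MIN) {
            return 0;
        }
    }
    return (int)rev;
}
```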

  • Because you're using a vector of vectors, each of your columns (or rows) is a separate allocation which may or may not be in a cache-friendly location. Commented Apr 8, 2015 at 18:10
  • Are those times from single runs, or averages over multiple runs? Commented Apr 8, 2015 at 18:10
  • Are your optimisations on? Some C++ compilers enable range-checking on vectors when optimisation is off, as in debug builds. Commented Apr 8, 2015 at 18:13
  • What compiler options are you using? Why does the C++ solution get wrapped in a class? Are you including the class initialisation in your timer? Speaking of which, how are you timing the execution? Commented Apr 8, 2015 at 18:14
  • Are the allocation schemes different? A single block in C vs. multiple blocks (vectors) in C++. Either way, you should prefer a single block of memory (no vector of vectors). Commented Apr 8, 2015 at 18:15

2 Answers


Lately I've been seeing the vector<vector<int>> suggestion a lot for doing 2D arrays in C++, and I've been pointing out to people why it really isn't a good idea. It's a handy trick to know when slapping together temporary code, but there's (almost) never a reason to use it in real code. The right thing to do is to use a class that wraps a contiguous block of memory.
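To make that concrete, here's a minimal sketch of such a wrapper (the class and its names are my own illustration, not from any particular library):

```cpp
#include <cstddef>
#include <vector>

// One contiguous allocation for the whole 2D grid. Rows are just an
// indexing convention on top of a single buffer, so row-by-row
// traversal is cache friendly, unlike vector<vector<int>> where each
// row is a separate heap allocation.
class Grid {
public:
    Grid(std::size_t rows, std::size_t cols)
        : cols_(cols), data_(rows * cols) {}

    int& operator()(std::size_t r, std::size_t c) {
        return data_[r * cols_ + c];
    }
    int operator()(std::size_t r, std::size_t c) const {
        return data_[r * cols_ + c];
    }

    std::size_t rows() const { return data_.size() / cols_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t cols_;
    std::vector<int> data_;  // single block of memory
};
```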

So my first reaction might be to point to this as a possible source for the disparity. However you're also using int** in the C version, which is generally a sign of the exact same problem as vector<vector<int>>.

So instead I decided to just compare the two solutions.

http://coliru.stacked-crooked.com/a/fa8441cc5baa0391

6468424
6588511

That's the time taken by the 'C version' vs the 'C++ version' in nanoseconds.

My results don't show anything like the disparity you describe. Then it occurred to me to check a common mistake people make when benchmarking:

http://coliru.stacked-crooked.com/a/e57d791876b9252b

18386695
42400612

Notice that the -O3 flag from the first example has become -O0, which disables optimization.

Conclusion: you're probably comparing unoptimized executables.
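If you want to reproduce the effect yourself, a harness along these lines (the workload here is made up purely for illustration) times the same code under whatever flags you compile with, in the same nanosecond units as the numbers quoted above:

```cpp
#include <chrono>
#include <cstdio>

// A stand-in workload; substitute whatever code you want to measure.
static long work(long n) {
    long sum = 0;
    for (long i = 0; i < n; ++i) sum += i % 7;
    return sum;
}

// Time one call to the workload in nanoseconds. The volatile result
// keeps the optimizer from deleting the loop entirely at -O3.
long long time_ns(long n) {
    auto start = std::chrono::steady_clock::now();
    volatile long result = work(n);
    (void)result;
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}
```

Build it twice, once with -O0 and once with -O3, and compare the printed times; the gap is the "mistake" being described.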

C++ supports building rich abstractions that don't require overhead, but eliminating that overhead does require certain code transformations that play havoc with the 'debuggability' of code.

That means debug builds avoid those transformations and therefore C++ debug builds are often slower than debug builds of C style code because C style code just doesn't use much abstraction. Seeing a 130% slowdown such as the above is not at all surprising when timing, for example, machine code that uses function calls in place of simple store instructions.


Some code really needs optimizations in order to have reasonable performance even for debugging, so compilers often offer a mode that applies some optimizations which don't cause too much trouble for debuggers. Clang and gcc use -O1 for this, and you can see that even this level of optimization essentially eliminates the gap in this program between C style code and the more C++ style code:

http://coliru.stacked-crooked.com/a/13967ebcfcfa4073

8389992
8196935


Update:

In those later examples optimization shouldn't make a difference, since the C++ version uses no abstraction beyond what the C version does. I'm guessing the explanation is that the examples are being compiled with different compilers or with some other difference in compiler options. Without knowing how the compilation is done, it makes no sense to compare these runtime numbers; LeetCode is clearly not producing an apples-to-apples comparison.


10 Comments

I've often seen std::vector<T> being optimized down to "raw" arrays, but, as you correctly say, you need to turn optimizations on. With C you're already using that kind of arrays, so unoptimized code will be faster.
@bames53 Actually, it's not always the case that vector<vector<T>> is worse than a custom struct. This depends a lot on the application. (E.g. adding another column is an extremely costly thing for the struct, but not for the nested vector). As always, one should profile the actual use case.
On a side note, gcc use -Og for debugger-friendly optimizations.
@black: You've seen vector optimized into a raw array? That shouldn't be possible, unless the optimizer is able to turn a heap allocation into a stack-based one, which is far beyond the optimization abilities of any compiler I've ever seen.
Pointers to a heap allocation are often referred to as raw arrays, because array and pointer syntax are interchangeable in C. Never heard that phrase to mean stack arrays specifically.

You are using a vector of vectors in your C++ code snippet. Vectors are sequence containers in C++ that are like arrays but can change in size. Instead of vector<vector<int>>, statically allocated arrays would be better. You could also use your own Array class with operator[] overloaded, but vector has more overhead because it dynamically resizes when you add more elements than its original capacity. In C++ you can also use call by reference to further reduce the running time relative to C. Well-written C++ should run even faster than C.
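For what it's worth, the call-by-reference point can be illustrated like this (a sketch with made-up function names, not the asker's code):

```cpp
#include <numeric>
#include <vector>

// Passing by value copies the whole buffer on every call; passing by
// const reference does not. This is the only difference between the
// two functions below.
long sum_by_value(std::vector<int> v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}

long sum_by_ref(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}
```

Both return the same result; only the copying cost differs, and as the comments below note, LeetCode's signatures already pass the vector by reference.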

5 Comments

I don't see how this answers the question. Is the C++ code changing the size of the vectors? If not, then why should that contribute to the running time? The vector is already being passed by reference. Although you've made true statements, I don't quite see how they're relevant to the question at hand.
There is absolutely no reason why C++ should be faster (or slower) than C, given this simple data structure.
Adding to @DieterLücking: C++'s original "compiler" output was simply C code. All it did was transform (then) "C++" into C code to be fed to a C compiler. No reason why C++ would be slower. Its original name (I think) was "C with classes".
@RobKennedy Do you know why this question was downvoted? Robinson's comment and bames53's answer said essentially the same thing.
I don't know why you're asking me, @Nick. Robinson's comment suggests the problem is cache locality. Bames's answer says the problem is optimization. Those aren't the same things.
