0

I am trying to speed up my code (minimal, reproducible example below) and I was told that passing by reference would be a more efficient method for my comparator function. That was the first time I heard of the phrase, so I looked it up and found some websites with examples, but I don't understand when and how to use it. How would I use it in this case?

#include <array>
#include <iostream>
#include <algorithm>
#include <fstream>
#include <ctime>
#include <random>

using namespace std;

class arrMember {
public:
    int var;
    arrMember(int var) :
        var(var) {}
    arrMember() {};
};

array<int, 1000000> arraySource;

array<arrMember, 1000000> arrayObjects;

bool compare(arrMember(x), arrMember(y)) {
    return (x.var < y.var);
}

void arrayPrint() {
    ofstream output("arrayPrint.txt");
    for (int k = 0; k != arrayObjects.size(); k++) {
        output << arrayObjects[k].var << endl;
    }

    output.close();
}

void sort() {
    int j = 0;
    for (auto i = arraySource.begin(); i != arraySource.end(); i++, j++) {
        arrayObjects[j] = arrMember(arraySource[j]);
    }

    sort(arrayObjects.begin(), arrayObjects.end(), compare);
}

int main(){
    random_device rd{};
    mt19937 engine{ rd() };
    uniform_int_distribution<int> dist{ 0, 9999 };
    for (int x = 0; x < arraySource.size(); ++x){
        arraySource[x] = dist(engine);
    }

    cout << "Executing sort...\n";
    clock_t t1 = clock();
    sort();
    clock_t t2 = clock();
    double timeDiff = ((double)(t2 - t1)) / CLOCKS_PER_SEC;

    cout << "Sort finished. CPU time was " << timeDiff << " seconds.\n";

    arrayPrint();

    return 0;
}

Thanks for the help.

2 Answers 2

2

For trivially small types, passing by reference doesn't help; copy-constructing a class consisting of a single int is basically the same cost as taking the address of an existing instance, and copy constructing means the comparator doesn't need to dereference a pointer to find the value, it's already on the local stack.

For a larger type with expensive copy-construction, you could change (original code minus unnecessary parens):

bool compare(arrMember x, arrMember y) {
    return x.var < y.var;
}

to:

bool compare(const arrMember& x, const arrMember& y) {
    return x.var < y.var;
}

and harvest a meaningful speed-up, but for a simple int wrapper class, you gain nothing.

What will make a difference, regardless of the size of the class in question, is replacing raw functions with functors (or lambdas, which are syntactic sugar for functors). std::sort specializes the template to the type of the comparator, and functions themselves aren't types; they're effectively instances of the set of functions that share the same prototype. So if you implement both:

bool compare(const arrMember& x, const arrMember& y) {
    return x.var < y.var;
}
bool compareReverse(const arrMember& x, const arrMember& y) {
    return x.var > y.var;
}

and call std::sort with both compare and compareReverse at different points in your program, it only makes one specialization of std::sort for bool (*)(const arrMember&, const arrMember&), and that single specialization must actually pass around and call the provided function through a pointer; the cost of calling a function at all is significantly higher than the trivial cost of performing the comparison itself, and calling through a pointer is usually even more expensive.

By contrast, functors (and lambdas) are unique types, so std::sort can fully specialize to them, including inlining the comparison, so no call to the comparator function ever occurs; the comparator logic is inlined directly into a unique specialization of std::sort. So instead of:

bool compare(const arrMember& x, const arrMember& y) {
    return x.var < y.var;
}
std::sort(..., compare);

you could do this:

struct compare {
    bool operator()(const arrMember& x, const arrMember& y) const {
        return x.var < y.var;
    }
};
std::sort(..., compare());

or inline the whole thing as a C++11 lambda:

std::sort(..., [](const arrMember& x, const arrMember& y) { return x.var < y.var; });

Either way, the code will run faster; Godbolt shows that either function pointer approach is pretty much the same, while with the functor approach, you reduce runtime relative to the function pointer approach by about a third (saving a little more passing by value in this case, but hardly worth the trouble to think about most of the time; I'd default to passing by const reference, and only consider switching if profiling said it was slow, and the type was simple enough that passing by value might help).

This benefit to templates and functors is why C++'s std::sort reliably beats C's qsort when used appropriately; C lacks templates, so it can't specialize qsort at all, and must always call the comparator through a function pointer. If you use std::sort with a function, it's no real improvement on qsort, but used with a functor/lambda, it generates much faster code at the expense of generating more code (a unique specialization of std::sort for each functor/lambda). You could get the same benefit in C by copy-pasting the code implementing qsort, getting rid of the comparator and inlining the comparison yourself, but that's a lot of maintenance overhead; C++ templates do 99% of the work for you, you just need to remember to use functors/lambdas for your callbacks, not raw functions.

Sign up to request clarification or add additional context in comments.

2 Comments

Very interesting. I'll try using a lambda function later. Thanks for the response. If I could, I would upvote your answer.
@ShadowRanger Nothing prevents the compiler from generating a specialized (partially evaluated) version of your function for a set of arguments known at compile time. No compiler seems to do this (at least for C++, but ghc (Haskell) does), but the same should be achieved by inlining. std::sort can be inlined, then the call to the predicate can be inlined as well. However, std::sort seems to be complex enough to not get inlined.It totally changes if you e.g. use std::find_if. Here, a call with a functions gets inlined but not the functor: godbolt.org/z/xhMxHq
1

In your code you define the comparator function as

bool compare(arrMember(x), arrMember(y)) {
    return (x.var < y.var);
}

The function has two parameters of type arrMember passed by value. This means x and y are copies of the arguments passed to the function. So when you sort your array, each time the comparator is called (which O(n * logn) times) two temporary objects are created, compared and then destroyed. You can modify your function to take the parameters by reference:

bool compare(arrMember const& x, arrMember const& y) {
    return x.var < y.var;
}

This way, the function does not use a copy a reference to the original value. It's a const reference that cannot be modified.

The idea now is that this saves the temporary objects as copies and thus saves runtime. However, the best advice is to measure instead of argue. I've modified your code a little bit to run both versions.

#include <array>
#include <iostream>
#include <algorithm>
#include <fstream>
#include <ctime>
#include <random>

using namespace std;

constexpr size_t N=1000000;

class arrMember {
public:
    int var;
    arrMember(int var) :
        var(var) {}
    arrMember() {};
};

bool compare_by_value(arrMember x, arrMember y) {
    return (x.var < y.var);
}

bool compare_by_reference(arrMember const& x, arrMember const& y) {
    return (x.var < y.var);
}


template<typename C>
void sort(array<arrMember, N>& a, C comp) {
    sort(a.begin(), a.end(), comp);
}

int main(){
    random_device rd{};
    mt19937 engine{ rd() };
    uniform_int_distribution<int> dist{ 0, 9999 };

    array<arrMember, N> a;
    std::generate(a.begin(), a.end(), [&]() {return arrMember{dist(engine)};});

    int tmp=0;
    cout << "Executing sort...\n";
    {
        array<arrMember, N> x = a;
        clock_t t1 = clock();
        sort(x, compare_by_value);
        clock_t t2 = clock();
        double timeDiff = ((double)(t2 - t1)) / CLOCKS_PER_SEC;
        cout << "Sort finished. CPU time was " << timeDiff << " seconds.\n";
        tmp += x.front().var;
    }

    {
        array<arrMember, N> x = a;
        clock_t t1 = clock();
        sort(x, compare_by_reference);
        clock_t t2 = clock();
        double timeDiff = ((double)(t2 - t1)) / CLOCKS_PER_SEC;
        cout << "Sort finished. CPU time was " << timeDiff << " seconds.\n";
        tmp += x.front().var;
    }



    return tmp;
}

Here is an example of the output:

Start

Executing sort... Sort finished. CPU time was 0.105382 seconds. Sort finished. CPU time was 0.108179 seconds.

0

Finish

It seems that there is no difference between both version, so what happened? An optimizing compiler will inline the comparator into the sort algorithm code. To do this, the call in the sort routing is replaced with the body of the function and since the function does not modify the arguments, no copies have to be created to prevent modification of the "passed" values.

11 Comments

Thank you for this answer, it makes a lot of sense. But now I'm back to the drawing board for how to sort faster...
@jared std::sort is O(n logn) which is the optimal complexity for general sorting algorithms. Depending on your input data you could find something faster, e.g. when the data is mostly sorted another sorting algorithm may perform better, but not in the worst-case. If you want to sort integers Bucketsort could be an option.
@jared: Also note: Your original benchmark (but not Jens's) includes the cost of copying from the source array to the temporary array (and it's only done once, on fresh memory, which on Linux-like systems may be being lazily copy-on-write mapped to the zero page, increasing the cost of initializing it). So some of your measured cost isn't the sort, it's the cost of allocating and initializing 4 MB of memory. So make sure to limit the benchmark to the sort itself. After that, yes, a count sort would perform better in this case (at the relatively minor cost of a 10000 element array of counts).
Side-note: Are you sure the compiler inlined the comparison? Using a function instead of a functor (or lambda, which is just syntactic sugar for a functor) is supposed to inhibit that optimization (because sort's template can only specialize to the function prototype, not the specific function). Seems just as likely that passing by reference didn't matter because the class in question was effectively a POD type where it cost just as little to copy as it would to take a reference to it.
On testing, with proper functors for inlining, functors are consistently faster, but functors receiving by reference are actually slower than functors receiving by value (which makes sense, given how cheap copying the object in question is).
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.