
I have the following fragment of code. It contains 3 sections where I measure memory access runtime. The first is plain iteration over the array. The second is almost the same, except that the array address is obtained from a function call. The third is the same as the second but manually optimized: the address is fetched once before the loop.

#include <map>
#include <cstdlib>
#include <chrono>
#include <iostream>

std::map<void*, void*> cache;

constexpr int elems = 1000000;
double x[elems] = {};

template <typename T>
T& find_in_cache(T& var) {
    void* key = &var;
    void* value = nullptr;
    if (cache.count(key)) {
        value = cache[key];
    } else {
        value = malloc(sizeof(T)); // note: the new block is left uninitialized
        cache[key] = value;
    }
    return *(T*)value;
}

int main() {
    std::chrono::duration<double> elapsed_seconds1, elapsed_seconds2, elapsed_seconds3;

    for (int k = 0; k < 100; k++) { // account for cache effects
        // first section
        auto start = std::chrono::steady_clock::now();
        for (int i = 1; i < elems; i++) {
            x[i] = (x[i-1] + 1.0) * 1.001;
        }
        auto end = std::chrono::steady_clock::now();
        elapsed_seconds1 = end-start;

        // second section
        start = std::chrono::steady_clock::now();
        for (int i = 1; i < elems; i++) {
            find_in_cache(x)[i] = (find_in_cache(x)[i-1] + 1.0) * 1.001;
        }
        end = std::chrono::steady_clock::now();
        elapsed_seconds2 = end-start;

        // third section
        start = std::chrono::steady_clock::now();
        double* y = find_in_cache(x);
        for (int i = 1; i < elems; i++) {
            y[i] = (y[i-1] + 1.0) * 1.001;
        }
        end = std::chrono::steady_clock::now();
        elapsed_seconds3 = end-start;
    }
    std::cout << "elapsed time 1: " << elapsed_seconds1.count() << "s\n";
    std::cout << "elapsed time 2: " << elapsed_seconds2.count() << "s\n";
    std::cout << "elapsed time 3: " << elapsed_seconds3.count() << "s\n";

    return x[elems - 1]; // prevent optimizing away
}

The timings of these sections are as follows:

elapsed time 1: 0.0018678s
elapsed time 2: 0.00423903s
elapsed time 3: 0.00189678s

Is it possible to change the interface of find_in_cache() without changing the body of the second iteration section to make its performance the same as section 3?

  • How are you testing this? How are you compiling it? Commented Feb 8, 2021 at 11:47
  • @JHBonarius g++ test.cpp -O3 && ./a.out Commented Feb 8, 2021 at 11:47
  • You could lie with [[gnu::pure]]. Commented Feb 8, 2021 at 11:50
  • Your workload is too small for the measured times to be meaningful. You should increase the number of iterations, measure many times, and take the average (or consider the worst case; there are different strategies). Commented Feb 8, 2021 at 11:50
  • @largest_prime_is_463035818 Why is that? I am pretty confident in these times, and they show what I expect: the second section is slower because the compiler can't hoist the function call out of the loop. Commented Feb 8, 2021 at 11:52

1 Answer

template <typename T>
[[gnu::const]]
T& find_in_cache(T& var) { ... }

lets clang optimize the code the way you want, but gcc fails to treat the call as loop-invariant, even with gnu::noinline added to make sure the attribute is not lost to inlining (maybe worth a bug report?).

How safe such code is may depend on the rest of your code. The attribute is a lie, since the function does access memory (the global cache), but it may be acceptable if that memory is sufficiently private to the function. Preventing inlining of find_in_cache may help reduce the risks.

You can also convince gcc to optimize with

template <typename T>
[[gnu::const,gnu::noinline]]
T& find_in_cache(T& var) noexcept { ... }

which would cause your program to terminate if there isn't enough memory to add an element to the cache (the noexcept specifier turns the map insertion's std::bad_alloc into a call to std::terminate, and that guarantee is what lets gcc hoist the call).


1 Comment

It is quite strange to see that clang supports GCC's annotations better than GCC itself; it would be nice to find a workaround for gcc as well.
