How to tell the compiler to optimize array access?

Question

I have the following fragment of code. It contains 3 sections where I measure memory access runtime. First is plain iteration over the array. The second is almost the same with the exception that the array address received from the function call. The third is the same as the second but manually optimized.

#include <map>
#include <cstdlib>
#include <chrono>
#include <iostream>

std::map<void*, void*> cache;

constexpr int elems = 1000000;
double x[elems] = {};

template <typename T>
T& find_in_cache(T& var) {
    void* key = &var;
    void* value = nullptr;
    if (cache.count(key)) {
        value = cache[key];
    } else {
        value = malloc(sizeof(T));
        cache[key] = value;
    }
    return *(T*)value;
}

int main() {
    std::chrono::duration<double> elapsed_seconds1, elapsed_seconds2, elapsed_seconds3;

    for (int k = 0; k < 100; k++) { // account for cache effects
        // first section
        auto start = std::chrono::steady_clock::now();
        for (int i = 1; i < elems; i++) {
            x[i] = (x[i-1] + 1.0) * 1.001;
        }
        auto end = std::chrono::steady_clock::now();
        elapsed_seconds1 = end-start;

        // second section
        start = std::chrono::steady_clock::now();
        for (int i = 1; i < elems; i++) {
            find_in_cache(x)[i] = (find_in_cache(x)[i-1] + 1.0) * 1.001;
        }
        end = std::chrono::steady_clock::now();
        elapsed_seconds2 = end-start;

        // third section
        start = std::chrono::steady_clock::now();
        double* y = find_in_cache(x);
        for (int i = 1; i < elems; i++) {
            y[i] = (y[i-1] + 1.0) * 1.001;
        }
        end = std::chrono::steady_clock::now();
        elapsed_seconds3 = end-start;
    }
    std::cout << "elapsed time 1: " << elapsed_seconds1.count() << "s\n";
    std::cout << "elapsed time 2: " << elapsed_seconds2.count() << "s\n";
    std::cout << "elapsed time 3: " << elapsed_seconds3.count() << "s\n";

    return x[elems - 1]; // prevent optimizing away
}

The timings of these sections are following:

elapsed time 1: 0.0018678s
elapsed time 2: 0.00423903s
elapsed time 3: 0.00189678s

Is it possible to change the interface of find_in_cache() without changing the body of the second iteration section to make its performance the same as section 3?

your workload is too little for the measured times be meaningful. You should increase number of iterations, measure many times and take the average (or consider worst case, there are different strategies) — 463035818_is_not_an_ai
– 463035818_is_not_an_ai, Commented Feb 8, 2021 at 11:50
@largest_prime_is_463035818 why is that? I am pretty confident in these times and they show what I expect: the second section is slower because the compiler can't optimize function call away from loop. — user8044236
– user8044236, Commented Feb 8, 2021 at 11:52

Marc Glisse · Accepted Answer · 2021-02-09 20:45:31Z

1

template <typename T>
[[gnu::const]]
T& find_in_cache(T& var) { ... }

lets clang optimize the code the way you want, but gcc fails to handle the call as a loop invariant, even with gnu::noinline to make sure the attribute is not lost (maybe worth a bug report?).

How safe such code is may depend on the rest of your code. It is a lie since the function can use memory, but it may be ok if that memory is private enough to the function. Preventing inlining of find_in_cache may help reduce the risks.

You can also convince gcc to optimize with

template <typename T>
[[gnu::const,gnu::noinline]]
T& find_in_cache(T& var) noexcept { ... }

which would cause your program to terminate if there isn't enough memory to add an element in the cache.

edited Feb 9, 2021 at 20:45

answered Feb 8, 2021 at 13:07

Marc Glisse

8,1152 gold badges34 silver badges59 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

user8044236 Over a year ago

quite strange to see that clang supports gcc annotations better than gcc itself, would be nice to find workaround for gcc as well.

Collectives™ on Stack Overflow

How to tell the compiler to optimize array access?

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related