How to combine constexpr and vectorized code?

Question

I am working on a C++ intrinsics wrapper for x86-64 and NEON, and I want its functions to be constexpr. My motivation is similar to "Constexpr and SSE intrinsics", but #pragma omp simd and intrinsics may not be supported by the compiler in a constexpr function (GCC, for example, rejects both snippets below). The following code is just a demonstration (auto-vectorization is good enough for addition).

struct FA{
    float c[4];
};

inline constexpr FA add(FA a, FA b){
    FA result{};
    #pragma omp simd            // clang error: statement not allowed in constexpr function
    for(int i = 0; i < 4; i++){ // GCC error: uninitialized variable 'i' in 'constexpr' function
        result.c[i] = b.c[i] + a.c[i];
    }
    return result;
}

#include <immintrin.h> // for __m128 and _mm_add_ps

struct FA2{
    __m128 c;
};


inline constexpr FA2 add2(FA2 a, FA2 b){
        FA2 result{};
        result.c = _mm_add_ps(a.c,b.c); // GCC error: call to non-'constexpr' function '__m128 _mm_add_ps(__m128, __m128)'
        return result;                  // fine with clang
}

I have to provide reference C++ code for portability anyway. Is there a code-efficient way to make the compiler use the reference code at compile time and the intrinsic code at run time?

f(){
    if(/* evaluated at compile time */){
        // constexpr (reference) version
    }else{
        // intrinsic version
    }
}

It should work on all compilers that support OpenMP, intrinsics and C++20.

Funnily, add() compiles on MSVC, but add2() gives this error: error C3615: constexpr function 'add2' cannot result in a constant expression
– user15401571, Commented May 27, 2021 at 17:42
That's exactly the type of scenario for which std::is_constant_evaluated was introduced.
– user4442671, Commented May 27, 2021 at 18:28
GCC/clang headers define _mm_add_ps without constexpr, so you're basically out of luck unless you use compiler-specific features such as a.c + b.c, which relies on GNU C native vector syntax (__m128 in GNU C is a vector of floats, and the + operator works on it; __m128i is a vector of two long long). Oh, my answer on the Q&A you linked already has an example of using GNU C native vector syntax for == on integer vectors instead of _mm_cmpeq_epi32 :P
– Peter Cordes, Commented May 27, 2021 at 18:57

score 6 · Accepted Answer · 2021-06-01 14:43:10Z

Using std::is_constant_evaluated, you can get exactly what you want:

#include <type_traits>

struct FA{
    float c[4];
};

// Just for the sake of the example. Makes for nice-looking assembly.
extern FA add_parallel(FA a, FA b);

constexpr FA add(FA a, FA b) {
    if (std::is_constant_evaluated()) {
        // do it in a constexpr-friendly manner
        FA result{};
        for(int i = 0; i < 4; i++) {
            result.c[i] = b.c[i] + a.c[i];
        }
        return result;
    } else {
        // can be anything that's not constexpr-friendly.
        return add_parallel(a, b);
    }
}

constexpr FA at_compile_time = add(FA{1,2,3,4}, FA{5,6,7,8});

FA at_runtime(FA a) {
    return add(a, at_compile_time);
}

See on godbolt: https://gcc.godbolt.org/z/szhWKs3ec
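
To connect this with the intrinsics from the question, here is a minimal sketch in which the runtime branch calls SSE intrinsics instead of an external function. The names in_constant_evaluation, add_reference, add_simd and FA_HAVE_SSE are made up for this illustration, the SSE detection through __SSE__ / _M_X64 is an assumption, and so is the fallback to the __builtin_is_constant_evaluated compiler builtin when the C++20 library function is not available. On targets without SSE the sketch simply reuses the reference loop.

```cpp
#include <type_traits>

// Assumption: SSE availability is detected through these predefined macros.
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define FA_HAVE_SSE 1
#endif

struct FA {
    float c[4];
};

// Hypothetical helper: true while the enclosing call is constant-evaluated.
constexpr bool in_constant_evaluation() noexcept {
#if defined(__cpp_lib_is_constant_evaluated)
    return std::is_constant_evaluated();
#else
    return __builtin_is_constant_evaluated(); // compiler builtin, assumed available
#endif
}

// Portable reference code, usable at compile time.
constexpr FA add_reference(FA a, FA b) {
    FA result{};
    for (int i = 0; i < 4; i++) {
        result.c[i] = a.c[i] + b.c[i];
    }
    return result;
}

// Runtime-only path: not constexpr, so it may use intrinsics freely.
inline FA add_simd(FA a, FA b) {
#if defined(FA_HAVE_SSE)
    FA result{};
    _mm_storeu_ps(result.c, _mm_add_ps(_mm_loadu_ps(a.c), _mm_loadu_ps(b.c)));
    return result;
#else
    return add_reference(a, b); // no SSE: fall back to the reference loop
#endif
}

constexpr FA add(FA a, FA b) {
    if (in_constant_evaluation()) {
        return add_reference(a, b);
    } else {
        return add_simd(a, b);
    }
}

// Constant evaluation takes the reference branch, so this initializer is valid.
constexpr FA at_compile_time = add(FA{1, 2, 3, 4}, FA{5, 6, 7, 8});
static_assert(at_compile_time.c[0] == 6.0f && at_compile_time.c[3] == 12.0f,
              "reference path used during constant evaluation");
```

Keeping the intrinsic code in a separate non-constexpr function means add() itself contains nothing the constant evaluator could reject. A NEON branch could presumably be added to add_simd in the same way.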

edited Jun 1, 2021 at 14:43

answered May 27, 2021 at 18:34

user4442671

1 Comment

Peter Cordes Over a year ago

gcc.godbolt.org/z/n95Ta8P4n uses (unsigned)at_compile_time.c[0]; in a global array dimension to show that it's truly a 100% valid constexpr. If you change it to const ... at_compile_time instead of constexpr it won't compile, even if you change add to always use the simple inline version. (So simple constant-propagation isn't enough, or GCC chooses not to.)
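
The check described in this comment can be sketched roughly as follows. This is a reduced illustration rather than the code behind the godbolt link: it uses only the simple loop version of add, and the array name proof is made up here.

```cpp
struct FA {
    float c[4];
};

// Simple constexpr-friendly version only, for the sake of the demonstration.
constexpr FA add(FA a, FA b) {
    FA result{};
    for (int i = 0; i < 4; i++) {
        result.c[i] = a.c[i] + b.c[i];
    }
    return result;
}

constexpr FA at_compile_time = add(FA{1, 2, 3, 4}, FA{5, 6, 7, 8});

// An array bound has to be a constant expression, so this declaration only
// compiles if at_compile_time.c[0] is usable at compile time.
int proof[(unsigned)at_compile_time.c[0]];

static_assert(sizeof(proof) == 6 * sizeof(int), "1 + 5 gives six elements");
```

Replacing constexpr with const on at_compile_time is expected to break the declaration of proof, as the comment reports.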
