I am working on a C++ intrinsics wrapper for x64 and NEON, and I want my functions to be constexpr. My motivation is similar to that of the question "Constexpr and SSE intrinsics", but the compiler (GCC) may not support #pragma omp simd or intrinsics in a constexpr function. The following code is only a demonstration (auto-vectorization is good enough for addition).

struct FA{
    float c[4];
};

inline constexpr FA add(FA a, FA b){
    FA result{};
    #pragma omp simd            // clang error: statement not allowed in constexpr function
    for(int i = 0; i < 4; i++){ // GCC error: uninitialized variable 'i' in 'constexpr' function
        result.c[i] = b.c[i] + a.c[i];
    }
    return result;
}

#include <xmmintrin.h> // __m128, _mm_add_ps

struct FA2{
    __m128 c;
};

inline constexpr FA2 add2(FA2 a, FA2 b){
    FA2 result{};
    result.c = _mm_add_ps(a.c,b.c); // GCC error: call to non-'constexpr' function '__m128 _mm_add_ps(__m128, __m128)'
    return result;                  // fine with clang
}


I have to provide reference C++ code for portability anyway. Is there a code-efficient way to make the compiler use the reference code at compile time and the intrinsic version at run time?

f(){
    if(/* evaluated at compile time */){
        // constexpr (reference) version
    }else{
        // intrinsic version
    }
}

It should work on all compilers that support OpenMP, intrinsics and C++20.

  • Funnily, add() compiles on MSVC, but add2 gives this error: error C3615: constexpr function 'add2' cannot result in a constant expression Commented May 27, 2021 at 17:42
  • Intel accepts both functions. Commented May 27, 2021 at 17:43
  • That's exactly the type of scenario for which std::is_constant_evaluated was introduced. Commented May 27, 2021 at 18:28
  • GCC/clang headers define _mm_add_ps without constexpr, so you're basically out of luck unless you use compiler-specific features such as GNU C native vector syntax: a.c + b.c. (In GNU C, __m128 is a vector of floats and the + operator works on it; __m128i is a vector of two long long.) My answer on the Q&A you linked already has an example of using GNU C native vector syntax for == on integer vectors instead of _mm_cmpeq_epi32. Commented May 27, 2021 at 18:57

1 Answer

Using std::is_constant_evaluated, you can get exactly what you want:

#include <type_traits>

struct FA{
    float c[4];
};

// Just for the sake of the example. Makes for nice-looking assembly.
extern FA add_parallel(FA a, FA b);

constexpr FA add(FA a, FA b) {
    if (std::is_constant_evaluated()) {
        // do it in a constexpr-friendly manner
        FA result{};
        for(int i = 0; i < 4; i++) {
            result.c[i] = b.c[i] + a.c[i];
        }
        return result;
    } else {
        // can be anything that's not constexpr-friendly.
        return add_parallel(a, b);
    }
}

constexpr FA at_compile_time = add(FA{1,2,3,4}, FA{5,6,7,8});

FA at_runtime(FA a) {
    return add(a, at_compile_time);
}

See on godbolt: https://gcc.godbolt.org/z/szhWKs3ec

1 Comment

gcc.godbolt.org/z/n95Ta8P4n uses (unsigned)at_compile_time.c[0] as a global array dimension to show that the result is truly a 100% valid constant expression. If you change the declaration to const ... at_compile_time instead of constexpr, it won't compile, even if you change add to always use the simple inline version. (So simple constant-propagation isn't enough, or GCC chooses not to use it here.)
