Insert an array[4] to an array[8] (C++, SSE)

Question

I have this code to get audio output levels in dB into an array (peak_dB[8]) for use in a real-time peak meter:

#include <emmintrin.h>

float channelsPeak[8] = { 0 };
float peak_dB[8] = { 0 };

__m128 log2_sse(__m128 x) {
  // https://www.kvraudio.com/forum/viewtopic.php?p=7524831#p7524831
  // 12-13ulp
  const __m128 c0 = _mm_set1_ps(1.011593342e+01f);
  const __m128 c1 = _mm_set1_ps(1.929443550e+01f);
  const __m128 d0 = _mm_set1_ps(2.095932245e+00f);
  const __m128 d1 = _mm_set1_ps(1.266638851e+01f);
  const __m128 d2 = _mm_set1_ps(6.316540241e+00f);
  const __m128 one = _mm_set1_ps(1.0f);
  const __m128 multi = _mm_set1_ps(1.41421356237f);
  const __m128i mantissa_mask = _mm_set1_epi32((1 << 23) - 1);
  __m128i x_i = _mm_castps_si128(x);
  __m128i spl_exp = _mm_castps_si128(_mm_mul_ps(x, multi));
  spl_exp = _mm_sub_epi32(spl_exp, _mm_castps_si128(one));
  spl_exp = _mm_andnot_si128(mantissa_mask, spl_exp);
  __m128 spl_mantissa = _mm_castsi128_ps(_mm_sub_epi32(x_i, spl_exp));
  spl_exp = _mm_srai_epi32(spl_exp, 23);
  __m128 log2_exponent = _mm_cvtepi32_ps(spl_exp);
  __m128 num = spl_mantissa;
  num = _mm_add_ps(num, c1);
  num = _mm_mul_ps(num, spl_mantissa);
  num = _mm_add_ps(num, c0);
  num = _mm_mul_ps(num, _mm_sub_ps(spl_mantissa, one));
  __m128 denom = d2;
  denom = _mm_mul_ps(denom, spl_mantissa);
  denom = _mm_add_ps(denom, d1);
  denom = _mm_mul_ps(denom, spl_mantissa);
  denom = _mm_add_ps(denom, d0);
  __m128 res = _mm_div_ps(num, denom);
  res = _mm_add_ps(log2_exponent, res);
  return res;
}
__m128 lin2db(__m128 x) {
  const __m128 convert_10 = _mm_set1_ps(6.02059991328f);
  return _mm_mul_ps(log2_sse(x), convert_10);
}

float getPeaks_dB(int cCount) { // cCount = enabled channels (1...8)
    __m128 s1 = _mm_setzero_ps();
    
    s1 = _mm_set_ps(channelsPeak[3], channelsPeak[2], channelsPeak[1], channelsPeak[0]); //channels 1-4
    s1 = lin2db(s1);
    _mm_store_ps(peak_dB, s1);

    if(cCount > 4){
        float t2[4] = { 0 };
        __m128 s2 = _mm_setzero_ps();
        s2 = _mm_set_ps(channelsPeak[7], channelsPeak[6], channelsPeak[5], channelsPeak[4]); // channels 5-8
        s2 = lin2db(s2);
    
        _mm_store_ps(t2, s2);
        for (int i = 4; i < 8; i++){peak_dB[i] = t2[i-4];}
    }
    return 0;
}

Here, channelsPeak[0..7] holds the linear levels (0.0f..1.0f) of eight channels read from the audio rendering device (one GetChannelsPeakValues() call). peak_dB is an eight-element array that holds the same levels in dB, for use in later calculations and in the textual display. lin2db is an approximation of 20log10(x), implemented with SSE intrinsics; it is faster than std::log10 because it is less accurate.
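The constant 6.02059991328f in lin2db comes from the change-of-base identity. log2_sse splits its input into an exponent e and a mantissa m, then applies a rational approximation to log2(m). The following is a sketch of that reasoning as read from the code above, not taken from the linked forum post. Here c0, c1, d0, d1, d2 are the constants in log2_sse, and m lies roughly in [1/√2, √2) because x is scaled by √2 (multi) before the exponent is extracted:

```latex
\begin{aligned}
20\log_{10} x &= 20\log_{10} 2 \cdot \log_2 x \approx 6.0206\,\log_2 x \\
x &= m \cdot 2^{e}, \qquad m \in [1/\sqrt{2},\, \sqrt{2}) \\
\log_2 x &= e + \log_2 m \approx e + \frac{(m - 1)\,(m^2 + c_1 m + c_0)}{d_2 m^2 + d_1 m + d_0}
\end{aligned}
```

Because of the (m - 1) factor, the fraction is exactly zero at m = 1. As a result, exact powers of two such as 0.5f and 1.0f come out as exactly -1 and 0 in log2_sse.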

Q: Are there other (better) ways to insert the data from s2 into the last four elements of peak_dB (excluding AVX)? I'm using VS2013, and according to Compiler Explorer the compiler seems to optimize the for-loop in the current code. But storing the first half with _mm_store_ps(peak_dB, s1) looks much simpler, so I'm wondering whether there's a way to do it without the for-loop.

Why are you using SSE for this? It seems pointless in this particular case. Is it just for learning/experimental purposes?
– Paul R, Commented Mar 17, 2021 at 12:47
Welcome to the Code Review Community. When we see ... in the code, that means the code as posted is incomplete, and we tend to close the question as off-topic because it is Missing Code Context. The lack of code context makes it much more difficult to review the code in the question.
– pacmaninbw ♦, Commented Mar 17, 2021 at 13:08
Thanks for updating the code. The description still references cPeak, but that was renamed to channelsPeak, right? Please update the description accordingly.
– Sᴀᴍ Onᴇᴌᴀ ♦, Commented Mar 17, 2021 at 20:42
When you're working with intrinsics specifically, it definitely doesn't hurt to comment what's happening in the function (like in `// return log(x * y)`).
– tofro, Commented Mar 18, 2021 at 9:46
Hey, I saw the comment you posted. If you want to accept the answer, you should find a small tick/check mark icon underneath its vote buttons (see meta.stackexchange.com/a/5235/354801). However, on CR it's generally good practice to wait at least 24h or so in case you get any more good answers, since reviewers can take a while to compose and post them.
– Greedo, Commented Mar 18, 2021 at 21:44

user555045 · Accepted Answer · 2021-03-18 18:39:59Z

Storing

Are there other (better) ways to insert data from s2 into last four elements of peak_dB (AVX excluded)?

Yes, and you already used it: _mm_store_ps. For example:

    s2 = lin2db(s2);
    _mm_store_ps(peak_dB + 4, s2);

If you prefer, &peak_dB[4] works just as well as peak_dB + 4.

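_mm_store_ps takes a pointer to wherever you want to store the data, and that pointer does not have to point to the start of an array. It does have to be 16-byte aligned, though; use _mm_storeu_ps if that is not guaranteed. Four floats occupy exactly 16 bytes, so peak_dB + 4 is aligned whenever peak_dB is.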
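Here is a minimal sketch of storing both halves into one 8-float buffer without a scalar loop. The helper names store_halves and store_halves_unaligned are hypothetical and not part of the question's code. The aligned variant checks the 16-byte requirement with an assert instead of relying on it silently.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>

// Hypothetical helper: writes low to dst[0..3] and high to dst[4..7].
// _mm_store_ps requires a 16-byte-aligned address; since 4 floats are
// 16 bytes, dst + 4 is aligned whenever dst is.
void store_halves(float* dst, __m128 low, __m128 high) {
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    _mm_store_ps(dst, low);       // dst[0..3]
    _mm_store_ps(dst + 4, high);  // dst[4..7], replaces the temporary array + loop
}

// Hypothetical variant for a destination whose alignment is unknown.
void store_halves_unaligned(float* dst, __m128 low, __m128 high) {
    _mm_storeu_ps(dst, low);
    _mm_storeu_ps(dst + 4, high);
}
```

To use the aligned version, declare the destination as alignas(16) float peak_dB[8]. alignas is C++11. VS2013 may not accept it, and in that case MSVC's __declspec(align(16)) is the usual substitute.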

Loading

Another problem here is the use of _mm_set_ps. Although it accepts variable arguments, it is mainly meant for constant arguments, and good code is far from guaranteed when it is used otherwise. You can see the effect in the code on Godbolt. Here is the relevant part, with the lin2db code that got "interleaved" into it removed:

    movss   xmm1, DWORD PTR float * channelsPeak+12
    movss   xmm0, DWORD PTR float * channelsPeak+8
    movss   xmm2, DWORD PTR float * channelsPeak+4
    movss   xmm4, DWORD PTR float * channelsPeak
    unpcklps xmm2, xmm1
    unpcklps xmm4, xmm0
    unpcklps xmm4, xmm2

The other _mm_set_ps call produces the same pattern:

    movss   xmm1, DWORD PTR float * channelsPeak+28
    movss   xmm0, DWORD PTR float * channelsPeak+24
    movss   xmm2, DWORD PTR float * channelsPeak+20
    movss   xmm3, DWORD PTR float * channelsPeak+16
    unpcklps xmm3, xmm0
    unpcklps xmm2, xmm1
    unpcklps xmm3, xmm2

Avoid this pattern and use _mm_load_ps or _mm_loadu_ps where possible. In this case that's easy:

s1 = _mm_loadu_ps(channelsPeak);
// later
s2 = _mm_loadu_ps(channelsPeak + 4);
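The replacement is valid because of lane order. _mm_set_ps lists its arguments from the highest lane (3) down to lane 0, so _mm_set_ps(p[3], p[2], p[1], p[0]) builds the same vector as _mm_loadu_ps(p). The sketch below checks that equivalence. The function names are hypothetical, and the note that the load typically compiles to one unaligned load instruction describes common codegen rather than a guarantee.

```cpp
#include <xmmintrin.h>

// The question's pattern: lanes given highest-first.
__m128 load_via_set(const float* p) {
    return _mm_set_ps(p[3], p[2], p[1], p[0]);
}

// The suggested replacement: one unaligned load; p need not be 16-byte aligned.
__m128 load_via_loadu(const float* p) {
    return _mm_loadu_ps(p);
}

// True if all four lanes compare equal: _mm_cmpeq_ps sets matching lanes to
// all-ones, and _mm_movemask_ps gathers their sign bits into a 4-bit mask.
bool same_lanes(__m128 a, __m128 b) {
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 0xF;
}
```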

Zeroing

Initializing local variables of type __m128 like this is fine:

__m128 s1 = _mm_setzero_ps();

There's no serious problem with that, but in this code it does nothing useful; it's just redundant. If the zero value is actually used as an input to something, then of course the initialization is needed. Here, though, it can be skipped by declaring the variable only when you have a value to assign to it, for example:

__m128 s1 = _mm_loadu_ps(channelsPeak);
