You appear to be using GCC vector extensions. The following code shows how to do broadcasts, vector + scalar, vector*scalar, loads and stores using vector extensions.
#include <stdio.h>
#if defined(__clang__)
typedef float v4sf __attribute__((ext_vector_type(4)));
#else
typedef float v4sf __attribute__ ((vector_size (16)));
#endif
void print_v4sf(v4sf a) { for(int i=0; i<4; i++) printf("%f ", a[i]); puts(""); }
int main(void) {
v4sf a;
//broadcast a scalar
a = ((v4sf){} + 1)*3.14159f;
print_v4sf(a);
// vector + scalar
a += 3.14159f;
print_v4sf(a);
// vector*scalar
a *= 3.14159f;
print_v4sf(a);
//load from array (this cast assumes data is 16-byte aligned)
float data[] = {1, 2, 3, 4};
a = *(v4sf*)data;
//a = __builtin_ia32_loadups(data);
//store to array
float store[4];
*(v4sf*)store = a;
for(int i=0; i<4; i++) printf("%f ", store[i]); puts("");
}
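The cast-based load and store above dereference a v4sf pointer, which assumes the float array is 16-byte aligned. A minimal sketch of an alternative that does not rely on alignment is to copy through memcpy, which compilers typically turn into an unaligned vector load or store when optimizing. The helper names load_v4sf and store_v4sf are hypothetical and not part of any extension.

```c
#include <assert.h>
#include <string.h>

#if defined(__clang__)
typedef float v4sf __attribute__((ext_vector_type(4)));
#else
typedef float v4sf __attribute__((vector_size(16)));
#endif

static_assert(sizeof(v4sf) == 4 * sizeof(float), "v4sf must hold four floats");

/* Hypothetical helper: load four floats from any address, aligned or not. */
static inline v4sf load_v4sf(const float *p) {
    v4sf v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: store four floats to any address, aligned or not. */
static inline void store_v4sf(float *p, v4sf v) {
    memcpy(p, &v, sizeof v);
}
```

With these, something like load_v4sf(data + 1) is well defined even though data + 1 is not a 16-byte boundary.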
Clang 4.0 and ICC 17 support a subset of the GCC vector extensions. However, neither of them supports the vector + scalar or vector*scalar operations that GCC supports. A workaround for Clang is to use Clang's OpenCL vector extensions. I don't know of a workaround for ICC. MSVC does not support any kind of vector extension that I am aware of.
Even though GCC supports vector + scalar and vector*scalar, you cannot do vector = scalar (but you can with Clang's OpenCL vector extensions). Instead, you can use this trick:
a = ((v4sf){} + 1)*3.14159f;
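The trick works because (v4sf){} is a zero vector, adding the scalar 1 turns every lane into 1, and multiplying by the scalar then leaves that scalar in every lane. If you would rather not rely on vector op scalar arithmetic, one possible sketch is a compound literal that lists the scalar in each lane. The helper name broadcast_v4sf is hypothetical.

```c
#include <assert.h>

#if defined(__clang__)
typedef float v4sf __attribute__((ext_vector_type(4)));
#else
typedef float v4sf __attribute__((vector_size(16)));
#endif

static_assert(sizeof(v4sf) == 4 * sizeof(float), "v4sf must hold four floats");

/* Hypothetical helper: copy the scalar x into all four lanes. */
static inline v4sf broadcast_v4sf(float x) {
    return (v4sf){x, x, x, x};
}
```

A call such as a = broadcast_v4sf(3.14159f); then takes the place of the arithmetic trick.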
I would do as Paul R suggests and use intrinsics, which are mostly compatible across the four major C/C++ compilers: GCC, Clang, ICC, and MSVC.
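As a rough sketch of what the intrinsics version of the operations above could look like, the following uses the SSE intrinsics from xmmintrin.h and therefore assumes an x86 target. The function name add_then_scale4 and its parameters are made up for illustration.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target */

/* Hypothetical helper: out[i] = (in[i] + add) * mul for four floats. */
static inline void add_then_scale4(const float *in, float add, float mul,
                                   float *out) {
    assert(in && out);
    __m128 v = _mm_loadu_ps(in);          /* unaligned load of 4 floats  */
    v = _mm_add_ps(v, _mm_set1_ps(add));  /* vector + broadcast scalar   */
    v = _mm_mul_ps(v, _mm_set1_ps(mul));  /* vector * broadcast scalar   */
    _mm_storeu_ps(out, v);                /* unaligned store of 4 floats */
}
```

Here _mm_set1_ps does the broadcast explicitly, so there is no need for a vector = scalar assignment.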
Here is a table of what is supported by each compiler using GCC's vector extensions and Clang's OpenCL vector extensions.
                            gcc  g++  clang  icc  OpenCL
unary operations
[]                          yes  yes  yes    yes  yes
+, -                        yes  yes  yes    yes  yes
++, --                      yes  yes  no     no   no
~                           yes  yes  yes    yes  yes
!                           no   yes  no     no   yes
binary vector op vector
+, -, *, /, %               yes  yes  yes    yes  yes
&, |, ^                     yes  yes  yes    yes  yes
>>, <<                      yes  yes  yes    yes  yes
==, !=, >, <, >=, <=        yes  yes  yes    yes  yes
&&, ||                      no   yes  no     no   yes
binary vector op scalar
+, -, *, /, %               yes  yes  no     no   yes
&, |, ^                     yes  yes  no     no   yes
>>, <<                      yes  yes  no     no   yes
==, !=, >, <, >=, <=        yes  yes  no     no   yes
&&, ||                      no   yes  no     no   yes
assignment
vector = vector             yes  yes  yes    yes  yes
vector = scalar             no   no   no     no   yes
ternary operator
?:                          no   yes  no     no   ?
We see that Clang and ICC do not support GCC's vector operator scalar operations. GCC in C++ mode supports everything except vector = scalar. Clang's OpenCL vector extensions support everything except maybe the ternary operator: Clang's documentation claims it is supported, but I could not get it to work. GCC in C mode additionally does not support binary logical operators or the ternary operator.
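Where the ternary operator is not available, a per-lane select can be built from the operations the table lists as supported everywhere: a vector comparison, which yields -1 in lanes where it holds and 0 elsewhere, combined with &, | and ~. The sketch below does this for integer vectors. The name max_v4si is hypothetical.

```c
#include <assert.h>

#if defined(__clang__)
typedef int v4si __attribute__((ext_vector_type(4)));
#else
typedef int v4si __attribute__((vector_size(16)));
#endif

static_assert(sizeof(v4si) == 4 * sizeof(int), "v4si must hold four ints");

/* Hypothetical helper: per-lane maximum without the ternary operator. */
static inline v4si max_v4si(v4si a, v4si b) {
    v4si mask = a > b;                /* -1 where a > b, 0 elsewhere      */
    return (a & mask) | (b & ~mask);  /* take a where mask is set, else b */
}
```

The same pattern applies to float data only after reinterpreting the lanes as integers, because the bitwise operators are defined for integer vectors.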
[sse] - take a look at some of these to become familiar with the basics of using SIMD intrinsics: the _mm_loadu_ps intrinsic to load 4 floats, and _mm_mul_ps to multiply them. Intel's intrinsics are better documented and have more tutorials than the GNU C vector extensions you're using. The main downside is that they're not portable outside of x86, but you're only targeting x86. See the SSE tag wiki for links to docs and tutorials.