This is a simplified view of my problem: I want to convert a float value into the defined type v4si, because I want to use SIMD operations for optimization. Please help me convert a float/double value to a defined vector type.

#include<stdio.h>

typedef double v4si __attribute__ ((vector_size (16)));

int main()
{
    double stoptime=36000;
    float x =0.5*stoptime;
    float * temp = &x;
    v4si a = ((v4si)x);   // Error: Incompatible data types
    v4si b;
    v4si *c;
    c = ((v4si*)&temp);   // Copies address of temp,           
    b = *(c);                   
    printf("%f\n" , b);      //    but printing (*c) crashes program
}
  • C or C++? Which one? Commented May 2, 2017 at 10:15
  • Is there some reason why you can't just use intrinsics for this in the normal way? Also, what CPU/architecture are we talking about here? x86? ARM? POWER/PowerPC? Commented May 2, 2017 at 10:25
  • It's C programming (mentioned in the title), on the x86 architecture. I am very new to SIMD and am trying to optimize C code by replacing for loops with SIMD vector multiplications. Commented May 2, 2017 at 10:35
  • OK - I've updated your tags and am assuming you want to use SSE. If you use the search facility here on StackOverflow you'll find lots of questions tagged [sse] - take a look at some of these to become familiar with the basics of using SIMD intrinsics. Commented May 2, 2017 at 10:38
  • As a beginner, you'll have an easier time using Intel's _mm_loadu_ps intrinsic to load 4 floats, and _mm_mul_ps to multiply them. Intel's intrinsics are better documented and have more tutorials than the GNU C vector extensions you're using. The main downside is that they're not portable outside of x86, but you're only targeting x86. See the SSE tag wiki for links to docs and tutorials. Commented May 9, 2017 at 22:12

2 Answers

You don't need to define a custom SIMD vector type (v4si) or mess around with casts and type punning - just use the provided intrinsics in the appropriate *intrin.h header, e.g.

#include <xmmintrin.h> // use SSE intrinsics 

int main(void)
{
    __m128 v;          // __m128 is the standard SSE vector type for 4 x float
    float x = 1.0f, y = 2.0f, z = 3.0f, w = 4.0f;

    v = _mm_set_ps(x, y, z, w);
                       // use intrinsic to set vector contents to x, y, z, w
                       // (the last argument, w, lands in the lowest element)

    // ...

    return 0;
}
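
To connect this with the question's goal (replacing a scalar multiply loop), here is a minimal sketch that broadcasts one float into all four lanes with _mm_set1_ps and multiplies an array by it. The function name scale_floats and its parameters are made up for illustration, and the sketch assumes an x86 target with SSE enabled.

```c
#include <stddef.h>
#include <xmmintrin.h>  // SSE: __m128, _mm_set1_ps, _mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps

// Hypothetical helper: dst[i] = src[i] * s for i in [0, n).
void scale_floats(float *dst, const float *src, size_t n, float s)
{
    __m128 vs = _mm_set1_ps(s);            // broadcast s into all four lanes
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);  // unaligned load of 4 floats
        _mm_storeu_ps(dst + i, _mm_mul_ps(v, vs));
    }
    for (; i < n; i++)                     // scalar tail for the leftover elements
        dst[i] = src[i] * s;
}
```

The unaligned load/store variants are used so the arrays do not need 16-byte alignment; if the data is known to be aligned, _mm_load_ps and _mm_store_ps could be used instead.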

3 Comments

Can the same header file also be used for intrinsics on the AMD64 architecture? I have to run my code on multiple systems.
Yes, no problem - you need to be aware of which SIMD instruction sets are supported on your target architecture if you are going beyond SSE3, but you should be fine with SSE2/SSE3.
@Sarmad: If you are targeting 64-bit code (AMD64/x86-64), the processor supports SSE and SSE2 by default, so yes, you can use xmmintrin.h. When AMD created the 64-bit specification for its chips, the processor supported SSE/SSE2. Intel then adopted the AMD specification, so all x86-64 processors from Intel also support SSE/SSE2 by default. If you need more than SSE2, not all 64-bit processors may support those features.

You appear to be using GCC vector extensions. The following code shows how to do broadcasts, vector + scalar, vector*scalar, loads and stores using vector extensions.

#include <stdio.h>

#if defined(__clang__)
typedef float v4sf __attribute__((ext_vector_type(4)));
#else
typedef float v4sf __attribute__ ((vector_size (16)));
#endif

void print_v4sf(v4sf a) { for(int i=0; i<4; i++) printf("%f ", a[i]); puts(""); }

int main(void) {
  v4sf a;
  //broadcast a scalar
  a = ((v4sf){} + 1)*3.14159f;  
  print_v4sf(a);

  // vector + scalar
  a += 3.14159f;
  print_v4sf(a);

  // vector*scalar
  a *= 3.14159f;
  print_v4sf(a);

  //load from array
  float data[] = {1, 2, 3, 4};
  a = *(v4sf*)data;
  //a = __builtin_ia32_loadups(data);

  //store to array
  float store[4];
  *(v4sf*)store = a;
  for(int i=0; i<4; i++) printf("%f ", store[i]); puts("");
}
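
The pointer casts used for the load and store above assume the float arrays are suitably aligned for a 16-byte vector, which a plain float array does not guarantee. A more conservative sketch, assuming GCC or Clang, copies through memcpy instead; the helper names load_v4sf, store_v4sf and add_arrays4 are hypothetical.

```c
#include <string.h>

typedef float v4sf __attribute__((vector_size(16)));

// Hypothetical helper: load 4 floats from p, with no alignment requirement.
static inline v4sf load_v4sf(const float *p)
{
    v4sf v;
    memcpy(&v, p, sizeof v);
    return v;
}

// Hypothetical helper: store 4 floats to p, with no alignment requirement.
static inline void store_v4sf(float *p, v4sf v)
{
    memcpy(p, &v, sizeof v);
}

// dst[0..3] = a[0..3] + b[0..3], using vector op vector addition.
void add_arrays4(float *dst, const float *a, const float *b)
{
    store_v4sf(dst, load_v4sf(a) + load_v4sf(b));
}
```

With optimization enabled, compilers typically turn these fixed-size memcpy calls into plain unaligned vector moves, though that is worth checking in the generated assembly for your compiler.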

Clang 4.0 and ICC 17 support a subset of the GCC vector extensions. However, neither of them supports the vector + scalar or vector*scalar operations that GCC supports. A workaround for Clang is to use Clang's OpenCL vector extensions. I don't know of a workaround for ICC. MSVC does not support any kind of vector extension that I am aware of.

With GCC, even though it supports vector + scalar and vector*scalar, you cannot do vector = scalar (but you can with Clang's OpenCL vector extensions). Instead you can use this trick.

a = ((v4sf){} + 1)*3.14159f;
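
As an alternative sketch that does not rely on vector + scalar arithmetic, a scalar can also be broadcast by listing it once per lane in a compound literal. This assumes GCC or Clang with the vector_size typedef; the helper name broadcast_v4sf is made up for illustration.

```c
typedef float v4sf __attribute__((vector_size(16)));

// Hypothetical helper: return a vector with s in all four lanes.
static inline v4sf broadcast_v4sf(float s)
{
    return (v4sf){s, s, s, s};
}
```

For the value in the question, this would be called as broadcast_v4sf(0.5f * 36000.0f).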

I would do as Paul R suggests and use intrinsics which are mostly compatible with the four major C/C++ compilers: GCC, Clang, ICC, and MSVC.

Here is a table of what is supported by each compiler using GCC's vector extensions and Clang's OpenCL vector extensions.

                                gcc  g++  clang  icc   OpenCL
unary operations                
[]                              yes  yes  yes    yes   yes
+, -                            yes  yes  yes    yes   yes
++, --                          yes  yes  no     no    no
~                               yes  yes  yes    yes   yes
!                               no   yes  no     no    yes 

binary vector op vector         
+,-,*,/,%                       yes  yes  yes    yes   yes
&,|,^                           yes  yes  yes    yes   yes
>>,<<                           yes  yes  yes    yes   yes
==, !=, >, <, >=, <=            yes  yes  yes    yes   yes
&&, ||                          no   yes  no     no    yes

binary vector op scalar         
+,-,*,/,%                       yes  yes  no     no    yes
&,|,^                           yes  yes  no     no    yes
>>,<<                           yes  yes  no     no    yes
==, !=, >, <, >=, <=            yes  yes  no     no    yes                      
&&, ||                          no   yes  no     no    yes

assignment
vector = vector                 yes  yes  yes    yes   yes
vector = scalar                 no   no   no     no    yes                                              

ternary operator
?:                              no   yes  no     no    ?

We see that Clang and ICC do not support GCC's vector operator scalar operations. GCC in C++ mode supports everything but vector = scalar. Clang's OpenCL vector extensions support everything except maybe the ternary operator: Clang's documentation claims it is supported, but I could not get it to work. GCC in C mode additionally does not support the binary logical operators or the ternary operator.
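
Where the ternary operator is unavailable (for example GCC in C mode), a lane-wise select can be sketched with a vector comparison and bitwise operations, both of which the table lists as supported for vector op vector. This is only an illustration, assuming GCC or Clang; the type name v4i and the function max_v4sf are hypothetical.

```c
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4i  __attribute__((vector_size(16)));

// Hypothetical helper: lane-wise equivalent of (a > b) ? a : b.
static inline v4sf max_v4sf(v4sf a, v4sf b)
{
    v4i mask = a > b;  // each lane is all ones where a > b, zero otherwise

    // The casts between same-sized vector types reinterpret the bits,
    // so the mask picks whole lanes from a or from b.
    return (v4sf)(((v4i)a & mask) | ((v4i)b & ~mask));
}
```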

11 Comments

Note that g++ supports more vector extensions than gcc (I should know, I implemented them ;-). I thought even gcc supported some of those "no" but I guess I was misremembering.
@MarcGlisse, I updated my table. I based it on Clang's table, which is obviously wrong for GCC. I have now tested many of the operations in the table. It's great to know that GCC supports everything that Clang's OpenCL vector extensions support except for vector = scalar and the vector.xyzw notation. Maybe it's time to fix vector = scalar in GCC. Why are the logical operators and ternary operators only supported in g++?
Things only supported in C++: same reason why several vector extensions used to be only supported in C, the volunteer who wrote the code was more interested in one language than the other (the 2 front-ends are largely disjoint in gcc).
@MarcGlisse, would it be hard to implement the OpenCL swizzle notation v.xyzw as part of GCC? Do you think it's interesting? I asked a question about how to implement this with C++ stackoverflow.com/questions/19923882/…
@MarcGlisse, Clang does not support ?: as they claim: stackoverflow.com/questions/25345585/… and godbolt.org/g/rt67UM.
