Coming from a mostly Python background, I am now learning both C and x86-64 assembly. I used C indirectly via Cython previously, but I am now learning C proper in addition to assembly.
My basic question is what mindset I should adopt when it comes to optimising compilers. Should I just let the compiler do its job and, once I am sufficiently proficient in assembly, start checking and confirming the assembly output? Is that what responsible C programmers who want to write high-performance code do?
The question was triggered because I wanted to check what gcc 7.5.0 would optimise the code below to. In particular, I ran objdump to find out how accessing an array twice at the same index would be optimised at various optimisation levels.
- At -O3 there were some instructions I have not learnt yet, e.g. movaps XMMWORD PTR [rsp+0x10],xmm0
- Levels -O2 and -O1 were somewhat clearer, but I still did not fully understand them
- At -O0 I believe I could see a rather straightforward translation of the code, in which I think messages[idx] was indeed accessed twice
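To make the -O0 versus optimised difference concrete, here is a minimal hand-written sketch (not actual gcc output) of the transformation usually called redundant load elimination, a form of common subexpression elimination. Because nothing writes to messages between the two reads, the compiler is allowed to load messages[idx] once and pass the same value as both arguments. The function names format_twice and format_once are hypothetical, and snprintf into a buffer stands in for printf only so the two versions can be compared. As for the -O3 movaps line, it is probably gcc initialising the array on the stack 16 bytes (four ints) at a time from an SSE register, though that is a guess without the full listing.

```c
#include <stddef.h>
#include <stdio.h>

/* Form written in the question: messages[idx] appears twice. */
int format_twice(char *buf, size_t cap, const int *messages, size_t idx)
{
    return snprintf(buf, cap, "Accessing here %d and there %d\n",
                    messages[idx], messages[idx]);
}

/* Hand-written equivalent of what an optimiser is allowed to produce:
 * no store to messages happens between the two reads, so one load
 * is enough and the value is reused for both arguments. */
int format_once(char *buf, size_t cap, const int *messages, size_t idx)
{
    int value = messages[idx]; /* single load */
    return snprintf(buf, cap, "Accessing here %d and there %d\n",
                    value, value);
}
```

Both functions produce identical output; the difference only shows up in the generated assembly, which is exactly why -O0 and -O2 listings of the same source can look so different.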
My question is not when these levels should be used. I am simply asking more experienced programmers whether this is what you do: compile with high optimisation levels and check the assembly output to make sure everything is as expected. Is that the natural workflow for people who want to truly know what machine code a compiler produces?
I understand that the example below is a trivial optimisation opportunity, but have you simply learnt that certain optimisations always occur, so that you no longer think about them? There is not much information about what kinds of transformations and optimisations can take place, not to mention that compilers leave no notes or messages explaining what was optimised and why, so I cannot imagine any other way than learning it all in practice. Thanks.
#include <stddef.h>
#include <stdio.h>
int main(int argc, char **argv)
{
    size_t len_messages = 9;
    int messages[] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    for (size_t idx = 0; idx < len_messages; idx++) {
        printf("Accessing here %d and there %d\n", messages[idx], messages[idx]);
    }
    return 0;
}
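One way to see what the optimiser removed, without decoding the whole listing, is to compare a plain version of the double access against one that reads through a volatile-qualified pointer. In C, every access to a volatile object is an observable side effect, so the compiler must perform both reads even at -O3. The sketch below assumes you compile it with something like gcc -O2 -S (or gcc -O2 -c followed by objdump -d -M intel) and look at the two functions side by side; the function names are hypothetical, and the doubled sum exists only to give the reads a visible result.

```c
#include <stddef.h>

/* Plain access: the compiler may fold the two reads of a[i] into a
 * single load (and may also vectorise the loop). */
long sum_doubled(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i] + a[i];
    return total;
}

/* volatile access: each read of a[i] is a side effect, so two loads
 * per element must be emitted regardless of the optimisation level. */
long sum_doubled_volatile(const volatile int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        int first = a[i];  /* first load */
        int second = a[i]; /* second load, cannot be merged with the first */
        total += first + second;
    }
    return total;
}
```

The two reads in the volatile version are split into separate statements on purpose: writing a[i] + a[i] on a volatile object leaves the two accesses unsequenced relative to each other, which is best avoided.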
-O3 and let the compiler do its thing, unless there seems to be a problem. And there rarely is in my domain: the compiler usually generates pretty good code. In many domains I suspect you'll have to be a lot more proactive. To be honest, I suspect you'll get opinions on this, but no knock-down answers.