Inverting loop order for better loop unrolling

I noticed a missed LLVM optimization with -O3.

Here’s the C code: Compiler Explorer.

Here, the for (int j = 0; j < 100; ++j) loop could be optimized away by simply multiplying the sum by 100 at the end.

My idea is that interchanging the loops could help the optimizer spot further optimizations. For example, consider the following code: Compiler Explorer. Here the loops are swapped, and LLVM is now able to see the loop unrolling opportunity.

To achieve this, if an outer loop’s induction variable is not used anywhere inside its body, it might be worth making it the inner loop, since in simple cases like this one the interchange does not break any dependency. In this example it enables further loop unrolling in a later pass; in other cases it could improve cache locality.
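To make the idea concrete, here is a minimal sketch of the three shapes being discussed. The Compiler Explorer snippets are not reproduced in this thread, so everything below is a hypothetical stand-in: the function names, the array argument, and the use of long for the accumulator are my assumptions, not the original code. It only illustrates the transformation; it does not show what any particular LLVM version actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the original shape: the outer index j
 * is never used inside the loop body. */
long sum_repeated(const int *a, size_t n) {
    long sum = 0;
    for (int j = 0; j < 100; ++j)
        for (size_t i = 0; i < n; ++i)
            sum += a[i];
    return sum;
}

/* Same computation after interchanging the loops: the j loop is now
 * innermost, has a constant trip count, and adds the same value each time. */
long sum_interchanged(const int *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; ++i)
        for (int j = 0; j < 100; ++j)
            sum += a[i];
    return sum;
}

/* What the j loop could collapse to: sum once, multiply by 100 at the end. */
long sum_scaled(const int *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum * 100;
}

/* Sanity check: all three agree as long as the sum does not overflow long. */
void check_equivalence(const int *a, size_t n) {
    long expected = sum_scaled(a, n);
    assert(sum_repeated(a, n) == expected);
    assert(sum_interchanged(a, n) == expected);
}
```

The equivalence relies on integer addition being reorderable; with floating-point values the three versions could round differently, so a compiler would need something like fast-math to do the same rewrite.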

I am sure something similar is already implemented, but if so, why doesn’t it work here?

I’m a beginner when it comes to LLVM. Any feedback would be greatly appreciated.


You mean loop interchange?

There’s an LLVM pass for it, but it’s off by default; you can enable it with -mllvm -enable-loopinterchange. That said, it’s not triggering in your case; not sure why at first glance.

Thank you!

Why is it off by default, though?
