Inverting loop order for better loop unrolling

Thibaultm · October 3, 2025, 9:24pm

I noticed a missed LLVM optimization with -O3.

Here, the outer for (int j = 0; j < 100; ++j) loop could be optimized away by simply multiplying the sum by 100 at the end.

My idea to address this is that inverting the loop order could help expose further optimizations. For example, consider the following code (Compiler Explorer link): with the loops inverted, LLVM is now able to see the loop unrolling opportunity.
To achieve this, if an outer loop's iterator is not used anywhere inside its body, maybe it would be a good idea to make it the inner loop, since this doesn't break any dependency. In this example it would allow further loop unrolling in the next pass, and in other cases it could improve cache locality.
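The Compiler Explorer snippets are not reproduced in this thread, so the following is only a hypothetical sketch of the kind of code being discussed. The function names, the array parameter, and the use of unsigned arithmetic (chosen here so that wraparound is well defined) are all assumptions, not the original code. It shows the original loop order, the interchanged order, and the closed form the optimizer could ideally reach.

```c
#include <stddef.h>

/* Hypothetical original: the outer iterator j is never used in the body,
   so the outer loop just repeats the same inner summation 100 times. */
unsigned sum_outer_unused(const unsigned *a, size_t n) {
    unsigned sum = 0;
    for (int j = 0; j < 100; ++j) {
        for (size_t i = 0; i < n; ++i) {
            sum += a[i];
        }
    }
    return sum;
}

/* Loops interchanged by hand: the j loop is now innermost, has a constant
   trip count, and adds the same value each time, which makes it an easy
   target for unrolling or folding into a multiplication. */
unsigned sum_interchanged(const unsigned *a, size_t n) {
    unsigned sum = 0;
    for (size_t i = 0; i < n; ++i) {
        for (int j = 0; j < 100; ++j) {
            sum += a[i];
        }
    }
    return sum;
}

/* The form the post hopes for: sum once, multiply by 100 at the end. */
unsigned sum_times_100(const unsigned *a, size_t n) {
    unsigned sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += a[i];
    }
    return sum * 100u;
}
```

All three functions compute the same value for any input, so comparing their -O3 output side by side is one way to see which loop order the optimizer handles best.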

I am sure something similar is already implemented, but if so, why doesn’t it work here?

I’m a beginner when it comes to LLVM. Any feedback would be greatly appreciated.

efriedma-quic · October 3, 2025, 11:29pm

You mean loop interchange?

There’s an LLVM pass for it, but it’s off by default; you can enable it with -mllvm -enable-loopinterchange. That said, it’s not triggering in your case; not sure why at first glance.

Thibaultm · October 4, 2025, 11:18am

Thank you!

Why is it off by default, though?

Meinersbur · October 6, 2025, 9:56am

github.com/llvm/llvm-project

[LoopInterchange] Enable it by default

main ← sjoerdmeijer:enable-interchange

opened 11:56AM - 29 Jan 25 UTC

sjoerdmeijer

+8 -1

This is a work-in-progress patch to enable loop-interchange by default and is a continuation of the RFC: https://discourse.llvm.org/t/enabling-loop-interchange/82589

Basically, we promised to fix any compile-time and correctness issues in the different components involved here (loop-interchange and dependence analysis) before discussing enabling interchange by default. We think we are close to completing this; I would like to explain where we are and wanted to check if there are any thoughts or concerns.

A quick overview of the correctness and compile-time improvements that we have made:

Correctness:
- #119345
- #111807
- #124901 @kasuga-fj
- #116628

Compile-times:
- #118973
- #115128
- #124247
- #118656

And in terms of remaining work, we think we are very close to fixing these dependence analysis issues:
- #123436
- #116630
- #116632

The compile-time increase, with a geomean increase of 0.19%, looks good (after committing #124247), I think:

stage1-O3:
Benchmark
kimwitu++          +0.10%
sqlite3            +0.14%
consumer-typeset   +0.07%
Bullet             +0.06%
tramp3d-v4         +0.21%
mafft              +0.39%
ClamAVi            +0.06%
lencod             +0.61%
SPASS              +0.17%
7zip               +0.08%
geomean            +0.19%

See also: http://llvm-compile-time-tracker.com/compare.php?from=19a7fe03b4f58c4f73ea91d5e63bc4c6e61f987b&to=b24f1367d68ee675ea93ecda4939208c6b68ae4b&stat=instructions%3Au

We might want to look into lencod to see if we can improve more, but not sure it is strictly necessary.
