The current Loop Vectorizer maintains separate code paths for fixed-width and scalable vectors. Several optimizations exist only in the fixed-width path, which increases maintenance cost and leads to inconsistent code generation.
One example is reductions. In LoopVectorize.cpp around line 3330, scalable vectorization is disabled if the loop contains reductions, with the message “Scalable vectorization not supported for the reduction operations found in this loop.” In contrast, the same reductions are supported on the fixed-width path. This leads to both a performance gap and duplication, as new improvements are added only to the fixed-width path.
I would like to explore unifying these under a single code path, staged behind a -loop-vectorize2 option. The intent is to remove cases where scalable vectorization is rejected and to ensure optimizations apply consistently to both vector types. The cost model is likely to be the first component unified, with hooks for target-specific constraints (e.g. unknown vector widths).
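To make the cost-model idea more concrete, here is a minimal C++ sketch of how a single comparison routine could rank fixed and scalable vectorization factors, assuming one target hook supplies an expected vscale. All names here (`VectorFactor`, `VScaleHint`, `estimatedLanes`, `Candidate`, `pickBest`) are hypothetical; they are not LLVM's classes or TTI interfaces.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of a vectorization factor (VF): KnownMin lanes, times an
// unknown runtime multiple (vscale) when Scalable is true.
struct VectorFactor {
  unsigned KnownMin; // assumed >= 1
  bool Scalable;
};

// Hypothetical target-hook result: the vscale the target expects to run with,
// or std::nullopt if it cannot say. Assumed >= 1 when present.
using VScaleHint = std::optional<unsigned>;

// Lane count used only for comparing costs. A scalable VF with no hint is
// treated as if vscale == 1, the smallest possible value.
std::uint64_t estimatedLanes(VectorFactor VF, VScaleHint Hint) {
  if (!VF.Scalable)
    return VF.KnownMin;
  return static_cast<std::uint64_t>(VF.KnownMin) * Hint.value_or(1);
}

struct Candidate {
  VectorFactor VF;
  unsigned CostPerIteration; // cost of one vector iteration, assumed given
};

// Returns the candidate with the lowest estimated cost per lane. Fixed and
// scalable VFs share this comparison; only estimatedLanes() distinguishes
// them. C.Cost / C.Lanes < Best.Cost / Best.Lanes is tested by
// cross-multiplying to avoid integer division; ties keep the earlier entry.
Candidate pickBest(const std::vector<Candidate>& Cands, VScaleHint Hint) {
  assert(!Cands.empty() && "need at least one candidate");
  Candidate Best = Cands.front();
  for (const Candidate& C : Cands) {
    std::uint64_t CScaled = C.CostPerIteration * estimatedLanes(Best.VF, Hint);
    std::uint64_t BestScaled = Best.CostPerIteration * estimatedLanes(C.VF, Hint);
    if (CScaled < BestScaled)
      Best = C;
  }
  return Best;
}
```

The design choice is that only `estimatedLanes()` knows whether a factor is scalable, so a target-specific constraint such as an unknown vector width would be expressed through one hook rather than through a duplicated comparison.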
Correctness and performance would be checked against the existing vectorization test suite, with additional tests for scalable-specific behaviors (strip-mining, predication, etc.) and benchmarks across x86, AArch64, and RISC-V.
Are there specific concerns I should address before starting exploratory patches, particularly with respect to ongoing work in VPlan and target backends?
I think for most parts the main code paths should already be quite unified. Do you have other examples? A good start would be to collect some IR test cases that show the differences.
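One possible way to seed such IR test cases, sketched under assumptions: write the same reduction with Clang's loop pragma hinting a fixed width and a scalable width, then compare the IR emitted by something like `clang++ -O2 -S -emit-llvm --target=aarch64-linux-gnu -march=armv8-a+sve`, adding `-Rpass-missed=loop-vectorize` and `-Rpass-analysis=loop-vectorize` to see why a loop was or was not vectorized. The function names are hypothetical, and treating the product reduction as a case a target may lack a native scalable instruction for is an assumption to check against the target's TTI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Integer add reduction with a fixed-width vectorization hint.
std::uint32_t sumFixed(const std::uint32_t* A, std::size_t N) {
  assert((A != nullptr || N == 0) && "null input with non-zero length");
  std::uint32_t Sum = 0;
#pragma clang loop vectorize_width(4, fixed)
  for (std::size_t I = 0; I < N; ++I)
    Sum += A[I];
  return Sum;
}

// The same reduction with a scalable hint, so the two IR outputs can be diffed.
std::uint32_t sumScalable(const std::uint32_t* A, std::size_t N) {
  assert((A != nullptr || N == 0) && "null input with non-zero length");
  std::uint32_t Sum = 0;
#pragma clang loop vectorize_width(4, scalable)
  for (std::size_t I = 0; I < N; ++I)
    Sum += A[I];
  return Sum;
}

// Multiply reduction with a scalable hint: a candidate for the case where the
// target may not support the reduction natively (assumption, see above).
// Unsigned arithmetic keeps overflow well-defined.
std::uint32_t productScalable(const std::uint32_t* A, std::size_t N) {
  assert((A != nullptr || N == 0) && "null input with non-zero length");
  std::uint32_t Prod = 1;
#pragma clang loop vectorize_width(4, scalable)
  for (std::size_t I = 0; I < N; ++I)
    Prod *= A[I];
  return Prod;
}
```

The pragmas are only hints: the functions compute the same result whichever way the vectorizer decides, which keeps them usable as correctness checks while the IR differences are inspected.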
For the reduction case, the code paths are the same, with just an extra check for whether scalable vectorization is allowed. It looks like we are using a specialized TTI hook to check whether a reduction is supported by the backend. Such hooks are usually hard to keep in sync; perhaps there is a better way to model that.
I’d hope we do not need to add a separate flag or code path, and can instead fix issues as we go along, unless there is some practical reason we can’t.
Reductions are supported for scalable vectorization. There is a subset of reductions that a target may not support natively; these need to be scalarized, but scalarization isn’t possible for scalable vectors, because their number of elements is not known at compile time. It’s precisely because the code paths for scalable and fixed-length vectorization are unified that we need the check you’re seeing on line 3330.
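The reasoning above can be illustrated with a small C++ model, under stated assumptions: `Lowering`, `chooseReductionLowering`, and `scalarizedMulReduce` are hypothetical helpers, not LLVM code, and real legality is decided by the target's TTI hooks. What the model shows is that scalarizing a reduction means one extract-and-combine step per lane, which requires the lane count as a compile-time constant.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical classification of how a reduction could be lowered.
enum class Lowering { NativeInstruction, Scalarized, NotVectorizable };

// Hypothetical decision mirroring the explanation above: use a native
// instruction if the target has one; otherwise expand lane by lane, which is
// only possible when the lane count is fixed at compile time.
Lowering chooseReductionLowering(bool TargetHasNative, bool ScalableVF) {
  if (TargetHasNative)
    return Lowering::NativeInstruction;
  if (!ScalableVF)
    return Lowering::Scalarized;
  return Lowering::NotVectorizable;
}

// What "scalarized" means for a fixed-width vector: Lanes is a template
// parameter, so the loop has a constant trip count and can be fully unrolled
// into Lanes extract-and-multiply steps. A scalable vector has
// vscale * KnownMin lanes with vscale known only at run time, so there is no
// constant to supply here.
template <std::size_t Lanes>
unsigned scalarizedMulReduce(const std::array<unsigned, Lanes>& Vec,
                             unsigned Init) {
  static_assert(Lanes > 0, "a vector has at least one lane");
  unsigned Acc = Init;
  for (std::size_t I = 0; I < Lanes; ++I) // one extract + multiply per lane
    Acc *= Vec[I];
  return Acc;
}
```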
I searched the IR for evidence of a divergence between scalable and fixed-width vectorization. My tests showed that the loop vectorizer is unified; I apologize for misreading the code. Would the community be interested in my test files?