On Windows, process creation incurs a significant overhead, especially when antivirus software is installed. I estimate the overhead per process on my work PC to be around 0.027 seconds, and Lit spawned 246200 processes during check-llvm - around 110 minutes of CPU time spent just creating and destroying processes. This RFC demonstrates that we can reduce the time taken by check-llvm by at least 20% on Windows by merging tests to reduce process invocations.
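For reference, an estimate of this kind can be reproduced on any machine by timing trivial process spawns. The sketch below is not the measurement I used; it simply times no-op interpreter invocations, and the figure it produces will vary with the OS, antivirus configuration, and the binary being launched:

```python
import subprocess
import sys
import time

def measure_spawn_overhead(runs: int = 50) -> float:
    """Estimate per-process creation overhead by timing trivial
    interpreter invocations that do no real work."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / runs

if __name__ == "__main__":
    print(f"~{measure_spawn_overhead():.4f} s per process")
```

Multiplying the per-process figure by the number of processes Lit spawns gives a rough upper bound on the time recoverable by merging.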
Inspired by @rgal's observation that process creation is a significant overhead for Lit tests, I have been experimenting with ways to reduce the number of process invocations during check-llvm, in the hope of reducing testing times and CI workload (judging from the CI logs on GitHub, running the regression tests on Windows takes about 8 minutes - example from a random pull request I found). The main approach I have explored is introducing new frontends for opt/llc that operate on multiple modules in a single process, and mechanically "merging" a group of tests just before testing by replacing them with one test that invokes the new multi-module tool.
I drafted a very rough prototype of this idea here and the results are promising: testing runtime on my Windows PC drops from 950 to 775 seconds, or from 236 to 61 seconds if only the tests identified as mergeable are considered. Surprisingly, a noticeable difference was also observed on Linux (WSL), though not as dramatic as the one on Windows.
Prototype implementation
Merged tests consist of a single run directive invoking the multi-module version of the tool (these are optmany and llcmany in my proof-of-concept and reside in the llvm/tools directory), the result of which is piped into FileCheck with the test file as the check file as usual. Following this is a series of INPUT_FILE directives specifying the paths to the original test modules, which are interpreted by the multi-module tool. Finally, the FileCheck directives from the original tests are extracted and appended to the merged test, interspersed with checks for special boundary labels emitted by the multi-module tool.
An example of a merged test is as follows:
RUN: optmany -o - -hide-filename < %s -O2 -S | FileCheck %s
INPUT_FILE: llvm\test\Transforms\InstCombine\no-unwind-inline-asm.ll
INPUT_FILE: llvm\test\Transforms\InstCombine\unwind-inline-asm.ll
CHECK: TEST_BEGIN
CHECK-LABEL: INPUT_FILE llvm\test\Transforms\InstCombine\no-unwind-inline-asm.ll
CHECK: define dso_local void @test()
CHECK-NEXT: entry:
CHECK-NEXT: tail call void asm sideeffect
CHECK-NEXT: ret void
CHECK: TEST_END
CHECK: TEST_BEGIN
CHECK-LABEL: INPUT_FILE llvm\test\Transforms\InstCombine\unwind-inline-asm.ll
CHECK: define dso_local void @test()
CHECK-NEXT: entry:
CHECK-NEXT: invoke void asm sideeffect unwind
CHECK: %0 = landingpad { ptr, i32 }
CHECK: resume { ptr, i32 } %0
CHECK: TEST_END
These are created and run by a Python program called test_consolidator in llvm/utils. To try it, first update the information in test_consolidator.json in the project root, then invoke test_consolidator/main.py from the LLVM clone directory. Once it is set up and prompts for a command, run all-merged to run all regression tests with as many merged as possible. Note that some TableGen tests fail when run this way: they rely on relative paths, but the prototype implementation copies all tests to a temporary directory.
As for deciding which tests can be merged, the prototype implementation simply treats tests as mergeable if they have exactly the same run directive (tests with multiple run directives are treated as multiple tests with one directive each), the same REQUIRES/UNSUPPORTED directives, and the same lit.local.cfg. This results in 17832 regression tests (42% of all .ll tests) being identified as mergeable; these are converted into 5272 merged tests, reducing the total number of processes invoked from 248091 to 197581.
I also experimented with parallelising optmany but this did not result in a meaningful speed improvement during actual test runs, as Lit is already running tests in parallel.
Prototype limitations
Swathes of code are copied from llc.cpp and optdriver.cpp into llcmany and optmany. This not only violates the DRY principle but also has the major flaw that changes made to this code will not be exercised by the merged tests. If this idea is to be viable, this code must be extracted into library functions (e.g. optInit and optProcessModule) that are called by both opt and optmany (following the initiative already started by the extraction of optMain into a static library). This refactoring would be the main cost of implementing this solution.
The prototype test consolidator is quite slow to create the merged tests: on my PC it takes 20 seconds to identify mergeable tests and 1 minute 50 seconds to create the merged tests. It is written in Python and relies on many regexes to parse out FileCheck directives reliably(ish), among other things - I am quite new to writing regexes, so this is definitely a major area for optimisation. It should be possible to make this process fast enough to run just before testing. Also, it is certainly possible to construct a Lit test that confuses it (I only verified that all mergeable tests normally invoked by check-llvm still pass after merging).
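The directive-extraction step can be illustrated with a deliberately simplified regex (again, not the prototype's actual pattern - it handles only the default CHECK prefix and ignores custom --check-prefix spellings):

```python
import re

# Matches default-prefix FileCheck directives such as CHECK:, CHECK-NEXT:,
# CHECK-LABEL:, CHECK-DAG:, CHECK-NOT: at the start of a comment line.
CHECK_RE = re.compile(r"^[;/# ]*\b(CHECK(?:-[A-Z]+)?):\s?(.*)$")

def extract_check_lines(test_text: str):
    """Return (directive, payload) pairs for the default CHECK prefix."""
    out = []
    for line in test_text.splitlines():
        m = CHECK_RE.match(line)
        if m:
            out.append((m.group(1), m.group(2)))
    return out
```

A consolidator would append these extracted pairs to the merged test, bracketed by the TEST_BEGIN/TEST_END boundary checks shown in the example above.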
Another issue is that when a merged test fails, it takes some work to determine which original test caused the failure. A potential solution would be to automatically re-run the individual tests behind any merged tests reported as failures - this could be implemented by parsing Lit's output or by integrating with Lit.
Other approaches
A few other approaches I considered are:
- Mechanically merging as many `opt`/`llc` tests as possible by concatenating them into one module before testing.
  - I decided that this approach, while the least invasive (it would only require the introduction of one "merge tests" script that is run before testing), would be too fragile, as merging modules would invalidate the existing FileCheck directives, especially if identifiers, metadata IDs, etc. have to be uniquified.
- Compiling the tools to be tested as shared libraries and invoking them within the Lit process.
  - This is the most powerful approach in that it could theoretically eliminate the process creation overhead entirely, and it is also the most resilient to being upset by unusually structured Lit tests. However, it imposes the extremely invasive restriction on every tool involved that its `main` function must be idempotent and must never call `exit` or similar, as this would exit the Lit worker itself - unless some hackery is done, such as linking with a fake `exit` function, but that is too evil.
Is reducing process invocations to speed up regression testing a useful pursuit? Do you think the trade-off of maintenance and increased testing complexity and fragility is worth it for the potential reduction in testing times? Does anyone have other ideas to achieve the same goal in a better way?
CC @jmorse
