I am posting the project described in my GSoC proposal here to invite discussion about possible next steps.
Description: This project aims to enhance the Input-Gen tool, a scalable framework for stateful input generation, by extending its coverage of, and compatibility with, languages supported by LLVM (C/C++, Rust, Julia, and Swift). Introductory Discourse post here. Input-Gen generates inputs for arbitrary program fragments by instrumenting LLVM Intermediate Representation (IR) code and then capturing and replaying program states. The tool works in several stages: module preparation, LLVM IR instrumentation, runtime execution, and input storage. Using the LLVM ComPile dataset, a large number of inputs can be generated and evaluated to measure Input-Gen's accuracy. The goal is to improve the tool's accuracy on arbitrary IR files so that LLVM developers can adopt it for practical purposes such as comprehensive testing, performance tuning, and ML training.
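To make the capture-and-replay idea more concrete, here is a minimal C++ sketch under clearly stated assumptions. It is not Input-Gen's runtime or API: `InputObject`, `fragment`, and `replayMatchesCapture` are hypothetical names, and the explicit `load`/`store` calls stand in for the hooks an instrumentation pass would insert into the IR. In generate mode, the first read of an unseen offset invents a value and records it as part of the initial state; in replay mode, the same fragment runs again from that recorded state and should compute the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <random>
#include <utility>

// Hypothetical model of one input object: byte offset -> int value.
using ObjectState = std::map<std::size_t, int>;

// Hypothetical runtime object; NOT Input-Gen's actual API.
class InputObject {
public:
  // Generate mode: values are invented on first read.
  explicit InputObject(std::uint32_t seed = 42) : rng_(seed) {}

  // Replay mode: start from a previously recorded initial state.
  explicit InputObject(ObjectState recorded)
      : replaying_(true), memory_(recorded), input_(std::move(recorded)) {}

  // Stands in for an instrumented load.
  int load(std::size_t offset) {
    auto it = memory_.find(offset);
    if (it != memory_.end()) return it->second;
    if (replaying_) {
      // A deterministic fragment should never read an unrecorded offset.
      assert(false && "replay read an offset that was never recorded");
      return 0;
    }
    int value = static_cast<int>(rng_() % 100);  // invent a value
    memory_[offset] = value;                     // current state
    input_[offset] = value;                      // recorded initial state
    return value;
  }

  // Stands in for an instrumented store: updates current state only,
  // so the recorded input stays the *initial* state.
  void store(std::size_t offset, int value) { memory_[offset] = value; }

  const ObjectState& recordedInput() const { return input_; }

private:
  bool replaying_ = false;
  std::mt19937 rng_;
  ObjectState memory_;
  ObjectState input_;
};

// Hypothetical program fragment: sums n ints, writes the sum to offset 0.
int fragment(InputObject& obj, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += obj.load(static_cast<std::size_t>(i) * sizeof(int));
  obj.store(0, sum);
  return sum;
}

// Capture once, replay from the recorded input, compare results.
bool replayMatchesCapture(int n) {
  InputObject capture;
  int captured = fragment(capture, n);
  InputObject replay(capture.recordedInput());
  int replayed = fragment(replay, n);
  return captured == replayed;
}
```

Calling `replayMatchesCapture(4)` should return `true`, because the replay starts from exactly the values the capture run invented. Recording only first reads, and not later writes, is what keeps the saved state an input rather than a snapshot of the final memory.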
Expected Results: Improved accuracy of the Input-Gen tool, a higher coverage percentage, and successful instrumentation of IR bitcode files or modules followed by successful execution of the generated inputs. By the end of the GSoC timeline, Input-Gen is expected to successfully instrument and execute more functions, and to execute more basic blocks per IR file on average, compared with the results reported in the Input-Gen paper. This work will be done by directly modifying input-gen.cpp and its associated files, found here.
Project Size: Medium
Requirements: Basic C and C++ skills, familiarity with LLVM IR features
Confirmed Mentors: Aiden Grossman, Ivan Ivanov, Johannes Doerfert