I’m a graduate student currently looking for a research topic related to LLVM’s optimizer.
My advisor’s expertise is in compiler optimizations, and I’m interested in contributing to LLVM in this area.
I would like to ask:
Which parts of LLVM’s middle-end optimization pipeline still have active research or engineering opportunities?
Are there areas (e.g. MemorySSA, GVN, loop optimizations, Attributor) that are considered “done” vs. areas where improvements are still welcome?
Are there known open problems or performance pain points that would be good entry points for someone new to LLVM but comfortable with C++ and IR-level analysis?
I want to make sure I pick a direction that is still relevant and valuable to the community.
Pointers to mailing list threads, RFCs, or open issues are very welcome.
If it is practical for you, I recommend attending the LLVM Dev Meeting, which is coming up towards the end of October. A significant chunk of the community will be there for you to meet and talk to, and the technical program is generally really interesting too. As a CS professor working in LLVM, I’ve found this conference invaluable, and also a ton of fun.
I’ve noticed many parts of MemorySSA and its users (particularly memcpyopt) that I wish were much better and could handle more cases, more consistently. I started to do some work on them, but my time has been very limited. I think there are also known issues and pain points around other design details, such as continuing the switch to opaque pointers (and ptrgep) everywhere, or making debug info survive optimizations better, that pop up on the mailing list here from time to time. I don’t think there is any list of such future goals or starter projects though.