EN.601.727 Machine Programming

Johns Hopkins University — Fall 2025

Instructor: Ziyang Li   |   Email: ziyang@cs.jhu.edu

Oral Presentations

Sign-Up

We’ll hold student-led presentation sessions starting in Week 9 of the semester. Each session will focus on one research topic with three related papers. You’ll form a group of three students, with each student presenting one paper (≈ 15–20 min talk + 5–10 min Q&A).

Please fill out the Google Form here to rank your topic preferences. I’ll use these responses to form balanced groups and to resolve conflicts when multiple groups request the same topic.

Deadline: Friday, Oct 10 at 11:59 PM

Grading Policy

The oral presentation is graded on a completion basis and contributes 10% of your final course grade.

Important: If you do not submit the form by the deadline, you’ll be automatically assigned to a topic and presentation slot at the instructor’s discretion. Failure to present in your assigned slot without prior approval will result in a zero for the presentation component.

Attendance Notice

Attendance at the student-led sessions will be strictly recorded. If you must miss a session due to extenuating circumstances, please notify the instructor or the TA in advance.

Paper Presentation Topics

Each student group will be assigned one of the following topics for its paper presentation. The exact papers will be chosen by the group in coordination with the instructor. As a starting point, the instructor will propose at least five papers per topic. You are encouraged to choose from this list, though you may also select other papers you find particularly exciting. To ease preparation, you may choose papers that already have slides or videos available online.

Topic 1. Language Models for Programming: Pretraining, Fine-Tuning, and Adaptation

This topic explores how large language models acquire and specialize programming knowledge. We’ll discuss pretraining objectives (next-token prediction, span corruption), fine-tuning strategies (instruction tuning, reinforcement learning from human or logical feedback), and efficiency techniques such as LoRA-style parameter-efficient adaptation and quantization. We’ll also look at dataset construction for code understanding and synthesis, and at evaluation benchmarks for measuring program synthesis capability.
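To make one of these ideas concrete, here is a minimal numpy sketch of the LoRA mechanism; the dimensions, scaling, and initialization below are illustrative choices, not taken from any particular paper or course reading.

```python
import numpy as np

# Minimal LoRA sketch: a frozen pretrained weight W is adapted through a
# low-rank update B @ A, so only r * (d_in + d_out) parameters are trained
# instead of the full d_out * d_in matrix. All sizes here are illustrative.
rng = np.random.default_rng(0)

d_in, d_out, r = 64, 64, 4             # r << d_in is the low-rank bottleneck
alpha = 8.0                            # scaling hyperparameter

W = rng.normal(size=(d_out, d_in))     # pretrained weight, kept frozen
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection; zero init
                                       # makes the adapter a no-op at step 0

def lora_forward(x):
    """Base projection plus the scaled low-rank correction."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
print(lora_forward(x).shape)           # (64,)
```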

Topic 2. LLM Agents and Multi-Agent Frameworks

This topic focuses on how LLMs become agents that plan, communicate, and use tools. We’ll study communication protocols, tool interfaces, and multi-agent coordination frameworks, as well as real-world implementations such as LangChain, LangGraph, AutoGen, Swarm, Codex, Claude Code, and Cursor. Discussion will include agent design principles, tool selection and orchestration, and applications in collaborative programming, debugging, and long-horizon automation.
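As a concrete (and heavily simplified) illustration of the agent loop, here is a schematic tool-use sketch in Python; `call_llm`, the JSON action format, and the `calculator` tool are all hypothetical placeholders, not the API of LangChain, AutoGen, or any other framework listed above.

```python
import json

def calculator(expression: str) -> str:
    # A deliberately sandboxed "tool" the agent may invoke.
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

def run_agent(task: str, call_llm, max_steps: int = 5) -> str:
    """Alternate between model decisions and tool executions."""
    transcript = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(transcript)    # assumed to return JSON text
        action = json.loads(reply)      # {"type": "tool"|"final", ...}
        if action["type"] == "final":
            return action["answer"]
        result = TOOLS[action["tool"]](action["input"])
        transcript.append({"role": "tool", "content": result})
    return "step budget exhausted"
```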

Topic 3. Search-Based and Evolutionary Program Synthesis

This topic investigates how search and optimization drive program synthesis. We’ll cover genetic programming, Monte Carlo search, Bayesian optimization, and self-improving loops for evolving programs or repair candidates. We’ll touch on systems such as AlphaEvolve and explore connections between search and neural synthesis, including reinforcement-learning-based code refinement and hybrid LLM-guided search.
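For a hands-on flavor of evolutionary synthesis, here is a toy mutation-only search over tiny arithmetic expressions; the grammar, fitness function, and population scheme are deliberately simplistic stand-ins for the techniques covered in this topic.

```python
import random

# Expressions are nested tuples over the grammar  e ::= x | 1 | e + e | e * e.
# We evolve one that matches the input/output examples (mutation only,
# no crossover, so this is a deliberately simplified sketch).
def rand_expr(depth=2):
    if depth == 0 or random.random() < 0.3:
        return random.choice([("x",), ("1",)])
    return (random.choice(["+", "*"]), rand_expr(depth - 1), rand_expr(depth - 1))

def mutate(e):
    if len(e) == 1 or random.random() < 0.3:
        return rand_expr()                    # replace this whole subtree
    i = random.choice([1, 2])                 # ... or recurse into one child
    parts = list(e)
    parts[i] = mutate(e[i])
    return tuple(parts)

def evaluate(e, x):
    if e[0] == "x": return x
    if e[0] == "1": return 1
    l, r = evaluate(e[1], x), evaluate(e[2], x)
    return l + r if e[0] == "+" else l * r

def fitness(e, examples):
    return sum(abs(evaluate(e, x) - y) for x, y in examples)

examples = [(0, 1), (1, 2), (2, 5), (3, 10)]  # hidden target: x * x + 1
pop = [rand_expr() for _ in range(200)]
for _ in range(200):
    pop.sort(key=lambda e: fitness(e, examples))
    if fitness(pop[0], examples) == 0:
        break
    elite = pop[:50]                          # elitism + mutation refill
    pop = elite + [mutate(random.choice(elite)) for _ in range(150)]
print(pop[0], fitness(pop[0], examples))
```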

Topic 4. Applications in Software Engineering and Security

This topic explores how synthesis techniques can automate or enhance software engineering tasks. We’ll cover test generation and property-based testing, fuzzing, program analysis (static, dynamic, and symbolic), vulnerability detection, exploit and proof-of-concept generation, specification and invariant synthesis, and program verification. Students may explore applications of LLM coders to both symbolic approaches (e.g., symbolic execution, constraint solving) and modern neural-assisted tools for security and reliability.
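As a small taste of property-based testing, here is a sketch using the Hypothesis library; the `dedup` function and its invariants are illustrative examples, not drawn from any of the assigned papers.

```python
# Minimal property-based test with the Hypothesis library
# (pip install hypothesis). Rather than fixed cases, we state invariants
# that must hold on randomly generated inputs.
from hypothesis import given, strategies as st

def dedup(xs):
    """Remove duplicates while preserving first-occurrence order."""
    return list(dict.fromkeys(xs))

@given(st.lists(st.integers()))
def test_dedup(xs):
    once = dedup(xs)
    assert dedup(once) == once       # idempotence
    assert set(once) == set(xs)      # no elements lost or invented

test_dedup()  # calling the decorated test runs it on many generated inputs
```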

Topic 5. Applications in Planning and Cyber-Physical Systems

This topic investigates synthesis for physical and hybrid systems that interact with the real world. We’ll examine planning languages (PDDL, Z3 encodings), reward-function synthesis for robot training, CAD and shape program generation, and robot configuration or simulation environment synthesis. Connections to autonomous system design, motion planning, and neurosymbolic control are also encouraged.
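For a glimpse of constraint-based encodings, here is a tiny scheduling query written with the z3-solver Python package; the variables and constraints are a toy stand-in for the richer planning encodings this topic discusses.

```python
# A small constraint-solving taste with the z3-solver package
# (pip install z3-solver).
from z3 import Ints, Solver, sat

start, duration = Ints("start duration")
s = Solver()
s.add(start >= 0)                  # the action cannot begin before time 0
s.add(duration >= 2)               # minimum execution time
s.add(start + duration <= 10)      # hard deadline
if s.check() == sat:
    m = s.model()
    print(m[start], m[duration])   # one feasible schedule
```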

Topic 6. Applications in Logic, Mathematics, and Theorem Proving

This topic covers program synthesis for symbolic reasoning and formal domains. We’ll study logic programming languages (Prolog, Datalog), first-order and temporal logic, and theorem-proving environments such as Lean and Coq (now renamed Rocq). Applications include AI4Math and AI4Science, autoformalization, automated proof synthesis, and LLMs for mathematical reasoning and competition-level problem solving.
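To preview the logic-programming side, here is a Datalog-flavored transitive-closure computation written directly in Python; it uses naive fixpoint iteration, whereas real Datalog engines add semi-naive evaluation and indexing on top of the same semantics.

```python
# Iterate the rule  path(x, z) :- edge(x, y), path(y, z)  to a fixpoint.
edge = {("a", "b"), ("b", "c"), ("c", "d")}
path = set(edge)                     # base rule: path(x, y) :- edge(x, y)

changed = True
while changed:
    changed = False
    for (x, y) in edge:
        for (y2, z) in list(path):
            if y == y2 and (x, z) not in path:
                path.add((x, z))
                changed = True

print(sorted(path))                  # includes ("a", "d") via a -> b -> c -> d
```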

Topic 7. Applications in Databases and Data-Wrangling Programs

This topic examines program synthesis for data analysis and manipulation tasks. We’ll discuss data-wrangling program generation (e.g., FlashFill, AutoPandas), SQL and NoSQL query synthesis, database optimization and migration, and automatic code generation for data collection and cleaning. Connections to natural-language interfaces for databases and dataflow-oriented program synthesis are also welcome. We’ll also discuss extensions of database-style query systems to new areas such as program analysis (e.g., CodeQL) and knowledge-graph reasoning.
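As a miniature example of programming by example, here is a toy enumerative search in the spirit of FlashFill; the hypothesis space (initials plus separators) is invented for illustration and is far smaller than what real systems explore.

```python
# Enumerate a tiny space of string programs and keep the first one
# consistent with every input/output example.
examples = [("Jane Doe", "J.D."), ("Alan Turing", "A.T.")]

def candidates():
    for sep in ["", ".", "-", " "]:
        for suffix in ["", "."]:
            yield lambda s, sep=sep, suffix=suffix: \
                sep.join(w[0] for w in s.split()) + suffix

for prog in candidates():
    if all(prog(inp) == out for inp, out in examples):
        print(prog("Grace Hopper"))  # generalizes: prints "G.H."
        break
```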