Author: @Prabhuk
Issue: [Support] Subcommand support in `OptTable` · Issue #108307 · llvm/llvm-project · GitHub
Introduction
This write-up proposes changes to the LLVM OptTable to support subcommands. The goal is to extend OptTable to allow hierarchical command-line parsing, where a main command can have several subcommands, each with its own set of options. A key design consideration is backward compatibility: existing users of OptTable must not be required to modify their code.
Motivation
LLVM’s OptTable lacks support for subcommands, which prevents its adoption by tools that require them (e.g. llvm-profdata, llvm-cov). That in turn blocks the integration of these tools into the LLVM multicall binary (busybox).
Design
Assumptions
The design makes the following assumptions; the list is not exhaustive.
- Subcommands are positional arguments.
  - Positional arguments are ones which do not have `-` or `--` prefixes.
  - e.g. `mytool --foo bar` where `bar` is a subcommand.
  - e.g. `mytool bar --foo` where once again `bar` is a subcommand.
- Each top level command can have zero or more associated subcommands (e.g. git clone, git diff).
- A top level command can have only one valid subcommand per invocation.
  - e.g. `git clone diff` is an invalid invocation as there are two subcommands passed here.
- Each subcommand can have its own set of options.
- Each option may belong to one or more subcommands.
- Subcommands do not have subcommands.
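The assumptions above can be sketched as a small standalone function. This is only an illustration, not the prototype's code: `findSubcommand` and its parameters are hypothetical names, and the sketch assumes that every option is a flag that does not consume the following argument as its value.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of OptTable: scans the arguments that follow
// the tool name and returns the single active subcommand, if any.
// Assumption: no option consumes the next argument as its value.
std::optional<std::string>
findSubcommand(const std::vector<std::string> &Args,
               const std::set<std::string> &KnownSubcommands) {
  // Subcommands are positional, so a registered name must not look like an
  // option.
  for (const std::string &Name : KnownSubcommands)
    assert(!Name.empty() && Name[0] != '-');

  std::optional<std::string> Active;
  for (const std::string &Arg : Args) {
    // Anything starting with '-' is an option, not a positional argument.
    if (!Arg.empty() && Arg[0] == '-')
      continue;
    // Positional arguments that are not registered subcommands are ordinary
    // inputs such as file names.
    if (KnownSubcommands.count(Arg) == 0)
      continue;
    // Only one subcommand is valid per invocation.
    if (Active)
      throw std::invalid_argument("multiple subcommands: '" + *Active +
                                  "' and '" + Arg + "'");
    Active = Arg;
  }
  return Active;
}
```

With `clone` and `diff` registered, both `--verbose clone` and `clone --verbose` select `clone`, while `clone diff` is rejected.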
Goals
- Existing `OptTable` definitions and usage patterns continue to function as they do currently. Users who do not wish to utilize the subcommand feature will not encounter any breaking changes or be forced to refactor their code.
- The textual output of the “help” option for subcommands matches the subcommand behavior of the CommandLine library (see the CommandLine 2.0 Library Manual in the LLVM 22.0.0git documentation).
Link to implementation prototype PR: [llvm] Add subcommand support for OptTable by Prabhuk · Pull Request #155026 · llvm/llvm-project · GitHub
OptParser.td change [github]
This file has tablegen class definitions for OptTable. This design introduces the following changes:
- Introduce a new `class Command` which has a `name` field.
- Introduce a new `class Subcommand` which extends `Command` and has additional fields `HelpText` and `Usage`.
- Introduce a new field `list<Command> commandGroup` to `class Option`.
- Define `TopLevelCommand` which is of type `Command`. This represents the top level command (e.g. llvm-objcopy) which uses OptTable. Since we want to represent the top level command in tablegen generated content at compile time, I am setting the name of this Command as `TopLevelCommand`. Happy to hear other ideas on this one.
- `commandGroup` is by default assigned a list with just one value, the `TopLevelCommand` (i.e. `[TopLevelCommand]`).
- Add `list<Command> commandGroup = [TopLevelCommand]` to other classes which extend `class Option` (e.g. Flag, Joined, etc.).
```tablegen
// Define Command and SubCommand classes
class Command<string name> { string Name = name; }

// Define the subcommand class.
class Subcommand<string name, string helpText, string usage = ""> : Command<name> {
  string HelpText = helpText;
  string Usage = usage;
}

// Explicit specifier to represent top level command in compile time
// for backward compatibility with existing Option class definitions.
def TopLevelCommand : Command<"TopLevelCommand">;

class Option<list<string> prefixes, string name, OptionKind kind,
             list<Command> commandGroup = [TopLevelCommand]> {
  ...
  // New field CommandGroup
  list<Command> CommandGroup = commandGroup;
}

// Changes to Flag and other classes are not shown for brevity. Please check
// the link to the github PR to see the prototype implementation.
```
OptTable Changes
Changes to support tablegen backend and parsing the user input from commandline are described in this section.
OptionParserEmitter.cpp [github]
This file is the TableGen backend that reads the records of `class Option` and the related types defined in `OptParser.td` and generates their C++ definitions. The `emitOptionParser` function is changed in the following ways:
- Emit `OptionCommandIDsTable`, which contains `{ number of commands, list<command identifiers> }` pairs.
- Emit a new argument named `COMMANDIDS_OFFSET` to `OPTION` macro calls. This is an offset into `OptionCommandIDsTable` that locates the `{ number of commands, command identifiers }` entry for a given option.
- Emit the `OptionCommands` table, which is a list of `{COMMAND_NAME, COMMAND_HELPTEXT, COMMAND_USAGE}` records.
```cpp
static constexpr unsigned OptionCommandIDsTable[] = {
    0 /* commands */,
    1 /* commands */, 0 /* 'TopLevelCommand' */,
    1 /* commands */, 2 /* 'sc_foo' */,
    2 /* commands */, 2 /* 'sc_foo' */, 1 /* 'sc_bar' */
};

static constexpr llvm::opt::OptTable::Command OptionCommands[] = {
    { "TopLevelCommand", nullptr, nullptr },
    { "bar", "HelpText for Subcommand bar.", "OptSubcommand bar <options>" },
    { "foo", "HelpText for Subcommand foo.", "OptSubcommand foo <options>" },
};
```
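To make the role of `COMMANDIDS_OFFSET` concrete, here is a minimal standalone sketch of how an offset could be decoded against the table shown above. The helper names `commandIDsAt` and `optionBelongsTo` are hypothetical and are not taken from the prototype; the table contents are copied from the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copied from the generated table shown above.
static constexpr unsigned OptionCommandIDsTable[] = {
    0 /* commands */,
    1 /* commands */, 0 /* 'TopLevelCommand' */,
    1 /* commands */, 2 /* 'sc_foo' */,
    2 /* commands */, 2 /* 'sc_foo' */, 1 /* 'sc_bar' */
};

// Hypothetical helper: Offset points at a count entry, and the command IDs
// follow it directly in the table.
std::vector<unsigned> commandIDsAt(std::size_t Offset) {
  constexpr std::size_t Size =
      sizeof(OptionCommandIDsTable) / sizeof(OptionCommandIDsTable[0]);
  assert(Offset < Size && "offset must point inside the table");
  const unsigned Count = OptionCommandIDsTable[Offset];
  assert(Offset + Count < Size && "IDs must fit inside the table");
  return std::vector<unsigned>(OptionCommandIDsTable + Offset + 1,
                               OptionCommandIDsTable + Offset + 1 + Count);
}

// Hypothetical helper: true if the option whose entry starts at Offset is
// registered for CommandID.
bool optionBelongsTo(std::size_t Offset, unsigned CommandID) {
  for (unsigned ID : commandIDsAt(Offset))
    if (ID == CommandID)
      return true;
  return false;
}
```

In this layout, offset 1 describes an option that belongs only to `TopLevelCommand` (ID 0), and offset 5 describes an option shared by `sc_foo` (ID 2) and `sc_bar` (ID 1).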
OptTable.h [github]
- Add a new `COMMANDIDS_OFFSET` parameter to the macro definitions for creating OptTable datatypes.
- Define a new type `struct Command` to represent the `class Command` added to the `OptParser.td` file.
- Introduce `ArrayRef<Command> Commands` and `ArrayRef<unsigned> CommandIDsTable` to OptTable. Add new constructors to `OptTable` and its derived types to initialize these fields. New constructors are added, rather than changing the existing one, so that the existing constructor remains usable and backward compatibility is maintained.
- Modify `printHelp` to accept `StringRef SubCommand` as its last parameter. Its default value is an empty StringRef for backward compatibility.
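The backward-compatibility argument for the defaulted trailing parameter can be illustrated with a simplified stand-in. This is not the real `OptTable::printHelp` signature, which has more parameters and writes to a stream; the function below and its output format are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for OptTable::printHelp. Only the defaulted trailing
// SubCommand parameter mirrors the proposal; the output text is made up.
std::string printHelp(std::string_view Usage, std::string_view Title,
                      std::string_view SubCommand = {}) {
  assert(!Usage.empty() && "a usage string is expected");
  std::string Out = "OVERVIEW: " + std::string(Title) + "\n";
  Out += "USAGE: " + std::string(Usage) + "\n";
  // An empty SubCommand means the caller did not ask for subcommand help.
  if (!SubCommand.empty())
    Out += "SUBCOMMAND: " + std::string(SubCommand) + "\n";
  return Out;
}
```

An existing call such as `printHelp("mytool [options]", "My Tool")` still compiles unchanged, while a subcommand-aware caller passes a third argument.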
OptTable.cpp [github]
- `internalParseOneArg` is changed to identify the current active subcommand, if any.
- The `internalPrintHelp` implementation is changed to print the options related to the current “Active Command”. Global options are printed if there is no active command, meaning that by default the `"TopLevelCommand"` is assumed.
- The ArgList.h and ArgList.cpp files are also changed to support the operations listed above.
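The help-filtering rule described above can be sketched in isolation as follows. `OptionInfo` and `optionsForHelp` are hypothetical names used only for this illustration, and command groups are represented as plain name lists rather than the ID table the prototype emits.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an option record.
struct OptionInfo {
  std::string Spelling;
  std::vector<std::string> CommandGroup; // commands this option belongs to
};

// Returns the spellings of the options to list in help output. With no active
// command, "TopLevelCommand" is assumed, so only global options are listed.
std::vector<std::string>
optionsForHelp(const std::vector<OptionInfo> &Options,
               const std::string &ActiveCommand = "") {
  const std::string Target =
      ActiveCommand.empty() ? "TopLevelCommand" : ActiveCommand;
  std::vector<std::string> Visible;
  for (const OptionInfo &Opt : Options) {
    assert(!Opt.Spelling.empty() && "every option needs a spelling");
    const auto &Group = Opt.CommandGroup;
    if (std::find(Group.begin(), Group.end(), Target) != Group.end())
      Visible.push_back(Opt.Spelling);
  }
  return Visible;
}
```

An option registered for both `foo` and `bar` appears in the help of either subcommand but not in the top level help, unless `TopLevelCommand` is also in its group.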
Implementation Plan
Current prototype is available in this draft PR: [llvm] Add subcommand support for OptTable by Prabhuk · Pull Request #155026 · llvm/llvm-project · GitHub
If this design is acceptable, here is a non-exhaustive list of TODOs I need to complete:
- Clean up the prototype code to use better variable names, inline comments, documentation etc.
- Add unit tests.
- Match the user interface behavior of the cl library. If there are valid suggestions to move away from this interface, adopt the new interface design.
- Break the PR into smaller patches:
  - `OptParser.td` and related changes.
  - Some of the places in the codebase which redefine OPTION macros can be made to use existing definitions from OptTable.h where possible, and these changes can land as NFCs prior to the subcommand support change.
Please let me know your thoughts on this change. Thank you.