Subcommand Feature Support in LLVM OptTable

Author: @Prabhuk

Issue: [Support] Subcommand support in `OptTable` · Issue #108307 · llvm/llvm-project · GitHub

Introduction

This write-up proposes changes to LLVM's OptTable to support subcommands. The goal is to extend OptTable to allow hierarchical command-line parsing, where a main command can have several subcommands, each with its own set of options. A key design consideration is backward compatibility: existing users of OptTable should not have to modify their code.

Motivation

LLVM’s OptTable lacks support for subcommands, which prevents its adoption by tools that require them (e.g. llvm-profdata, llvm-cov). This in turn blocks integration of these tools into the LLVM multicall binary (busybox).

Design

Assumptions

A non-exhaustive set of assumptions made in designing this implementation.

  1. Subcommands are positional arguments.
    1. Positional arguments are ones which do not have - or -- prefixes.
    2. e.g. mytool --foo bar, where bar is a subcommand.
    3. e.g. mytool bar --foo, where once again bar is a subcommand.
  2. Each top level command can have zero or more associated subcommands (e.g. git clone, git diff).
  3. A top level command can have only one valid subcommand per invocation.
    1. e.g. git clone diff is an invalid invocation as there are two subcommands passed here.
  4. Each subcommand can have its own set of options.
  5. Each option may belong to one or more subcommands.
  6. Subcommands do not have subcommands.
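
As a rough illustration of assumptions 1 and 3, here is a minimal standalone sketch. It is not taken from the prototype: `findSubcommand` and `SubcommandScan` are hypothetical names, and the sketch simplifies by treating every argument without a leading `-` as positional, so option values passed as separate arguments are not handled.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Result of scanning the arguments. Name is empty when no subcommand was
// given; Error is non-empty when the invocation is rejected.
struct SubcommandScan {
  std::string Name;
  std::string Error;
};

// Hypothetical helper (not an OptTable API): finds the single subcommand
// among Args, given the list of registered subcommand names.
SubcommandScan findSubcommand(const std::vector<std::string> &Args,
                              const std::vector<std::string> &Known) {
  SubcommandScan Result;
  for (const std::string &Arg : Args) {
    // Assumption 1: arguments with a '-' or '--' prefix are options.
    if (!Arg.empty() && Arg[0] == '-')
      continue;
    // Positional arguments that are not registered subcommand names are
    // treated as ordinary inputs (for example, file names).
    if (std::find(Known.begin(), Known.end(), Arg) == Known.end())
      continue;
    // Assumption 3: only one subcommand is allowed per invocation.
    if (!Result.Name.empty()) {
      Result.Error = "multiple subcommands: '" + Result.Name + "' and '" +
                     Arg + "'";
      return Result;
    }
    Result.Name = Arg;
  }
  return Result;
}
```

With `bar` and `baz` registered, both `mytool --foo bar` and `mytool bar --foo` resolve to `bar`, while `mytool bar baz` is rejected. Because the sketch matches any positional argument against the registered names, a positional input that shares a name with a subcommand also counts as a subcommand.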

Goals

  1. Existing OptTable definitions and usage patterns continue to function as they do currently. Users who do not wish to utilize the subcommand feature will not encounter any breaking changes or be forced to refactor their code.
  2. The textual output of the “help” option for subcommands matches the subcommand behavior of the CommandLine library (CommandLine 2.0 Library Manual, LLVM 22.0.0git documentation).

Link to implementation prototype PR: [llvm] Add subcommand support for OptTable by Prabhuk · Pull Request #155026 · llvm/llvm-project · GitHub

OptParser.td change [github]

This file has tablegen class definitions for OptTable. This design introduces the following changes:

  1. Introduce a new class Command which has a name field.
  2. Introduce a new class Subcommand which extends Command and has additional fields HelpText and Usage.
  3. Introduce a new field list<Command> commandGroup to class Option.
  4. Define TopLevelCommand which is of type Command. This represents the top level command (e.g. llvm-objcopy) which uses OptTable. Since we want to represent the top level command in tablegen generated content at compile time, I am setting the name of this Command as TopLevelCommand. Happy to hear other ideas on this one.
  5. commandGroup defaults to a list containing only TopLevelCommand (i.e. [TopLevelCommand]).
  6. Add list<Command> commandGroup = [TopLevelCommand] to other classes which extend class Option (e.g. Flag, Joined, etc.).

// Define the Command class.
class Command<string name> { string Name = name; }

// Define the subcommand class.
class Subcommand<string name, string helpText, string usage=""> : Command<name> {
  string HelpText = helpText;
  string Usage = usage;
}

// Explicit specifier to represent the top level command at compile time,
// for backward compatibility with existing Option class definitions.
def TopLevelCommand : Command<"TopLevelCommand">;

class Option<list<string> prefixes, string name, OptionKind kind,
             list<Command> commandGroup = [TopLevelCommand]> {
...
// New field CommandGroup
list<Command> CommandGroup = commandGroup;
}

// Changes to Flag and other classes are not shown for brevity. Please check the link to the github PR to see the prototype implementation.

OptTable Changes

This section describes the changes to the tablegen backend and to parsing user input from the command line.

OptionParserEmitter.cpp [github]

This file parses the definitions of class Option and the related types defined in OptParser.td and generates code from them. The emitOptionParser function is changed in the following ways:

  1. Emit OptionCommandIDsTable which contains { number of commands, list<command identifiers> } pairs.
  2. Emit a new argument to OPTION macro calls named COMMANDIDS_OFFSET. This is an offset into OptionCommandIDsTable that points at the pair listing the commands a given option belongs to.
  3. Emit OptionCommands table which is a list of {COMMAND_NAME, COMMAND_HELPTEXT, COMMAND_USAGE} records.
static constexpr unsigned OptionCommandIDsTable[] = {
  0 /* commands */,
  1 /* commands */, 0 /* 'TopLevelCommand' */,
  1 /* commands */, 2 /* 'sc_foo' */,
  2 /* commands */, 2 /* 'sc_foo' */, 1 /* 'sc_bar' */
};

static constexpr llvm::opt::OptTable::Command OptionCommands[] = {
  { "TopLevelCommand", nullptr, nullptr },
  { "bar", "HelpText for Subcommand bar.", "OptSubcommand bar <options>" },
  { "foo", "HelpText for Subcommand foo.", "OptSubcommand foo <options>" },
};
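
To make the layout concrete, the sketch below decodes the example table above. It is only an illustration of how a COMMANDIDS_OFFSET could be read, not the prototype's code: `commandIDsAt` and `optionBelongsTo` are hypothetical helpers, and the table is copied here so the snippet stands alone.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Copy of the example table: each record is a count followed by that many
// command IDs. The IDs index into OptionCommands (0 = TopLevelCommand,
// 1 = bar, 2 = foo).
static constexpr unsigned OptionCommandIDsTable[] = {
  0 /* commands */,
  1 /* commands */, 0 /* 'TopLevelCommand' */,
  1 /* commands */, 2 /* 'sc_foo' */,
  2 /* commands */, 2 /* 'sc_foo' */, 1 /* 'sc_bar' */
};

// Hypothetical helper: returns the command IDs stored in the record that
// starts at Offset. Offset must point at the count field of a record.
std::vector<unsigned> commandIDsAt(std::size_t Offset) {
  const unsigned Count = OptionCommandIDsTable[Offset];
  const unsigned *First = &OptionCommandIDsTable[Offset + 1];
  return std::vector<unsigned>(First, First + Count);
}

// Hypothetical helper: true if the option whose record starts at Offset
// is associated with the command identified by CommandID.
bool optionBelongsTo(std::size_t Offset, unsigned CommandID) {
  const std::vector<unsigned> IDs = commandIDsAt(Offset);
  return std::find(IDs.begin(), IDs.end(), CommandID) != IDs.end();
}
```

In this example the records start at offsets 0, 1, 3 and 5, so an option emitted with offset 5 belongs to both `foo` (ID 2) and `bar` (ID 1), while an option with offset 1 belongs only to the top level command.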

OptTable.h [github]

  1. Add a new COMMANDIDS_OFFSET parameter to macro definitions for creating OptTable datatypes.
  2. Define a new struct Command to represent the Command class added to OptParser.td.
  3. Introduce ArrayRef<Command> Commands and ArrayRef<unsigned> CommandIDsTable to OptTable. Add new constructors to OptTable and its derived types to initialize these fields. New constructors are added, rather than changing the existing one, so that existing callers keep working unchanged.
  4. Modify printHelp to accept a StringRef SubCommand as its last parameter, defaulting to an empty StringRef for backward compatibility.

OptTable.cpp [github]

  1. internalParseOneArg is changed to identify the current active subcommand if any.
  2. internalPrintHelp is changed to print the options related to the current “Active Command”. If there is no active command, "TopLevelCommand" is assumed and the global options are printed.
  3. ArgList.h and ArgList.cpp files are also changed to support the operations listed above.
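
The help behavior described above can be sketched in a few lines. This is a simplified stand-in, not the real printHelp signature or the prototype's output format: `HelpOption` and `printHelpSketch` are hypothetical, and options carry command names directly instead of offsets into a table. The defaulted last parameter is what lets existing call sites compile unchanged.

```cpp
#include <algorithm>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified option record used only for this sketch.
struct HelpOption {
  std::string Spelling;
  std::string HelpText;
  std::vector<std::string> Commands; // commands this option belongs to
};

// Prints only the options that belong to the active command. An empty
// Subcommand means no subcommand is active, so "TopLevelCommand" is
// assumed and the global options are printed.
void printHelpSketch(std::ostream &OS, const std::vector<HelpOption> &Options,
                     const std::string &Subcommand = "") {
  const std::string Active =
      Subcommand.empty() ? "TopLevelCommand" : Subcommand;
  for (const HelpOption &Opt : Options) {
    if (std::find(Opt.Commands.begin(), Opt.Commands.end(), Active) ==
        Opt.Commands.end())
      continue;
    OS << "  " << Opt.Spelling << "  " << Opt.HelpText << "\n";
  }
}
```

Calling `printHelpSketch(OS, Options)` lists only the options tagged with `TopLevelCommand`, while `printHelpSketch(OS, Options, "bar")` lists only those tagged with `bar`.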

Implementation Plan

The current prototype is available in this draft PR: [llvm] Add subcommand support for OptTable by Prabhuk · Pull Request #155026 · llvm/llvm-project · GitHub

If this design is acceptable, here is a non-exhaustive list of todos I need to complete:

  1. Clean up the prototype code to use better variable names, inline comments, documentation etc.
  2. Add unit tests.
  3. Make the user interface behavior match the cl library. If there are valid suggestions to move away from this interface, adopt the new interface design.
  4. Break the PR into smaller patches:
    1. OptParser.td and related changes.
    2. Some places in the codebase that redefine the OPTION macros can use the existing definitions from OptTable.h instead; these changes can land as NFC patches before the subcommand support change.

Please let me know your thoughts on this change. Thank you.

The proposal generally looks good. However, some of the design requirements need more justification. The one that concerns me most is:

  3. A top level command can have only one valid subcommand per invocation.
    1. e.g. git clone diff is an invalid invocation as there are two subcommands passed here.

The counterexample here is invalid. You can totally do the following:

$ git branch clone # create a new branch named “clone”

$ git checkout clone # checkout branch “clone”

This requirement rules out inputs that happen to share a name with a subcommand, which is very undesirable.