We have a bunch of shell scripts, each containing multiple calls to specific tools with their respective command-line arguments, e.g.:

some_tool -a param -b param -c param,param,param
some_tool -d param -f param
some_other_tool -x another_param -Y -z params,params,params
etc.

How can text files containing these calls be parsed and processed cleanly in Python? Is there a library that is intended specifically to parse Unix-like command line invocations? I'm thinking of shlex but this seems to only address a part of it (things like quoted arguments).

NOTE: I'm not interested in providing a CLI to the tool that will process the files, so argparse and the like are not what I'm looking for.

  • Why are you not looking for argparse? You can use argparse to parse arguments without using those arguments. That's unless you want this to be generic and not require you teaching argparse what arguments to expect, of course (just asking for clarification) : ) Commented Oct 5, 2015 at 9:52
  • I thought argparse collects arguments passed to its parent tool (presumably from sys.argv) and is not concerned with parsing arbitrary strings Commented Oct 5, 2015 at 9:53
  • You can actually use argparse with arbitrary arguments using parser.parse_args(["my", "--arguments", "go", "here"]). You still need to instantiate the parser and tell it what options to expect, though. You probably also want to subclass the parser so it doesn't output help and exit the program on a parse error. Commented Oct 5, 2015 at 9:54
  • Is there a way to do that without telling the parser what to expect? Perhaps something that neatly splits a string (e.g. -t arg), works out what is a switch (-t) and what its argument is (arg)? Commented Oct 5, 2015 at 9:58
  • If you know the possible options for each tool, then the argparse route is probably best. If you don't, then it's not a well-specified problem, e.g. tool -a -b name has different interpretations depending on whether -a is a switch or an option that must have an argument (in which case -b is its value). Commented Oct 5, 2015 at 10:24
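
To make the ambiguity in the last comment concrete, here is a small sketch using the standard library's getopt module. The two option specifications ("ab" and "a:b") are assumptions made up for illustration; they are not known properties of any real tool.

```python
import getopt

# Arguments of the hypothetical call: tool -a -b name
tokens = ["-a", "-b", "name"]

# Reading 1: -a and -b are both plain switches, so "name" is positional.
switches, rest_switches = getopt.getopt(tokens, "ab")

# Reading 2: -a requires a value (the trailing colon in "a:"),
# so "-b" is consumed as that value instead of being a switch.
valued, rest_valued = getopt.getopt(tokens, "a:b")

print(switches, rest_switches)
print(valued, rest_valued)
```

The same token list yields two different parses, which is why a fully generic parser cannot decide this without knowing each tool's options.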

1 Answer

Following your comment about wanting a result like ('tool', ('-a', 'param'), ('-b', 'param_2'), ('-c', ('param_3', 'param_4'))), it seems you want to read the file as a collection of lines, each following the pattern of a command, and separate each one into an organized list or tuple.

In that case, you could use regular expressions to segment each line into the sections you expect from such a pattern. For example:

import re

# Compiled regular expression for the command/tool name
regex_command = re.compile(r"^(\w+)", re.IGNORECASE)

# Compiled regex for "-option_name params": an option at the start of a token,
# followed by any whitespace-separated values that do not start with "-"
regex_options = re.compile(r"(?<!\S)-\w+(?:\s+[^-\s]\S*)*", re.IGNORECASE)

# This will hold the found commands/tools in each line
parsed_tools = []

# Loop through each line of the file (text is assumed to hold the file's
# contents; this may also be, e.g., f.readline() or other)
for line in text.split("\n"):
    # Skip blank lines and lines that do not start with a tool name
    command = regex_command.match(line)
    if command is None:
        continue
    # This will hold the found tool/command in the current line,
    # starting with the command/tool name found at the start of the line
    parsed_tool = [command.group(0)]

    # Find the line's options and their parameters with the second regex
    options = regex_options.findall(line)

    # Loop through the found matches
    for option in options:
        # Separate the option and its parameters by white spaces
        segments = option.split()
        # The first segment is the name of the option
        option_name = segments[0]
        # The rest may be parameters, possibly joined by commas,
        # so separate them even further
        option_params = tuple(param
                              for segment in segments[1:]
                              for param in segment.split(",") if param)

        # Append (name, params), or only the option name if it has no parameters
        parsed_tool.append((option_name, option_params)
                           if option_params else option_name)

    # Append each parsed_tool into the overall list
    parsed_tools.append(parsed_tool)

The code above uses compiled regular expressions from the re module, with the case-insensitive flag set. The first one matches the tool name at the very start of the line (the group() method returns the single result I'm expecting). The second one "finds all" matches of "-option_name params"; I then loop through those results and divide them by spaces and commas.

You can start learning more about regular expressions here. Adjust the regular expressions to suit the patterns you expect from the file.
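
If quoted arguments matter, one alternative is to tokenize with shlex (which the question mentions) and group the tokens afterwards. The following is only a sketch: parse_invocation is a hypothetical helper, and its grouping rule (a token starting with "-" is an option, and a following token that does not start with "-" is its comma-separated value) is an assumption that will not hold for every tool.

```python
import shlex


def parse_invocation(line):
    """Split one command line into (tool, [(token, values), ...]).

    Assumed heuristic: a token starting with "-" is an option, and the next
    token, if it does not start with "-", is its comma-separated value.
    Any other token is recorded with an empty tuple of values.
    """
    # shlex handles quoting; comments=True drops shell-style "#" comments
    tokens = shlex.split(line, comments=True)
    if not tokens:
        return None

    tool, parsed = tokens[0], []
    i = 1
    while i < len(tokens):
        token = tokens[i]
        has_value = (token.startswith("-")
                     and i + 1 < len(tokens)
                     and not tokens[i + 1].startswith("-"))
        if has_value:
            parsed.append((token, tuple(tokens[i + 1].split(","))))
            i += 2
        else:
            parsed.append((token, ()))
            i += 1
    return tool, parsed


print(parse_invocation('some_other_tool -x "two words" -Y -z p1,p2,p3'))
```

Because the grouping is guessed rather than declared, a switch followed by a positional argument would be misread as an option with a value.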

1 Comment

I'm aware that I can use regular expressions, which is why I emphasised that I'm looking for a library, as regexes are dirty and messy. It looks like the most appropriate way to do it is to create a separate instance of argparse for each tool that I need to parse.
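
A minimal sketch of that per-tool argparse approach might look as follows. The option specifications are guesses based on the example calls in the question, and QuietParser, comma_list, PARSERS and parse_line are hypothetical names introduced only for illustration.

```python
import argparse
import shlex


class QuietParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of printing usage and exiting."""

    def error(self, message):
        raise ValueError(message)


def comma_list(value):
    # Turn "a,b,c" into ["a", "b", "c"]
    return value.split(",")


# Assumed option specs, inferred from the sample invocations only
some_tool = QuietParser(prog="some_tool", add_help=False)
for flag in ("-a", "-b", "-d", "-f"):
    some_tool.add_argument(flag)
some_tool.add_argument("-c", type=comma_list)

some_other_tool = QuietParser(prog="some_other_tool", add_help=False)
some_other_tool.add_argument("-x")
some_other_tool.add_argument("-Y", action="store_true")
some_other_tool.add_argument("-z", type=comma_list)

# One parser per tool name
PARSERS = {"some_tool": some_tool, "some_other_tool": some_other_tool}


def parse_line(line):
    """Parse one non-empty command line with the parser registered for its tool."""
    tool, *args = shlex.split(line)
    return tool, vars(PARSERS[tool].parse_args(args))


print(parse_line("some_other_tool -x another_param -Y -z p1,p2,p3"))
```

Overriding error() keeps an unexpected option in one script line from terminating the whole run; the caller can catch the ValueError and report the offending line.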
