
Summary

I would like to parse a string that represents a Python argument list into a form that I can forward to a function call.

Detailed version

I am building an application in which I would like to be able to parse out argument lists from a text string that would then be converted into the *args,**kwargs pattern to forward to an actual method. For example, if my text string is:

"hello",42,helper="Larry, the \"wise\""

the parsed result would be something comparable to:

args=['hello',42]
kwargs={'helper': 'Larry, the "wise"'}

I am aware of Python's ast module, but it only seems to provide a mechanism for parsing entire statements. I can sort of fake this by manufacturing a statement around it, e.g.

ast.parse(r'f("hello",42,helper="Larry, the \"wise\"")')

and then pull the relevant fields out of the Call node, but this seems like an awful lot of roundabout work.

Is there any way to parse just one known node type from a Python AST, or is there an easier approach for getting this functionality?

If it helps, I only need to be able to support numeric and string arguments, although strings need to support embedded commas and escaped-out quotes and the like.

If there is an existing module for building lexers and parsers in Python, I am fine with defining my own AST as well, but obviously I would prefer to use functionality that already exists and has already been tested.

Note: Many of the answers focus on how to store the parsed results, but that's not what I care about; it's the parsing itself that I'm trying to solve, ideally without writing an entire parser engine myself.

Also, my application is already using Jinja, which has a parser for Python-ish expressions in its own template parser, although it isn't clear to me how to use it to parse just one subexpression like this. (This is unfortunately not something going into a template, but into a custom Markdown filter, where I'd like the syntax to match the corresponding Jinja template function as closely as possible.)

  • Do you need to use the ast library or would you also consider alternative solutions? Commented Apr 8, 2018 at 21:47
  • Have you considered using ast.literal_eval() from the ast library? Here is the documentation. Commented Apr 8, 2018 at 21:54
  • @FilippoCosta I stated in the question I'm fine with using something else; the AST module was merely an example of one way of doing it. Commented Apr 8, 2018 at 21:56
  • @PedroLobito I'm not looking for a command line argument parser (of which Python already has several), I'm looking for a Python expression arglist parser Commented Apr 8, 2018 at 21:57
  • @A.Wenn literal_eval seems to parse only a single literal, and not a whole set of them. It might be useful as a building block for the full solution but it doesn't quite get me to where I need to be (for example it doesn't know how to consume just one literal as a token) Commented Apr 8, 2018 at 21:57

5 Answers


I think ast.parse is your best option.

If the parameters were separated by whitespace, we could use shlex.split:

>>> shlex.split(r'"hello" 42 helper="Larry, the \"wise\""')
['hello', '42', 'helper=Larry, the "wise"']

But unfortunately, that doesn't split on commas:

>>> shlex.split(r'"hello",42,helper="Larry, the \"wise\""')
['hello,42,helper=Larry, the "wise"']
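For what it's worth, the lower-level shlex.shlex class (unlike shlex.split) can be told which characters separate tokens. The sketch below is an aside, not part of the ast approach: the helper name split_commas is hypothetical, and it assumes commas and ordinary whitespace outside quotes are the only separators (so a stray unquoted space also splits a token).

```python
import shlex

def split_commas(text):
    # Hypothetical helper: posix mode strips the quotes and honours \" inside
    # double-quoted sections, just as shlex.split does.
    lexer = shlex.shlex(text, posix=True)
    # Commas and whitespace outside quotes both end a token.
    lexer.whitespace = ", \t\r\n"
    # Only the characters above split tokens (no punctuation-based splitting).
    lexer.whitespace_split = True
    # Don't treat '#' as the start of a comment.
    lexer.commenters = ""
    return list(lexer)

tokens = split_commas(r'"hello",42,helper="Larry, the \"wise\""')
print(tokens)
```

The catch is that every token comes back as a plain string: 42 and "42" become indistinguishable, and keyword tokens still have to be split on the first = (for example with str.partition). That loss of type information is why ast.parse remains the more faithful option.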

I also thought about using ast.literal_eval, but that doesn't support keyword arguments:

>>> ast.literal_eval(r'"hello",42')
('hello', 42)
>>> ast.literal_eval(r'"hello",42,helper="Larry, the \"wise\""')
Traceback (most recent call last):
  File "<unknown>", line 1
    "hello",42,helper="Larry, the \"wise\""
                     ^
SyntaxError: invalid syntax

I couldn't think of any Python literal that supports both positional and keyword arguments.

For lack of better ideas, here's a solution using ast.parse:

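import ast

def parse_args(args):
    # Wrap the argument list in a dummy call so it becomes a valid statement
    args = 'f({})'.format(args)
    tree = ast.parse(args)
    # tree.body[0] is an Expr statement; its .value is the ast.Call node
    funccall = tree.body[0].value

    # literal_eval also accepts AST nodes, and only allows literals
    args = [ast.literal_eval(arg) for arg in funccall.args]
    kwargs = {arg.arg: ast.literal_eval(arg.value) for arg in funccall.keywords}
    return args, kwargs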

Output:

>>> parse_args(r'"hello",42,helper="Larry, the \"wise\""')
(['hello', 42], {'helper': 'Larry, the "wise"'})

2 Comments

Thanks, that ast.parse approach is pretty much what I was expecting to have to do. I'm hoping someone else has a simpler approach but this might be what I end up accepting.
That said it looks like for the bigger picture of what I'm trying to do I might be better off just writing a parser/lexer with PLY. But this definitely gets me on the right path for a shorter-term solution!

You can use a function with eval to help you pick apart args and kwargs:

def f(*args, **kwargs):
    # Python's own call machinery separates positional and keyword arguments
    return args, kwargs

import numpy as np
eval("f(1, 'a', x=np.int32)")

gives you

((1, 'a'), {'x': <class 'numpy.int32'>})
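The security concern raised in the comment below is well founded: eval executes any expression, not just literals. A small demonstration, assuming a hypothetical untrusted input string whose "argument" is really a function call that runs during parsing. Restricting the globals (for example passing {"__builtins__": {}}) is widely regarded as not being a reliable sandbox, so this trick is best kept for trusted input.

```python
def f(*args, **kwargs):
    # Returns whatever positional and keyword arguments the call received
    return args, kwargs

# Hypothetical untrusted input: it looks like an argument list, but eval()
# runs the embedded call (harmless here, since it only reads the cwd).
untrusted = "f(__import__('os').getcwd())"
args, kwargs = eval(untrusted)
print(args)  # a 1-tuple holding the current working directory
```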

1 Comment

Clever, and it looks a lot easier to work with than ast. I'd be concerned about possible security issues with eval() though.

This is not entirely what you wanted, but it comes close.

>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--helper')
>>> kwargs,args = parser.parse_known_args(["hello",'42','--helper="Larry, the \"wise\""'])
>>> vars(kwargs)
{'helper': '"Larry, the "wise""'}
>>> args
['hello', '42']

7 Comments

This doesn't really answer my question at all?
It is a response to the problem statement "I would like my result to be something like...". It appears that there isn't a satisfactory answer to the question "How do I use ast to do this?", so all I was doing was asking you to consider the problem from a different angle.
The problem I was stating wasn't "I would like my result to be something like," it was "I would like to parse this string"
Excuse me, your question contains exactly that phrase. I cut'n'pasted it.
Yes but context matters. The "something like" was an example of how the output of the function could look, but it's the input that's important. Anyway I've edited the question several times to clarify, so please reread it.

You can use re and a simple class to keep track of the tokens:

import re

class Akwargs:
    # Token patterns: quoted strings (the second alternative also allows
    # commas and embedded quotes), integers, bare identifiers, and '='
    grammar = r'"[\w\s_]+"|"[\w\s,_"]+"|\d+|[a-zA-Z0-9_]+|\='

    def __init__(self, tokens):
        self.tokens = tokens  # an iterator over the token strings
        self.args = []
        self.kwargs = {}
        self.parse()

    def parse(self):
        current = next(self.tokens, None)
        if current:
            check_next = next(self.tokens, None)
            if not check_next:
                # Last token: a lone positional argument
                self.args.append(re.sub('^"+|"+$', '', current))
            else:
                if check_next == '=':
                    # key = value pair
                    last = next(self.tokens, None)
                    if not last:
                        raise ValueError("Expecting kwargs value")
                    self.kwargs[current] = re.sub('^"|"$', '', last)
                else:
                    # Two consecutive positional arguments
                    self.args.extend(list(map(lambda x: re.sub('^"+|"+$', '', x), [current, check_next])))
            self.parse()

s = '"hello",42,helper="Larry, the \"wise\""'
tokens = iter(re.findall(Akwargs.grammar, s))
params = Akwargs(tokens)
print(params.args)
print(params.kwargs)

Output:

['hello', '42']
{'helper': 'Larry, the "wise"'}

Full tests:

strings = ['23,"Bill","James"', 'name="someone",age=23,"testing",300', '"hello","42"', "hello=42", 'foo_bar=5']
parsed = [Akwargs(iter(re.findall(Akwargs.grammar, d))) for d in strings]
new_data = [[p.args, p.kwargs] for p in parsed]
print(new_data)

Output:

[[['23', 'Bill', 'James'], {}], [['testing', '300'], {'name': 'someone', 'age': '23'}], [['hello', '42'], {}], [[], {'hello': '42'}], [[], {'foo_bar': '5'}]]
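A sketch of how the tokenizer idea could be made sturdier, assuming (as the question states) that only quoted strings, numbers, and name=value pairs are needed. Instead of the per-case regexes above, it uses one master pattern with named groups, lets ast.literal_eval decode each literal (so escaped quotes and numeric types come out right), and checks the comma-separated structure. The names TOKEN, tokenize, and parse_arglist are hypothetical.

```python
import ast
import re

# One alternative per token kind. String literals accept backslash escapes,
# so an escaped quote does not end the string.
TOKEN = re.compile(r'''
    \s*(?:
        (?P<str>"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')
      | (?P<num>[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?)
      | (?P<name>[A-Za-z_]\w*)
      | (?P<eq>=)
      | (?P<comma>,)
    )''', re.VERBOSE)


def tokenize(text):
    """Yield (kind, text) pairs, failing loudly on anything unrecognised."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}: {text[pos:]!r}")
        pos = m.end()
        yield m.lastgroup, m.group(m.lastgroup)


def parse_arglist(text):
    """Split a literal-only argument list into (args, kwargs)."""
    tokens = list(tokenize(text))
    if not tokens:
        return [], {}
    args, kwargs = [], {}
    group = []
    # A sentinel comma flushes the final argument.
    for kind, value in tokens + [("comma", ",")]:
        if kind != "comma":
            group.append((kind, value))
            continue
        kinds = [k for k, _ in group]
        if kinds in (["str"], ["num"]):
            if kwargs:
                raise ValueError("positional argument follows keyword argument")
            args.append(ast.literal_eval(group[0][1]))
        elif kinds in (["name", "eq", "str"], ["name", "eq", "num"]):
            key = group[0][1]
            if key in kwargs:
                raise ValueError(f"repeated keyword argument: {key}")
            kwargs[key] = ast.literal_eval(group[2][1])
        else:
            # Empty arguments (",,"), trailing commas, and stray names end up here
            raise ValueError(f"cannot parse argument: {group!r}")
        group = []
    return args, kwargs


print(parse_arglist(r'"hello",42,helper="Larry, the \"wise\""'))
print(parse_arglist('"hello=42", foo_bar=5'))
```

Because the string pattern consumes a backslash together with the character after it, "escaped \" quote" stays a single token, and "hello=42" in quotes stays a string rather than becoming a keyword argument.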

3 Comments

You'll need to work on that regex. This doesn't parse "hello","42" or "hello=42" or foo_bar=5 correctly.
Well, it parses "hello","42" correctly now, but the other two still don't work as expected. "hello=42" should be a single string instead of a keyword argument, and foo_bar=5 is split into ['foo', 'bar', '=', '5']. There's also "escaped \" quote", which is incorrectly split into two arguments. I don't think regex is the right approach for this task, really. It's more trouble than it's worth.
@Aran-Fey The grammar just needs to be updated as these types of input come to light. I do agree, however, that regex is probably not powerful enough.

I've adjusted the solution proposed by Aran-Fey. This works as expected:

import ast

def parse_function_arguments(arg_str):
    # Wrap the argument string in a dummy function call to make it valid Python syntax
    wrapped_arg_str = f"dummy_func({arg_str})"
    # ast.parse only builds a tree; nothing is executed at this point
    tree = ast.parse(wrapped_arg_str, mode="eval")  # 'eval' mode parses a single expression
    # In 'eval' mode, tree is an ast.Expression whose .body is the Call node itself
    funccall = tree.body
    # Process positional arguments: literal_eval accepts only literals
    args = tuple(ast.literal_eval(arg) for arg in funccall.args)
    # Process keyword arguments: convert values from AST nodes to literals
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in funccall.keywords}

    return args, kwargs
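A usage sketch showing the forwarding step the question is after. It repeats a condensed copy of the function above (so the block runs on its own) plus an isinstance check, since input like 1), (2 would otherwise parse as a tuple rather than a call. greet is a hypothetical stand-in for the real target method. Because every value passes through ast.literal_eval, a non-literal such as a function call is rejected with ValueError instead of being executed.

```python
import ast

def parse_function_arguments(arg_str):
    # Condensed copy of the answer above, plus a check that we got a Call
    tree = ast.parse(f"dummy_func({arg_str})", mode="eval")
    call = tree.body
    if not isinstance(call, ast.Call):
        raise ValueError("not a plain argument list")
    args = tuple(ast.literal_eval(a) for a in call.args)
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return args, kwargs

def greet(name, times=1, helper=None):
    # Hypothetical target function standing in for the real method
    return f"{name} x{times} (helper: {helper})"

args, kwargs = parse_function_arguments(r'"hello",42,helper="Larry, the \"wise\""')
print(greet(*args, **kwargs))

try:
    parse_function_arguments("__import__('os').getcwd()")
except ValueError as exc:
    # literal_eval refuses the Call node instead of running it
    print("rejected:", exc)
```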
