
Suppose I have a string such as the following:

"func(arg1, arg2, arg3, arg4, ..., argn)"

EDIT: This function is not in some particular language. It just has this format. If it makes it easier, don't think of it as a function call, just a string.

I want to write a regular expression to match the function and each of the arguments. I am writing this in Python. The desired output of this is:

{"function" : "func", "arg" : ["arg1", "arg2", ... , "argn"]}

EDIT: While the arguments could be function calls, I can easily recursively try to match them with the same regular expression once I create one that works. By this I mean I can recurse on the function with each of the arguments. But this is not really relevant. I am not trying to create an interpreter, just something to recognize the arguments.

Here is my attempt at this:

import re
s = "func(arg1, arg2, arg3, arg4, argn)"
m = re.match(r"(?P<function>\w+)\s?\((?P<args>(?P<arg>\w+(,\s?)?)+)\)", s)
print(m.groupdict())

And here is the output:

{'function': 'func', 'args': 'arg1, arg2, arg3, arg4, argn', 'arg': 'argn'}

The function matches just fine, and so does the argument set. However, I can't seem to match the individual arguments. Is this a problem with my regex, or a limitation of Python regular expression matching?

EDIT2: I am aware that I can now split the arguments using the following code:

d = m.groupdict()
d["arg"] = d["args"].split(", ")

But I was wondering if I could do the whole job with regular expressions. In particular, I am wondering why "arg" is matched to only the last argument.

EDIT3: I guess I am hoping to figure out (1) why Python only matches the last argument every time, and (2) whether I can do Scheme-style pattern matching in Python, or whether there is something just as intuitive in Python. I looked at the ast module, and its syntax is prohibitively complex.

  • Is this function call in some particular language? You shouldn't use a regular expression to parse a language for which a correct/complete parser already exists... Commented Apr 15, 2012 at 17:04
  • You can't do this with regular expressions (assuming you want to match the individual arguments, which themselves could be function calls). You need an actual parser. If you insist on writing your own, then read this: effbot.org/zone/simple-iterator-parser.htm Commented Apr 15, 2012 at 17:12
  • You might want to check out pyparsing if you are planning on doing something more complex. Commented Apr 15, 2012 at 17:20
  • "I can easily recursively try to match [the arguments] with the same regex" - No, you can't easily do that. Regular expressions don't work that way. Commented Apr 15, 2012 at 17:22
  • @Eduardo, yes we know it can theoretically be done, but it really shouldn't be done. Commented Apr 15, 2012 at 17:26
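
To illustrate the point these comments make about nested calls: once an argument can itself be a call, the commas have to be split with the parenthesis depth in mind, which a single pattern in Python's re module cannot track. A minimal hand-written sketch follows; split_call is a hypothetical helper name, and it assumes the input has the simple name(...) shape with no string literals containing commas or parentheses.

```python
def split_call(text):
    """Split 'name(a, b(c, d), e)' into its name and top-level arguments."""
    text = text.strip()
    name, _, rest = text.partition("(")
    if not rest.endswith(")"):
        raise ValueError("expected a string of the form name(...)")
    body = rest[:-1]  # everything between the outermost parentheses

    args, current, depth = [], [], 0
    for ch in body:
        if ch == "," and depth == 0:
            # A comma at depth 0 separates two top-level arguments.
            args.append("".join(current).strip())
            current = []
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current.append(ch)

    last = "".join(current).strip()
    if last:
        args.append(last)
    return {"function": name.strip(), "arg": args}
```

Each returned argument that is itself a call can be passed back into split_call, which is the recursion the question describes.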

3 Answers

8

Regular expressions cannot parse complex programming languages.

If you're just trying to parse Python, I suggest taking a look at the ast module, which will parse it for you.


3 Comments

I'm not trying to parse Python, just capture a very specific syntax. From the little I read, it seems that the ast module is (1) specific to Python, and (2) is pretty complex for what I'm trying to do.
Can you provide an example of the code that would be needed in order to get ast to return the parameters of a function definition in Python?
With the help of this I was able to figure it out using ast. For Python, you can do parsed = ast.parse("def foo(a, b, c):\n\tpass") followed by result = {"function": parsed.body[0].name, "args": [ast_arg.arg for ast_arg in parsed.body[0].args.args]}
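
For a call rather than a definition, a similar sketch is possible, assuming the string happens to be valid Python expression syntax (so a literal "..." placeholder would not be accepted as a list of arguments). parse_call is a hypothetical helper name, and ast.get_source_segment needs Python 3.8 or later.

```python
import ast

def parse_call(source):
    """Return the callee name and the source text of each positional argument."""
    tree = ast.parse(source, mode="eval")  # parse a single expression
    call = tree.body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError("expected a simple call such as func(a, b)")
    return {
        "function": call.func.id,
        # Keep each argument as text so nested calls can be parsed again later.
        "arg": [ast.get_source_segment(source, node) for node in call.args],
    }

print(parse_call("func(arg1, g(x, y), argn)"))
```

Keyword arguments are ignored here; they live in call.keywords rather than call.args.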
5

Looks like you're 90% there. Why not just swap the arg and args groupings and do:

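import re

s = "func(arg1, arg2, arg3, arg4, argn)"
# The outer group "arg" now captures the whole argument list;
# the inner group "args" only ever holds the last argument.
fn_match = re.match(r"(?P<function>\w+)\s?\((?P<arg>(?P<args>\w+(,\s?)?)+)\)", s)
fn_dict = fn_match.groupdict()
del fn_dict['args']  # discard the inner group
# Split the captured list on commas and trim the surrounding whitespace.
fn_dict['arg'] = [arg.strip() for arg in fn_dict['arg'].split(',')]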

2 Comments

I'm trying to capture the whole thing as a regex. Is that not possible?
Not to get the resultant list of args that you want. Why use only a swiss army knife when you have a whole toolbox?
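
As for why only the last argument shows up: in Python's re module a capturing group inside a repetition is overwritten on every pass, so after the match it holds only the final repetition. The sketch below demonstrates that, then shows one way to stay with regular expressions by using two steps (match the call, then re.findall on the argument text). CALL and parse_flat_call are hypothetical names, and the pattern assumes flat arguments with no nested parentheses.

```python
import re

s = "func(arg1, arg2, arg3, arg4, argn)"

# A group inside "+" is overwritten on each repetition, so only the last one survives.
repeated = re.match(r"(?:(?P<arg>\w+)(?:,\s?)?)+", "arg1, arg2, argn")
last_only = repeated.group("arg")

# Step 1: capture the name and the raw argument text (no nested parentheses allowed).
CALL = re.compile(r"(?P<function>\w+)\s?\((?P<args>[^()]*)\)$")

def parse_flat_call(text):
    m = CALL.match(text)
    if m is None:
        return None
    # Step 2: pull every word out of the captured argument text.
    return {"function": m.group("function"),
            "arg": re.findall(r"\w+", m.group("args"))}

print(last_only)
print(parse_flat_call(s))
```

The third-party regex package is, as far as I know, able to return every repetition of a group through a captures() method, but that is outside the standard library.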
1

To answer the last part of your question: no, not before Python 3.10, which added a match statement. Earlier versions have nothing similar to Scheme's "match", nor pattern matching like ML/Haskell. The closest thing they have is the ability to destructure things like this:

>>> (a, [b, c, (d, e)]) = (1, [9, 4, (45, 8)])
>>> e
8

And to extract the head and tail of a list (in Python 3.x) like this...

>>> head, *tail = [1,2,3,4,5]
>>> tail
[2, 3, 4, 5]

There are some modules floating around that do real pattern matching in Python, but I can't vouch for their quality.

If I had to do it, I would implement it a bit differently -- maybe have the ability to input a type and optional arguments (e.g. length, or exact content) and a function to call if it matches, so like match([list, length=3, check=(3, str), func]) and that would match (list _ _ somestr) and call func with somestr in scope, and you could also add more patterns.
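
A rough sketch of that idea, which also runs on Python versions without a match statement. Everything here is an assumption made for illustration: the name match_list, the separate length argument, and the choice to express the check as a dict of zero-based index to expected type.

```python
def match_list(value, length, checks, func):
    """Call func with the checked items if value is a list of the given shape.

    checks maps a zero-based index to the type expected at that position.
    Returns None when the value does not match.
    """
    if not isinstance(value, list) or len(value) != length:
        return None
    picked = []
    for index, expected_type in sorted(checks.items()):
        item = value[index]
        if not isinstance(item, expected_type):
            return None
        picked.append(item)
    # The matched items are passed to func positionally, in index order.
    return func(*picked)

# Roughly (list _ _ somestr): three items, the third must be a string.
print(match_list([1, 2, "hello"], 3, {2: str}, str.upper))
```

One weakness of this design is that a func which itself returns None cannot be told apart from a failed match; a real implementation would need a distinct sentinel or an exception.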
