7

I'm trying to parse the following string:

constructor: function(some, parameters, here) {

With the following regex:

re.search("(\w*):\s*function\((?:(\w*)(?:,\s)*)*\)", line).groups()

And I'm getting:

('constructor', '')

But I was expecting something more like:

('constructor', 'some', 'parameters', 'here')

What am I missing?

4 Answers

9

If you change your pattern to:

print(re.search(r"(\w*):\s*function\((?:(\w+)(?:,\s)?)*\)", line).groups())

You'll get:

('constructor', 'here')

This is because (from docs):

If a group is contained in a part of the pattern that matched multiple times, the last match is returned.
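A minimal sketch of that behavior, using a made-up input string rather than the line from the question: the group sits inside a repeated part of the pattern, so only its final match survives.

```python
import re

# Hypothetical input for illustration; not the string from the question.
m = re.search(r"(?:(\d),?)+", "1,2,3")

# Group 1 matched three times ('1', '2', '3'), but only the last one is kept.
last = m.group(1)
print(last)  # 3
```

The whole match still spans "1,2,3"; it is only the per-iteration contents of the group that are lost.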

If this can be done in one step with re, I don't know how. The alternative, of course, is to do it in two passes:

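import re

def parse_line(line):
    # First pass: capture the name and the raw parameter list.
    cons, args = re.search(r'(\w*):\s*function\((.*)\)', line).groups()
    # Second pass: pull each parameter name out of that list.
    mats = re.findall(r'(\w+)(?:,\s*)?', args)
    return [cons] + mats

print(parse_line(line))  # ['constructor', 'some', 'parameters', 'here']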

2 Comments

Looks like a good solution, although it will raise an exception on trying to parse something like 'abcd'.
@DJStroky sure, but wouldn't the code from your question re.search(...).groups() do the same thing?
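As the comments note, both versions raise AttributeError when the line does not match, because re.search returns None. One possible way to guard against that is sketched below; the name parse_line_safe and the choice to return None on a non-match are assumptions made for illustration, not part of the answer above.

```python
import re

def parse_line_safe(line):
    """Return [name, param, ...], or None if the line does not match.

    parse_line_safe is a hypothetical name used only for this sketch.
    """
    m = re.search(r'(\w*):\s*function\((.*)\)', line)
    if m is None:
        # No "name: function(...)" on this line, so there is nothing to parse.
        return None
    cons, args = m.groups()
    return [cons] + re.findall(r'(\w+)(?:,\s*)?', args)

line = "constructor: function(some, parameters, here) {"
print(parse_line_safe(line))
print(parse_line_safe('abcd'))
```

Returning None keeps the caller in charge of what a non-matching line means; raising a more descriptive exception would be an equally reasonable choice.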
6

One option is to use the more advanced third-party regex module instead of the stock re. Among other nice things, it supports captures, which, unlike groups, save every matching substring:

>>> line = "constructor: function(some, parameters, here) {"
>>> import regex
>>> regex.search(r"(\w*):\s*function\((?:(\w+)(?:,\s)*)*\)", line).captures(2)
['some', 'parameters', 'here']


5

The re module doesn't support repeated captures: the group count is fixed. Possible workarounds include:

1) Capture the parameters as a string and then split it:

match = re.search(r"(\w*):\s*function\(([\w\s,]*)\)", line).groups()
args = [arg.strip() for arg in match[1].split(",")]

2) Capture the parameters as a string and then findall it:

match = re.search(r"(\w*):\s*function\(([\w\s,]*)\)", line).groups()
args = re.findall(r"(\w+)(?:,\s)*", match[1])

3) If your input string has already been verified, you can just findall the whole thing:

re.findall(r"(\w+)[:,)]", string)

Alternatively, you can use the regex module and captures(), as suggested by @georg.
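The first two workarounds agree on the line from the question but differ on an empty parameter list, which may matter when choosing between them. The sketch below wraps each in a small function; the function names and the "init" input are made up for illustration.

```python
import re

PATTERN = r"(\w*):\s*function\(([\w\s,]*)\)"

def args_by_split(line):
    # Workaround 1: capture the parameter list, then split on commas.
    params = re.search(PATTERN, line).group(2)
    return [arg.strip() for arg in params.split(",")]

def args_by_findall(line):
    # Workaround 2: capture the parameter list, then findall the names.
    params = re.search(PATTERN, line).group(2)
    return re.findall(r"(\w+)(?:,\s)*", params)

line = "constructor: function(some, parameters, here) {"
empty = "init: function() {"  # hypothetical input with no parameters

print(args_by_split(line), args_by_findall(line))
print(args_by_split(empty), args_by_findall(empty))
```

With no parameters, splitting an empty string yields a list containing one empty string, while findall yields an empty list, so the findall variant needs no extra filtering.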


0

You might need two operations here (search and findall):

[re.search(r'[^:]+', given_string).group()] + re.findall(r'(?<=[ (])\w+?(?=[,)])', given_string)

Output: ['constructor', 'some', 'parameters', 'here']
