6

I have a string in variable a as below:

a = 'foo(123456) together with foo(2468)'

I would like to use "re" to extract both foo(123456) and foo(2468) from the string.

I have two questions:

  1. What is the correct regex to use? foo(.*) doesn't seem to work, as it treats 123456) together with foo(2468 as the .* part.
  2. How do I extract both foo(...) occurrences?
1
  • You need a non-greedy version. Commented Apr 2, 2015 at 2:02

5 Answers

9
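import re

# \( and \) match literal parentheses; .*? matches as little as possible.
pattern = re.compile(r'foo\(.*?\)')
test_str = 'foo(123456) together with foo(2468)'

# findall returns every non-overlapping match as a string.
for match in pattern.findall(test_str):
    print(match)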

Two things:

  1. .*? is the lazy quantifier. It accepts the same characters as the greedy quantifier (.*), except that it matches as few characters as possible, scanning left to right across the string. If you want to require at least one character between the parentheses, use .+? instead. Be aware that . can itself match ), so on input such as foo() the .+? form consumes that ) and runs on to the next one.

  2. Use \( and \) instead of ( and ). Parentheses normally denote capture groups inside a regular expression, so to match them literally you have to escape them with a backslash.
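To make the greedy versus lazy difference concrete, here is a minimal sketch that runs both forms on the string from the question. The variable names text, greedy and lazy are just for illustration.

```python
import re

text = 'foo(123456) together with foo(2468)'

# Greedy: .* runs to the LAST closing parenthesis, so there is one big match.
greedy = re.findall(r'foo\(.*\)', text)

# Lazy: .*? stops at the FIRST closing parenthesis after each foo(.
lazy = re.findall(r'foo\(.*?\)', text)

print(greedy)  # ['foo(123456) together with foo(2468)']
print(lazy)    # ['foo(123456)', 'foo(2468)']
```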

5

You can use findall with the following expression: r'(foo\(\d+\))':

import re

a = 'foo(123456) together with foo(2468)'

for v in re.findall(r'(foo\(\d+\))', a):
    print(v)

Result is:

foo(123456)
foo(2468)

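Your expression foo(.*) does not work because of the unescaped (). In a regex they create a capture group rather than matching literal parentheses, so you need to escape them, as I did above.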
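As a rough illustration of what the unescaped parentheses do, the sketch below compares the three forms with re.findall. When a pattern contains one group, findall returns the text of that group rather than the whole match, which is why the outer group in the expression above is harmless but also optional. The variable names are only for the example.

```python
import re

a = 'foo(123456) together with foo(2468)'

# Unescaped: ( ) form a group, and findall returns only the group's text.
unescaped = re.findall(r'foo(.*)', a)

# Escaped, with and without the outer group: both give the full matches.
with_group = re.findall(r'(foo\(\d+\))', a)
without_group = re.findall(r'foo\(\d+\)', a)

print(unescaped)      # ['(123456) together with foo(2468)']
print(with_group)     # ['foo(123456)', 'foo(2468)']
print(without_group)  # ['foo(123456)', 'foo(2468)']
```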

2 Comments

Thanks, Marcin. What if my string is a = 'foo(abcdef) together with foo(jqk)'? Which regex should I use?
Use the same one, but replace \d+ with a lazy .+? instead. A plain greedy .+ would run through to the last closing parenthesis.
4

You could use a negated character class.

>>> import re
>>> a = 'foo(123456) together with foo(2468) foo(abcdef) together with foo(jqk)'
>>> re.findall(r'\bfoo\([^()]*\)', a)
['foo(123456)', 'foo(2468)', 'foo(abcdef)', 'foo(jqk)']

[^()]* is a negated character class that matches any character except ( or ), zero or more times.
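One place where the negated class and the lazy wildcard differ is malformed input. The sketch below uses two made-up strings (not from the question) to show it: one where the first foo( is never closed, and one where foo appears inside a longer name.

```python
import re

lazy = re.compile(r'foo\(.*?\)')
negated = re.compile(r'\bfoo\([^()]*\)')

# Hypothetical input where the first foo( is never closed.
broken = 'foo(12 and foo(34)'
lazy_result = lazy.findall(broken)
negated_result = negated.findall(broken)
print(lazy_result)     # ['foo(12 and foo(34)']
print(negated_result)  # ['foo(34)']

# \b keeps the pattern from matching inside a longer name such as xfoo.
boundary_result = negated.findall('xfoo(1) foo(2)')
print(boundary_result)  # ['foo(2)']
```

The lazy form still has to reach some closing parenthesis, so it swallows the second foo(. The negated class refuses to cross a ( and skips ahead to the well-formed call.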

2

Simply use the non-greedy wildcard expression .*?

import re
a = 'foo(123456) together with foo(2468)'
for v in re.findall(r'foo\(.*?\)', a):
    print(v)

1

Use re.findall(r'foo\(.*?\)', a). The backslashes escape the parentheses (which otherwise have the special meaning of denoting a group in a regex), and the question mark makes the match non-greedy.
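Since unescaped parentheses denote a group, the two kinds can be combined when only the text inside foo(...) is wanted. This goes slightly beyond what was asked, so treat it as an optional sketch; the names inner and pairs are illustrative.

```python
import re

a = 'foo(123456) together with foo(2468)'

# Escaped \( \) match the literal parentheses; the inner ( ) capture the content.
inner = re.findall(r'foo\((.*?)\)', a)
print(inner)  # ['123456', '2468']

# finditer yields match objects, giving access to both the full match and the group.
pairs = [(m.group(0), m.group(1)) for m in re.finditer(r'foo\((.*?)\)', a)]
print(pairs)  # [('foo(123456)', '123456'), ('foo(2468)', '2468')]
```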
