
I'm new to Python, coming from a basic knowledge of Perl. I'm trying to capture a substring with a regex.

>>> a='Question 73 of 2943'
>>> import re
>>> re.match("Question.*(\d+)\s+of", a).group(0)
'Question 73 of'
>>> re.match("Question.*(\d+)\s+of", a).group(1)
'3'

What I wanted to do was catch 73 in the group. I assumed that the parentheses would do that.

    Operator * is greedy. Use *? instead. Or, better yet, insert a \s in the regex before the number. Commented Apr 16, 2018 at 5:09

3 Answers


.* is greedy. This means it matches any character (except line terminators) as many times as it can, and then gives characters back only as far as needed for the rest of the pattern to match. Here it swallows the 7 as well, so the (\d+) capture group is left with only the last digit, 3. What you can do is make the .* part lazy by adding a ?, so your regex would look like...

re.match(r"Question.*?(\d+)\s+of", a)
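To make the difference concrete, here is a small sketch that runs both patterns against the string from the question. The variable names greedy and lazy are only for illustration.

```python
import re

a = 'Question 73 of 2943'

# Greedy: .* grabs as much as possible, then backtracks just far enough
# for (\d+)\s+of to match, leaving a single digit for the group.
greedy = re.match(r"Question.*(\d+)\s+of", a)

# Lazy: .*? grabs as little as possible, so (\d+) gets the whole number.
lazy = re.match(r"Question.*?(\d+)\s+of", a)

print(greedy.group(1))  # 3
print(lazy.group(1))    # 73
```

Note the r prefix on the pattern: a raw string keeps Python from treating \d and \s as string escapes.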

The difference between lazy and greedy regex is well explained here



If you would like to capture 73 only, you can do re.search(r'\d+', a).group(). re.search scans the string from left to right and returns the first match it finds, so the later number 2943 is never reached.
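A minimal sketch of this approach, assuming you also want to guard against strings that contain no digits at all (re.search returns None in that case). The name first_number is just illustrative.

```python
import re

a = 'Question 73 of 2943'

# re.search returns the first match scanning left to right, or None.
m = re.search(r'\d+', a)
first_number = m.group() if m else None
print(first_number)  # 73

# For comparison, findall returns every run of digits in the string.
print(re.findall(r'\d+', a))  # ['73', '2943']
```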


Your .* part will match any character, including digits. It is better to use a negated character class that excludes digits:

Question[^\d]*(\d+)\s+of

That should give you 73 in group 1.
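As a quick check, the pattern can be run against the string from the question. The sketch below also tries \D, the shorthand for [^\d], which behaves the same way here.

```python
import re

a = 'Question 73 of 2943'

# [^\d]* can only consume non-digits, so it stops right before the 7.
m = re.match(r"Question[^\d]*(\d+)\s+of", a)
print(m.group(1))  # 73

# \D is shorthand for "any non-digit character".
m2 = re.match(r"Question\D*(\d+)\s+of", a)
print(m2.group(1))  # 73
```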
