1

I have a line like this:

[0 +5] 23 for bucket [5 +5] 1 for bucket [25 +5] 22 for bucket [50 +5] 1

Using a regex, I want a result like

[('[0 +5]', '23'), ('[5 +5]', '1'), ('[25 +5]', '22'), ('[50 +5]', '1')]

but I am getting the result below:

[('[50 +5]', '1')]

I used:

stats_iter = re.findall('(?:.*)(?:(\[.*\]) (\d+)).*', stat_log,re.DOTALL)
print(stats_iter)
2
  • Since you have .* at the end of the regular expression, the first match will continue matching to the end of the string. Commented Feb 26, 2019 at 17:35
  • Even with re.findall('(?:.*)(?:(\[.*\]) (\d+))', stat_log, re.DOTALL) I get the same result. Commented Feb 26, 2019 at 17:39

3 Answers

1

The * repeater is greedy, so with (?:.*) as the first part of your regex, it consumes everything up to the last match. Use a regex that matches only the portion you need instead:

re.findall(r'(\[.*?\]) (\d+)', stat_log)
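As a rough illustration, here is a sketch that reuses the input line from the question and compares the two quantifiers. The names greedy and lazy are chosen only for this example:

```python
import re

stat_log = '[0 +5] 23 for bucket [5 +5] 1 for bucket [25 +5] 22 for bucket [50 +5] 1'

# Greedy: .* runs to the last ']' that is still followed by a space and digits,
# so the whole line collapses into a single match.
greedy = re.findall(r'(\[.*\]) (\d+)', stat_log)

# Lazy: .*? stops at the nearest ']', so each bucket is matched separately.
lazy = re.findall(r'(\[.*?\]) (\d+)', stat_log)

print(len(greedy))  # 1
print(lazy)
```

With the lazy form, findall returns one tuple per bucket, which is the shape asked for in the question.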

2 Comments

Thanks, it worked. I'm just curious why we need '.*?' instead of '.*'.
Glad to be of help. .*? makes the repeater lazy rather than greedy, so that the closest ] would match rather than the farthest one.
1

.* at the beginning of the regexp causes the first match to include the entire beginning of the input string, up to the last [. And .* at the end of the regexp causes the first match to include the rest of the input string.

So both of these prevent the regexp from matching multiple times. You shouldn't use them when you're using re.findall().

Then you need to use non-greedy quantifiers, so that .* won't match across multiple sets of brackets. Or you could use \[[^]]*\] instead of \[.*\], so the part between the brackets can't run past the closing bracket.

And there's no need for the non-capturing group around the parts you want to capture.

Just use:

re.findall(r'(\[.*?\]) (\d+)', stat_log, re.DOTALL)
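To see what re.DOTALL changes here, and how the negated class behaves without it, here is a small sketch. The sample string with a line break inside the brackets is made up for illustration and does not come from the question:

```python
import re

# Hypothetical input: the bracket content is split across two lines.
sample = '[0\n+5] 23'

# Without re.DOTALL, '.' does not match '\n', so nothing is found.
no_flag = re.findall(r'(\[.*?\]) (\d+)', sample)

# With re.DOTALL, '.' also matches the line break.
with_flag = re.findall(r'(\[.*?\]) (\d+)', sample, re.DOTALL)

# The negated class [^]] matches any character except ']', including '\n'.
negated = re.findall(r'(\[[^]]*\]) (\d+)', sample)

print(no_flag, with_flag, negated)
```

For the single-line input in the question, all three variants should give the same result, so the flag only matters if a bracket pair can span lines.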


1 Comment

I've updated the answer to say that both of them prevent it from returning multiple matches.
0

In your example string, the first non-capturing group (?:.*) will match until the end of the string. Then it will backtrack and capture the last [50 +5] in group 1 and the 1 in group 2. The trailing .* then has no more characters left to match.
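A quick way to check this is to run the question's pattern through re.search and look at the match object. This is only a sketch; m is a name chosen for the example:

```python
import re

stat_log = '[0 +5] 23 for bucket [5 +5] 1 for bucket [25 +5] 22 for bucket [50 +5] 1'

# The pattern from the question, unchanged apart from the raw-string prefix.
m = re.search(r'(?:.*)(?:(\[.*\]) (\d+)).*', stat_log, re.DOTALL)

# The single match covers the whole string, leaving nothing for a second match.
print(m.span() == (0, len(stat_log)))

# Non-capturing groups are not numbered, so only two groups are reported.
print(m.groups())
```

Because the one match already spans the full input, re.findall has nothing left to scan and returns a list with a single tuple.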

Instead of the greedy .*, you could use a negated character class that matches anything except an opening or a closing bracket:

(\[[^][]+\])\s+(\d+)

Explanation

  • ( Start of the first capturing group
  • \[[^][]+\] Match [, then one or more characters that are neither ] nor [ (a negated character class), then ]
  • ) Close the first capturing group
  • \s+ Match one or more whitespace characters (or use a single space)
  • (\d+) Capture one or more digits in group 2


For example:

import re

stat_log = '[0 +5] 23 for bucket [5 +5] 1 for bucket [25 +5] 22 for bucket [50 +5] 1'
# A raw string keeps the backslashes literal; re.DOTALL is not needed because the pattern contains no dot.
stats_iter = re.findall(r'(\[[^][]+\])\s+(\d+)', stat_log)
print(stats_iter)
