Get multiple matches using regex in Python in a single line

Question

I have a line:

[0 +5] 23 for bucket [5 +5] 1 for bucket [25 +5] 22 for bucket [50 +5] 1

Using a regex, I want a result like:

[('[0 +5]', '23'), ('[5 +5]', '1'), ('[25 +5]', '22'), ('[50 +5]', '1')]

but I am getting the result below:

[('[50 +5]', '1')]

I used:

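import re
stat_log = '[0 +5] 23 for bucket [5 +5] 1 for bucket [25 +5] 22 for bucket [50 +5] 1'
stats_iter = re.findall(r'(?:.*)(?:(\[.*\]) (\d+)).*', stat_log, re.DOTALL)
print(stats_iter)  # [('[50 +5]', '1')]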

Since you have .* at the end of the regular expression, the first match will continue matching to the end of the string.
– Barmar, Commented Feb 26, 2019 at 17:35
Even with re.findall(r'(?:.*)(?:(\[.*\]) (\d+))', stat_log, re.DOTALL) I am getting the same result.
– Mayank, Commented Feb 26, 2019 at 17:39

blhsing · Accepted Answer · 2019-02-26 17:39:42Z


The * repeater is greedy, so by having (?:.*) as the first part of your regex it consumes all but the last match. You should use a regex that matches just the portion you need instead:

re.findall(r'(\[.*?\]) (\d+)', stat_log)
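To see how much the greedy prefix swallows, the (?:.*) part can be turned into a capturing group and inspected. A minimal sketch, assuming stat_log holds the sample line from the question; the names prefix, bracket and count are only for illustration:

```python
import re

stat_log = '[0 +5] 23 for bucket [5 +5] 1 for bucket [25 +5] 22 for bucket [50 +5] 1'

# Same shape as the question's regex, but the leading .* is captured as group 1
m = re.match(r'(.*)(\[.*\]) (\d+)', stat_log)

prefix, bracket, count = m.groups()
print(repr(prefix))    # everything before the last '[' was eaten by the greedy prefix
print(bracket, count)  # [50 +5] 1
```

Everything before the last [ ends up in the prefix, which is why only the final pair survives.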

answered Feb 26, 2019 at 17:39


2 Comments

Mayank Over a year ago

Thanks, it worked. Just curious: why do we need '.*?' instead of '.*'?

blhsing Over a year ago

Glad to be of help. .*? makes the repeater lazy rather than greedy, so that the closest ] would match rather than the farthest one.
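A quick way to see the difference blhsing describes is to match only the bracketed part with both quantifiers. A sketch, again assuming the question's sample line:

```python
import re

stat_log = '[0 +5] 23 for bucket [5 +5] 1 for bucket [25 +5] 22 for bucket [50 +5] 1'

# Greedy: .* runs to the farthest ']' it can reach, so there is only one match
greedy = re.findall(r'\[.*\]', stat_log)

# Lazy: .*? stops at the closest ']', so each bracket pair is its own match
lazy = re.findall(r'\[.*?\]', stat_log)

print(greedy)  # ['[0 +5] 23 for bucket [5 +5] 1 for bucket [25 +5] 22 for bucket [50 +5]']
print(lazy)    # ['[0 +5]', '[5 +5]', '[25 +5]', '[50 +5]']
```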

Barmar · Accepted Answer · 2019-02-26 17:45:56Z


.* at the beginning of the regexp causes the first match to include the entire beginning of the input string, up to the last [. And .* at the end of the regexp causes the first match to include the rest of the input string.

So both of these prevent the regexp from matching multiple times. You shouldn't use them when you're using re.findall().
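One way to check this is to keep the lazy bracket group fixed and add only one of the two .* wrappers at a time. A minimal sketch on the question's sample line; core is a hypothetical helper variable, not part of the original answer:

```python
import re

stat_log = '[0 +5] 23 for bucket [5 +5] 1 for bucket [25 +5] 22 for bucket [50 +5] 1'
core = r'(\[.*?\]) (\d+)'  # lazy bracket group followed by a count

leading_only = re.findall(r'(?:.*)' + core, stat_log)  # prefix eats up to the last '['
trailing_only = re.findall(core + r'.*', stat_log)     # suffix eats the rest after the first pair
neither = re.findall(core, stat_log)                   # findall can resume after each match

print(leading_only)   # [('[50 +5]', '1')]
print(trailing_only)  # [('[0 +5]', '23')]
print(neither)        # [('[0 +5]', '23'), ('[5 +5]', '1'), ('[25 +5]', '22'), ('[50 +5]', '1')]
```

The leading .* leaves only the last pair, the trailing .* leaves only the first, and only the bare pattern lets re.findall() return all four.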

Then you need to use non-greedy quantifiers, so that .* won't match across multiple sets of brackets. Or you could use \[[^]]*\] instead of \[.*\], so the repeated part can't match a closing bracket.
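The negated-class fix can be compared against the greedy version on the same input. A sketch assuming the question's sample line; the class is written as [^\]], an equivalent spelling of [^]] with the closing bracket escaped:

```python
import re

stat_log = '[0 +5] 23 for bucket [5 +5] 1 for bucket [25 +5] 22 for bucket [50 +5] 1'

# Greedy .* inside the brackets spans from the first '[' to the last '] <digits>'
greedy = re.findall(r'(\[.*\]) (\d+)', stat_log)

# Negated class: the repeated part can never consume a ']'
negated = re.findall(r'(\[[^\]]*\]) (\d+)', stat_log)

print(greedy)   # [('[0 +5] 23 for bucket [5 +5] 1 for bucket [25 +5] 22 for bucket [50 +5]', '1')]
print(negated)  # [('[0 +5]', '23'), ('[5 +5]', '1'), ('[25 +5]', '22'), ('[50 +5]', '1')]
```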

And there's no need for the non-capturing group around the parts you want to capture.
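Since re.findall() returns only the capturing groups, wrapping them in (?:...) changes nothing in the result. A small sketch to confirm this on the sample line:

```python
import re

stat_log = '[0 +5] 23 for bucket [5 +5] 1 for bucket [25 +5] 22 for bucket [50 +5] 1'

# findall returns only capturing groups, so the extra (?:...) wrapper is redundant
wrapped = re.findall(r'(?:(\[.*?\]) (\d+))', stat_log)
plain = re.findall(r'(\[.*?\]) (\d+)', stat_log)

print(wrapped == plain)                           # True
print(re.compile(r'(?:(\[.*?\]) (\d+))').groups)  # 2: the (?:...) wrapper is not counted
```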

Just use:

re.findall(r'(\[.*?\]) (\d+)', stat_log, re.DOTALL)


edited Feb 26, 2019 at 17:45

answered Feb 26, 2019 at 17:40


1 Comment

Barmar Over a year ago

I've updated the answer to say that both of them prevent it from returning multiple matches.

The fourth bird · Accepted Answer · 2019-02-26 17:46:42Z

In your example string, the first non-capturing group (?:.*) will match until the end of the string. Then it will backtrack and capture the last [50 +5] in group 1 and the 1 in group 2, because non-capturing groups are not numbered. For the trailing .* there are no more characters to match.
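The group numbering and the backtracking can be inspected on a match object. A sketch using the question's regex unchanged (only pattern and m are new names):

```python
import re

stat_log = '[0 +5] 23 for bucket [5 +5] 1 for bucket [25 +5] 22 for bucket [50 +5] 1'
pattern = re.compile(r'(?:.*)(?:(\[.*\]) (\d+)).*', re.DOTALL)  # the question's regex

m = pattern.match(stat_log)
print(pattern.groups)             # 2: the (?:...) groups are not numbered
print(m.group(1), m.group(2))     # [50 +5] 1
print(m.end() == len(stat_log))   # True: the trailing .* reached the end of the string
```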

Instead of .*, which is greedy, you could use a negated character class that matches anything except an opening or a closing bracket:

(\[[^][]+\])\s+(\d+)

Explanation

( Start of the first capturing group
\[[^][]+\] Match [, then one or more characters that are neither ] nor [ (a negated character class), then match ]
) End of the first capturing group
\s+ Match one or more whitespace characters (or use a single space)
(\d+) Capture one or more digits in group 2
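The same pattern can be written with re.VERBOSE so that each part carries the explanation above as an inline comment. A sketch; the verbose layout is a suggestion, not part of the original answer:

```python
import re

stat_log = '[0 +5] 23 for bucket [5 +5] 1 for bucket [25 +5] 22 for bucket [50 +5] 1'

pattern = re.compile(r"""
    (           # start of capturing group 1
      \[        # a literal opening bracket
      [^][]+    # one or more characters that are neither ] nor [
      \]        # a literal closing bracket
    )           # end of group 1
    \s+         # one or more whitespace characters
    (\d+)       # group 2: one or more digits
""", re.VERBOSE)

stats_iter = pattern.findall(stat_log)
print(stats_iter)  # [('[0 +5]', '23'), ('[5 +5]', '1'), ('[25 +5]', '22'), ('[50 +5]', '1')]
```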


For example:

import re
stat_log = '[0 +5] 23 for bucket [5 +5] 1 for bucket [25 +5] 22 for bucket [50 +5] 1'
stats_iter = re.findall(r'(\[[^][]+\])\s+(\d+)', stat_log, re.DOTALL)
print(stats_iter)  # [('[0 +5]', '23'), ('[5 +5]', '1'), ('[25 +5]', '22'), ('[50 +5]', '1')]
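The \s+ in the pattern matters if the spacing in the log is irregular. A sketch with a hypothetical log line (messy_log is made up for illustration) containing a double space and a tab:

```python
import re

# Hypothetical variant of the log line with a double space and a tab before the counts
messy_log = '[0 +5]  23 for bucket [5 +5]\t1 for bucket [25 +5] 22'

with_ws = re.findall(r'(\[[^][]+\])\s+(\d+)', messy_log)     # any run of whitespace
with_space = re.findall(r'(\[[^][]+\]) (\d+)', messy_log)    # exactly one space

print(with_ws)     # [('[0 +5]', '23'), ('[5 +5]', '1'), ('[25 +5]', '22')]
print(with_space)  # [('[25 +5]', '22')]
```

With a single literal space, the first two entries are silently skipped, so \s+ is the safer choice unless the format is guaranteed.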
