Multiple capturing groups within non-capturing group using Python regexes

Question

I have the following code using multiple capturing groups within a non-capturing group:

>>> import re
>>> regex = r'(?:a ([ac]+)|b ([bd]+))'
>>> re.match(regex, 'a caca').groups()
('caca', None)
>>> re.match(regex, 'b bdbd').groups()
(None, 'bdbd')

How can I change the code so that it outputs only ('caca',) or ('bdbd',), i.e. the text from a single capturing group?

Under which conditions? I cannot produce it using the PyPI regex module v2020.6.8. – Felix, commented Jun 21, 2020 at 0:12
I posted an answer below since you seem interested. I really believe Python should ship with the regex module built in: it is so much faster, more stable and more powerful than re for sophisticated pattern matching or handling large texts that, in my opinion, it should be part of the default installation bundle. – Wiktor Stribiżew, commented Jun 21, 2020 at 12:37

dawg · Accepted Answer · 2020-06-20 22:16:14Z

2

You are close.

To always get the capture as group 1, you can use a lookahead to do the matching and then a separate capturing group to capture:

(?:a (?=[ac]+)|b (?=[bd]+))(.*)

Demo

Or in Python 3:

>>> import re
>>> regex = r'(?:a (?=[ac]+)|b (?=[bd]+))(.*)'
>>> re.match(regex, 'a caca').groups()
('caca',)
>>> re.match(regex, 'b bdbd').groups()
('bdbd',)

answered Jun 20, 2020 at 22:16

dawg



3 Comments

Enzo Caceres Over a year ago

Warning: your regex also captures any characters that follow (due to the .*).

dawg Over a year ago

A character class or additional regex can be added to fix that. The OP did not state exactly what he is looking for there...

Enzo Caceres Over a year ago

Right, I was trying to find a solution that covers exactly his match, but this wasn't the right path.
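A minimal sketch of dawg's suggestion, assuming (the question does not say) that the captured text should contain only the letters a to d: replacing (.*) with a character class makes the capture stop at the first other character, such as a space. The union class [a-d] is still looser than the per-branch classes [ac] and [bd], because the lookaheads only check the first character after the prefix.

```python
import re

# Assumed restriction: the captured text consists only of the letters a-d.
# The lookaheads still choose the branch; ([a-d]+) replaces the greedy (.*).
pattern = r'(?:a (?=[ac]+)|b (?=[bd]+))([a-d]+)'

for text in ['a caca', 'b bdbd', 'a caca and more']:
    # With (.*) the last input would capture 'caca and more' instead.
    print(text, '->', re.match(pattern, text).groups())
```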

The fourth bird · Accepted Answer · 2020-06-21 15:06:17Z

2

Another option is to get the matches using a lookbehind without a capturing group:

(?<=a )[ac]+|(?<=b )[bd]+

Regex demo

For example

import re

pattern = r'(?<=a )[ac]+|(?<=b )[bd]+'
print(re.search(pattern, 'a caca').group())
print(re.search(pattern, 'b bdbd').group())

Output

caca
bdbd

edited Jun 21, 2020 at 15:06

Wiktor Stribiżew


answered Jun 21, 2020 at 14:40

The fourth bird


2 Comments

Wiktor Stribiżew Over a year ago

Yes; however, if you use re, each lookbehind pattern must match a fixed-length string. Again, with the PyPI regex module this scales much better, since its lookbehind patterns can match strings of any length.

The fourth bird Over a year ago

That is true, it is not that flexible.
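A short illustration of the fixed-width restriction with the standard re module: the pattern from this answer uses two separate lookbehinds, each of fixed width, and compiles fine, while a single lookbehind whose alternatives differ in length is rejected at compile time. Splitting the alternatives into separate lookbehinds is the usual workaround in re. The 'bb ' prefix below is a hypothetical example chosen only to create alternatives of different widths.

```python
import re

# Two separate lookbehinds, each of fixed width: accepted by re.
fixed = re.compile(r'(?<=a )[ac]+|(?<=b )[bd]+')
print(fixed.search('b bdbd').group())

# One lookbehind whose alternatives differ in width ('a ' vs 'bb '):
# the standard re module refuses to compile it.
try:
    re.compile(r'(?<=a |bb )[a-d]+')
    variable_ok = True
except re.error as exc:
    variable_ok = False
    print('re.error:', exc)

# Workaround: split it into separate fixed-width lookbehinds.
split = re.compile(r'(?:(?<=a )|(?<=bb ))[a-d]+')
print(split.search('bb cd').group())
```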

Wiktor Stribiżew · Accepted Answer · 2020-06-21 12:36:25Z

1

You may use a branch reset group with the PyPI regex module:

Alternatives inside a branch reset group share the same capturing groups. The syntax is (?|regex) where (?| opens the group and regex is any regular expression. If you don’t use any alternation or capturing groups inside the branch reset group, then its special function doesn’t come into play. It then acts as a non-capturing group.

The regex will look like

(?|a ([ac]+)|b ([bd]+))

See the regex demo. See the Python 3 demo:

import regex
rx = r'(?|a ([ac]+)|b ([bd]+))'
print(regex.search(rx, 'a caca').groups())  # => ('caca',)
print(regex.search(rx, 'b bdbd').groups())  # => ('bdbd',)
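If installing the third-party regex module is not an option, a rough approximation with the standard re module is to keep both groups and read whichever one took part in the match via Match.lastindex. This is not a real branch reset (the pattern still has two numbered groups), and single_capture is a hypothetical helper name used only for illustration.

```python
import re

# Plain re numbers each alternative's group separately: groups 1 and 2.
rx = re.compile(r'(?:a ([ac]+)|b ([bd]+))')

def single_capture(text):
    """Return the text of whichever capturing group participated, or None."""
    m = rx.search(text)
    if m is None:
        return None
    # lastindex is the index of the last matched group; since only one
    # alternative can match here, it identifies the group that was used.
    return m.group(m.lastindex)

print(single_capture('a caca'))
print(single_capture('b bdbd'))
```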

answered Jun 21, 2020 at 12:36

Wiktor Stribiżew


Comments

Enzo Caceres · Accepted Answer · 2020-06-20 21:58:52Z

0

See the problem the other way around:

((?:a [ac]+)|(?:b [bd]+))
^ ^         ^ ^
| |         | other exact match
| |         OR
| not capturing for exact match
capture everything

An easier look: https://regex101.com/r/e3bK2B/1/

answered Jun 20, 2020 at 21:58

Enzo Caceres


3 Comments

alani Over a year ago

When I try it, this captures the whole of a caca; the OP only wants to capture the caca part.

Felix Over a year ago

Thanks. Enjoyed the explanation. But @alaniwi is right.

Enzo Caceres Over a year ago

Sorry, I tried to find another solution, but didn't find any.
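A quick check of alani's observation using the pattern from this answer: the outer group spans the whole alternation, so the captured text includes the "a " or "b " prefix rather than only the part the question asks for.

```python
import re

# The outer (...) captures the whole alternation; the inner (?:...) groups
# only group, they do not capture, so the prefix ends up in group 1.
pattern = r'((?:a [ac]+)|(?:b [bd]+))'

print(re.match(pattern, 'a caca').groups())
print(re.match(pattern, 'b bdbd').groups())
```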
