0

I have the following code using multiple capturing groups within a non-capturing group:

>>> import re
>>> regex = r'(?:a ([ac]+)|b ([bd]+))'
>>> re.match(regex, 'a caca').groups()
('caca', None)
>>> re.match(regex, 'b bdbd').groups()
(None, 'bdbd')

How can I change the code so that it outputs either ('caca',) or ('bdbd',), i.e. a single group without the None?

4
  • 1
    With PyPi regex, you may get ('caca',) and ('bdbd',) Commented Jun 20, 2020 at 22:15
  • 1
    Under which conditions? Cannot produce using PyPi regex v2020.6.8. Commented Jun 21, 2020 at 0:12
  • 1
    r'(?|a ([ac]+)|b ([bd]+))' Commented Jun 21, 2020 at 12:31
  • 1
    I posted an answer below since you seem interested. I really believe Python should ship with the regex module built in: in my opinion it is so much faster, more stable and more powerful than re for sophisticated pattern matching or large texts that it should be part of the default installation. Commented Jun 21, 2020 at 12:37

4 Answers

2

You are close.

To always get the capture as group 1, you can use a lookahead to check the match and then a separate capturing group to do the capturing:

(?:a (?=[ac]+)|b (?=[bd]+))(.*)

Or in Python 3:

>>> import re
>>> regex = r'(?:a (?=[ac]+)|b (?=[bd]+))(.*)'
>>> re.match(regex, 'a caca').groups()
('caca',)
>>> re.match(regex, 'b bdbd').groups()
('bdbd',)

3 Comments

Warning: your regex also captures any characters that follow (because of the .*).
A character class or additional regex can be added to fix that. The OP did not state exactly what he is looking for there...
Right, I was trying to find a solution that covers exactly his match, but this wasn't the right path.
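
To make the warning in these comments concrete, here is a minimal sketch showing that the trailing (.*) also swallows text after the wanted token. As one possible workaround that stays within the standard re module (my own assumption, not part of the answer above), the question's original pattern can be kept and the single populated group picked via the match object's lastindex attribute. The helper name first_capture is hypothetical.

```python
import re

lookahead_regex = r'(?:a (?=[ac]+)|b (?=[bd]+))(.*)'
original_regex = r'(?:a ([ac]+)|b ([bd]+))'

# The trailing (.*) keeps everything up to the end of the line.
overcaptured = re.match(lookahead_regex, 'a caca xyz').group(1)

def first_capture(text):
    """Hypothetical helper: return the one group that took part in the match."""
    m = re.match(original_regex, text)
    if m is None:
        return None
    # lastindex is the index of the last matched capturing group (1 or 2 here).
    return m.group(m.lastindex)

print(overcaptured)                  # caca xyz
print(first_capture('a caca xyz'))   # caca
print(first_capture('b bdbd'))       # bdbd
```

This relies on exactly one of the two groups participating in any match, which holds for this pattern because the groups sit in separate alternatives.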
2

Another option is to get the matches using a lookbehind without a capturing group:

(?<=a )[ac]+|(?<=b )[bd]+

For example

import re

pattern = r'(?<=a )[ac]+|(?<=b )[bd]+'
print(re.search(pattern, 'a caca').group())
print(re.search(pattern, 'b bdbd').group())

Output

caca
bdbd

2 Comments

Yes, but with re a lookbehind pattern must match strings of a fixed length. Again, PyPI regex is more flexible here, since its lookbehind patterns can match strings of any length.
That is true, it is not that flexible.
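
The fixed-width restriction mentioned in these comments can be checked with a short sketch. The variable-width pattern (?<=a+ ) is an illustrative assumption, not something from the question; it is there only to trigger the error that re raises at compile time.

```python
import re

# Fixed-width lookbehind: accepted by re.
fixed = re.compile(r'(?<=a )[ac]+')
print(fixed.search('a caca').group())   # caca

# Variable-width lookbehind (a+ can match one or more characters):
# re refuses to compile it.
try:
    re.compile(r'(?<=a+ )[ac]+')
    compiled = True
except re.error as exc:
    compiled = False
    print('re.error:', exc)
```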
1

You may use a branch reset group with the PyPI regex module:

Alternatives inside a branch reset group share the same capturing groups. The syntax is (?|regex) where (?| opens the group and regex is any regular expression. If you don’t use any alternation or capturing groups inside the branch reset group, then its special function doesn’t come into play. It then acts as a non-capturing group.

The regex will look like

(?|a ([ac]+)|b ([bd]+))

In Python 3:

import regex
rx = r'(?|a ([ac]+)|b ([bd]+))'
print(regex.search(rx, 'a caca').groups())  # => ('caca',)
print(regex.search(rx, 'b bdbd').groups())  # => ('bdbd',)

0

See the problem the other way around:

((?:a [ac]+)|(?:b [bd]+))
^ ^         ^ ^
| |         | other exact match
| |         OR
| not capturing for exact match
capture everything

An easier way to look at it: https://regex101.com/r/e3bK2B/1/

3 Comments

When I try it, this captures the whole of a caca; the OP only wants to capture the caca part.
Thanks. Enjoyed the explanation. But @alaniwi is right.
Sorry, I tried to find another solution, but didn't find any.
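
The behaviour pointed out in these comments is easy to confirm. The sketch below shows that group 1 holds the whole of 'a caca', and then, purely as an assumption about the input (a single space between the prefix letter and the token), splits the prefix off afterwards. This is post-processing, not a regex-only fix.

```python
import re

whole_regex = r'((?:a [ac]+)|(?:b [bd]+))'

m = re.match(whole_regex, 'a caca')
whole = m.group(1)
print(whole)                 # a caca

# Assumption: the prefix and the token are separated by exactly one space.
token = whole.split(' ', 1)[1]
print(token)                 # caca
```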