Python Regex Capturing Group

Question

string1 = "abcdbcdbcde"

I want to split string1 into three parts (the first and third parts can be empty strings):

first part: a

second part (repetitions of some string): bcdbcdbcd

third part: e

import re

string1 = "abcdbcdbcde"
m = re.match("(.*)(.+){2,}(.*)", string1)
print(m.groups()[0], m.groups()[1], m.groups()[2])

Of course, the code above doesn't work.

As far as I know, parentheses in a regex can be used either to capture a group or to group part of the pattern so that a quantifier applies to it. How can I use parentheses for both purposes at the same time?

What I want:

m.groups()[0] = "a"
m.groups()[1] = "bcdbcdbcd"
m.groups()[2] = "e"

Should the second part be a repetition of the same string, like bcd bcd or ab ab ab ab?
– The fourth bird, Commented May 31, 2019 at 6:31

The fourth bird · Accepted Answer · 2019-05-31 06:35:56Z

3

If the second part should be a repetition of the same string, you could make the first and third parts optional. For the second part you could use a capturing group and a backreference:

^.?(.+)\1+.?$

Or if you want all capturing groups:

^(.?)((.+)\3+)(.?)$

^ Start of string
(.?) Group 1, optionally match any char
( Group 2
- (.+)\3+ Group 3, match any char 1+ times, followed by a backreference to group 3 repeated 1+ times
) Close group 2
(.?) Group 4, optionally match any char
$ End of string
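
As a rough sketch of how the second pattern could be used from Python (the helper name split_repeated is made up for this illustration), groups 1, 2 and 4 hold the three parts, and group 3 is only the repeating unit:

```python
import re

# Pattern from the answer: optional single char, repeated unit, optional single char.
PATTERN = re.compile(r"^(.?)((.+)\3+)(.?)$")

def split_repeated(s):
    """Hypothetical helper: return (first, repeated, third), or None if s does not match."""
    m = PATTERN.match(s)
    if m is None:
        return None
    # Group 3 is just the single repeating unit, so it is skipped here.
    return m.group(1), m.group(2), m.group(4)

print(split_repeated("abcdbcdbcde"))  # ('a', 'bcdbcdbcd', 'e')
print(split_repeated("bcdbcdbcd"))    # ('', 'bcdbcdbcd', '')
```

Since .? matches at most one character, a longer first or third part would need a wider quantifier such as .* in those positions.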

answered May 31, 2019 at 6:35

The fourth bird

Andrej Kesely · Accepted Answer · 2019-05-31 06:49:00Z

1

My take on the problem:

import re

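def match(s, unit):
    # Split s into (prefix, run of 2+ consecutive copies of unit, suffix).
    # re.escape() makes any regex metacharacters in unit match literally.
    m = re.match("(.*?)((?:" + re.escape(unit) + "){2,})(.*)$", s)
    return m.groups() if m else (None, None, None)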

print(match("abcdbcdbcde", "bcd"))
print(match("bcdbcdbcd", "bcd"))
print(match("abcdbcdbcd", "bcd"))
print(match("bcdbcdbcde", "bcd"))
print(match("axxbcdbcdxxe", "bcd"))
print(match("axxbcdxxe", "bcd")) # only one bcd in the middle

Prints:

('a', 'bcdbcdbcd', 'e')
('', 'bcdbcdbcd', '')
('a', 'bcdbcdbcd', '')
('', 'bcdbcdbcd', 'e')
('axx', 'bcdbcd', 'xxe')
(None, None, None)
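
Because the pattern here is assembled from a caller-supplied string, characters such as . or + in that string act as regex syntax unless they pass through re.escape. A small sketch of the difference, using a hypothetical helper split_on_unit and made-up inputs:

```python
import re

def split_on_unit(s, unit):
    """Hypothetical variant: (prefix, run, suffix), or None if unit is not repeated 2+ times."""
    # re.escape turns "a.b" into a pattern that matches only the literal text a.b
    pattern = "(.*?)((?:" + re.escape(unit) + "){2,})(.*)"
    m = re.fullmatch(pattern, s)
    return m.groups() if m else None

print(split_on_unit("xa.ba.by", "a.b"))  # ('x', 'a.ba.b', 'y')
print(split_on_unit("xaXbaXby", "a.b"))  # None: the dot is literal, so aXb does not count
```

Without the escape, the second call would treat the dot as "any character" and report aXbaXb as the repeated part.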

answered May 31, 2019 at 6:49

Andrej Kesely

Michał Turczyn · Accepted Answer · 2019-05-31 06:36:15Z

0

I think it is impossible to match your requirements exactly with only three groups, since an extra capturing group is needed to match the repeated string with a backreference such as \1.

But you can try (\w+)((\w+)\3+)(\w+)

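It consists of 4 capturing groups. The first group holds the text before the repetition, the second holds the repeated string, and the fourth holds the text after it; the third (the single repeating unit) only exists for the backreference. Note that the first \w+ is greedy, so for abcdbcdbcde it takes abcd and leaves bcdbcd for the second group; writing the first group as (\w+?) gives a and bcdbcdbcd instead. Because \w+ needs at least one character, the first and last parts cannot be empty with this pattern.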

Explanation:

\w+ - match one or more of word characters

\3+ - match the string captured in the third capturing group, one or more times
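
To see what the four groups actually hold, here is a small check against the string from the question. The lazy variant with (\w+?) is an assumption added for comparison and is not part of the pattern suggested above:

```python
import re

s = "abcdbcdbcde"

# Pattern as suggested: the first \w+ is greedy and grabs as much as it can.
greedy = re.match(r"(\w+)((\w+)\3+)(\w+)", s).groups()

# Comparison variant (assumption): a lazy first group keeps the prefix minimal.
lazy = re.match(r"(\w+?)((\w+)\3+)(\w+)", s).groups()

print(greedy)  # ('abcd', 'bcdbcd', 'bcd', 'e')
print(lazy)    # ('a', 'bcdbcdbcd', 'bcd', 'e')
```

Either way, groups 1, 2 and 4 are the three parts asked for, and group 3 can be ignored.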

answered May 31, 2019 at 6:36

Michał Turczyn

Tim Pietzcker · Accepted Answer · 2019-05-31 06:40:06Z

0

The following regex should work (caveat below):

^(.*?)((.+?)\3+)(.*)

Explanation:

^      # Start of string
(.*?)  # Match any number of characters, as few as possible, until...
(      # (Start capturing group #2)
 (.+?) # ... a string is matched (and captured in group #3)
 \3+   # that is repeated at least once.
)      # End of group #2
(.*)   # Match the rest of the string

Test it live on regex101.com.
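
The commented layout above maps directly onto Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments. A minimal sketch (the name REPEAT and the second test string are chosen just for this example):

```python
import re

REPEAT = re.compile(r"""
    ^          # start of string
    (.*?)      # group 1: shortest possible prefix
    (          # group 2: the whole repeated run
      (.+?)    # group 3: the repeating unit
      \3+      # the unit again, one or more times
    )
    (.*)       # group 4: the rest of the string
""", re.VERBOSE)

prefix, run, unit, rest = REPEAT.match("abcdbcdbcde").groups()
print(prefix, run, unit, rest)  # a bcdbcdbcd bcd e

# The first repetition found wins, even a short one:
print(REPEAT.match("aabcdbcd").groups())  # ('', 'aa', 'a', 'bcdbcd')
```

The second call shows a side effect of the lazy prefix: the earliest repeat in the string is reported, which may not be the longest one.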

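Caveat: If the string is long and doesn't contain any repeats, this can perform badly, since the regex engine has to try every combination of prefix length and candidate substring length before it can give up. I first guessed O(n!), but that overstates it: the number of attempts grows on the order of n² for a string of length n, each with a substring comparison, which is polynomial but still adds up on long inputs. See catastrophic backtracking.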

answered May 31, 2019 at 6:40

Tim Pietzcker
