
Using

re.compile(r"(.+?)\1+").findall('44442(2)2(2)44')

I get

['4', '2(2)', '4']

but how can I get

['4444', '2(2)2(2)', '44']

using a regular expression?

Thanks

3 Answers


No change to your pattern is needed; you just need to use the right function for the job. re.findall returns a list of groups if the pattern contains capturing groups. To get the entire match, use re.finditer instead, and extract the full match from each match object.

import re
pattern = re.compile(r"(.+?)\1+")
[match.group(0) for match in pattern.finditer('44442(2)2(2)44')]  # ['4444', '2(2)2(2)', '44']
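
To make the difference concrete, here is a small side-by-side sketch using the pattern and input from the question. The variable names (`text`, `groups_only`, `full_matches`) are only illustrative.

```python
import re

pattern = re.compile(r"(.+?)\1+")
text = "44442(2)2(2)44"

# findall: the pattern has one capturing group, so only that
# group's text is returned for each match.
groups_only = pattern.findall(text)

# finditer: yields match objects, and group(0) is the entire match.
full_matches = [m.group(0) for m in pattern.finditer(text)]

print(groups_only)   # ['4', '2(2)', '4']
print(full_matches)  # ['4444', '2(2)2(2)', '44']
```

If you also need positions, each match object exposes them through `m.span()`.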

1 Comment

Ooo, even better. Learn something every day :D

With minimal change to OP's regular expression:

[m[0] for m in re.compile(r"((.+?)\2+)").findall('44442(2)2(2)44')]

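findall returns the full match when the pattern has no capturing groups, and the groups when it has some. Since the pattern needs a group for the backreference to work, we simply wrap the whole pattern in another group that encompasses the full match, and extract it afterwards. The new outer group becomes group 1, so the backreference changes from \1 to \2, and findall returns tuples whose first element is the full match.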
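
As a rough illustration of how the return shape of findall depends on the number of capturing groups, consider the sketch below. The group-free pattern `4+` is just an example I picked, not something from the question.

```python
import re

text = "44442(2)2(2)44"

# No capturing groups: findall returns the full matches as strings.
no_groups = re.findall(r"4+", text)

# One capturing group: findall returns that group's text only.
one_group = re.findall(r"(.+?)\1+", text)

# Two capturing groups: findall returns one tuple per match,
# ordered by group number (outer group first, inner group second).
two_groups = re.findall(r"((.+?)\2+)", text)

print(no_groups)   # ['4444', '44']
print(one_group)   # ['4', '2(2)', '4']
print(two_groups)  # [('4444', '4'), ('2(2)2(2)', '2(2)'), ('44', '4')]
```

Taking the first element of each tuple in the last result gives the list the question asks for.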


You can do:

[i[0] for i in re.findall(r'((\d)(?:[()]*\2*[()]*)*)', s)]

Here the Regex is:

((\d)(?:[()]*\2*[()]*)*)

This returns a list of tuples containing the two captured groups; we are only interested in the first one, hence i[0].

Example:

In [15]: s
Out[15]: '44442(2)2(2)44'

In [16]: [i[0] for i in re.findall(r'((\d)(?:[()]*\2*[()]*)*)', s)]
Out[16]: ['4444', '2(2)2(2)', '44']
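
One thing to keep in mind: this pattern is tailored to digits and parentheses, and unlike the backreference pattern from the question it does not require a repetition, so a lone digit also counts as a match. A quick sketch of the difference, using a sample string (`"1223"`) that I made up for illustration:

```python
import re

sample = "1223"  # hypothetical input, not from the question

# Digit-specific pattern: every digit starts a match, repeated or not.
digit_runs = [t[0] for t in re.findall(r"((\d)(?:[()]*\2*[()]*)*)", sample)]

# Backreference pattern: only substrings that repeat at least once match.
repeats_only = [m.group(0) for m in re.finditer(r"(.+?)\1+", sample)]

print(digit_runs)    # ['1', '22', '3']
print(repeats_only)  # ['22']
```

Which behaviour you want depends on whether unrepeated characters should appear in the result.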
