Regex/Python: Substitution in Python when the regex already does the substitution

Question

I'm trying to remove duplicate lines with this regex that works great:

(.*+)\n*(\1\n+)*

But when I try to use it in Python it doesn't work:

response1 = re.sub(r'(.*+)\n*', r'(\1\n+)*', response1)

Error:

Exception has occurred: re.error
multiple repeat at position 3

Am I doing something wrong?

Thank you,

Possible duplicate of Alternative to possessive quantifier in python
– CertainPerformance, Commented Jan 9, 2019 at 3:13
You could also use [^\n] instead of . to achieve the same effect
– CertainPerformance, Commented Jan 9, 2019 at 3:15
I don't have a problem with the quantifiers or the regex itself; I'm trying to make it work in Python
– Creek Barbara, Commented Jan 9, 2019 at 3:16
The possessive quantifier is the problem: native Python doesn't support them.
– CertainPerformance, Commented Jan 9, 2019 at 3:16
Remove the possessive quantifier and use [^\n] instead of . (the dot). Also, the replacement string should just be the replacement text (possibly with backreferences such as \1), not a regular expression.
– CertainPerformance, Commented Jan 9, 2019 at 3:22
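
To make the two points from the comments concrete, here is a small sketch. It assumes only the standard re module; the names possessive_supported and literal_result are made up for illustration. The first part checks whether the running interpreter accepts the possessive form (support for it was added to re in Python 3.11, so older versions raise the error from the question). The second part shows that the replacement argument of re.sub is inserted as text rather than interpreted as a pattern.

```python
import re

# Before Python 3.11, `.*+` is rejected as a "multiple repeat";
# from 3.11 on, re accepts it as a possessive quantifier.
try:
    re.compile(r'(.*+)\n*')
    possessive_supported = True
except re.error as exc:
    possessive_supported = False
    print("pattern rejected:", exc)

# The second argument of re.sub is a replacement template, not a pattern:
# apart from backreferences such as \1, its characters are inserted as-is.
literal_result = re.sub(r'(\w+)', r'(\1)*', 'A')
print(literal_result)  # (A)*
```

So putting the second half of the regex into the replacement does not make Python "continue matching" there; the parentheses and the star simply end up in the output.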

user2468968 · Accepted Answer · 2019-01-09 03:31:03Z


The "multiple repeat at position 3" error is caused by this part of the regex:

.*+

Here + directly follows *, which re reports as a multiple repeat. Use either ".*" or ".+" instead. Something like the following should remove consecutive duplicate lines:

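import re

response = """A
A
A
B
B
A
A
"""
# (.*\n) captures one line including its newline; (\1)+ matches the repeats that follow it.
print(re.sub(r'(.*\n)(\1)+', r'\2', response))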

Output

A
B
A
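
One caveat with the pattern above: it is not anchored, so a match may start in the middle of a line. For input such as BA followed by A, it would treat the trailing "A" of the first line as a duplicate and drop the second line. A possible variant, sketched here under the assumption that whole lines should be compared, anchors the match with ^ and $ under re.MULTILINE. The function name drop_consecutive_duplicates is made up for this example.

```python
import re

# ^ and $ with re.MULTILINE anchor the match to line boundaries, so the
# backreference only ever compares complete lines.
_DUPLICATE_RUN = re.compile(r'^(.*)(?:\n\1)+$', re.MULTILINE)


def drop_consecutive_duplicates(text: str) -> str:
    """Collapse each run of identical adjacent lines into a single line."""
    return _DUPLICATE_RUN.sub(r'\1', text)


print(drop_consecutive_duplicates("A\nA\nA\nB\nB\nA\nA\n"))
print(drop_consecutive_duplicates("BA\nA\n"))  # both lines are kept
```

Because the newline is no longer part of the captured group, this version also works when the last line has no trailing newline.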

answered Jan 9, 2019 at 3:24


2 Comments

I used your code and got this error: expected string or bytes-like object

Most likely 'response' is not a string. What is 'response'?
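
re.sub raises "expected string or bytes-like object" when its third argument is something other than str or bytes, for example None or a response object from an HTTP client. Since the thread does not say what 'response' actually is, the following is only a sketch: as_text is a hypothetical helper, and the .text attribute is an assumption about what such an object might expose.

```python
import re


def as_text(response, encoding="utf-8"):
    """Return a str that can be passed to re.sub (hypothetical helper)."""
    if isinstance(response, str):
        return response
    if isinstance(response, (bytes, bytearray)):
        return bytes(response).decode(encoding)
    # Assumption: some response objects expose the body as a `.text` string.
    text = getattr(response, "text", None)
    if isinstance(text, str):
        return text
    raise TypeError(f"cannot get text from {type(response).__name__}")


cleaned = re.sub(r'(.*\n)(\1)+', r'\2', as_text(b"A\nA\nB\n"))
print(cleaned)
```

Printing type(response) just before the re.sub call is a quick way to see which case applies.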