I'm trying to remove duplicate lines with this regex that works great:

(.*+)\n*(\1\n+)* 

But when I try to use it in Python it doesn't work:

response1 = re.sub(r'(.*+)\n*', r'(\1\n+)*', response1)

Error:

Exception has occurred: re.error
multiple repeat at position 3

Am I doing something wrong?

Thank you.

  • Possible duplicate of Alternative to possessive quantifier in python Commented Jan 9, 2019 at 3:13
  • You could also use [^\n] instead of . to achieve the same effect Commented Jan 9, 2019 at 3:15
  • I don't have a problem with quantifiers nor the Regex itself, I'm trying to make it work in Python Commented Jan 9, 2019 at 3:16
  • The possessive quantifier is the problem - native Python doesn't support them. Commented Jan 9, 2019 at 3:16
  • Remove the possessive quantifier and use [^\n] instead of ".". Also, the replacement string should just be the replacement string (possibly with \ groups), not a regular expression. Commented Jan 9, 2019 at 3:22

1 Answer

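The "multiple repeat at position 3" error comes from this part of the pattern:

.*+

Python's re module only accepts possessive quantifiers such as "*+" from version 3.11 onward, so older versions read this as two quantifiers in a row and reject it. Use either ".*" or ".+" instead. Note also that the second argument of re.sub is a replacement string, not a pattern. Something like the following should remove consecutive duplicate lines: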

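import re

response = """A
A
A
B
B
A
A
"""
# (.*\n) captures one line, (\1)+ matches its immediate repeats,
# and \2 in the replacement keeps a single copy of that line.
print(re.sub(r'(.*\n)(\1)+', r'\2', response))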

Output

A
B
A
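
One caveat with the unanchored pattern: it can start matching in the middle of a line, so "BA" followed by "A" loses the second line even though the two lines differ. Below is a minimal sketch of an anchored variant, assuming the input is a str whose lines end in "\n". The function names are made up for illustration, and the groupby version is a swapped-in non-regex alternative, not part of the original approach.

```python
import re
from itertools import groupby


def drop_consecutive_duplicates(text: str) -> str:
    """Collapse runs of identical consecutive lines into one line."""
    # With re.MULTILINE, ^ matches only at the start of a line, so the
    # captured line can never be the tail of a longer line before it.
    # (.*\n) captures one full line including its newline; \1+ consumes
    # the immediate repeats; the replacement keeps a single copy.
    return re.sub(r'^(.*\n)\1+', r'\1', text, flags=re.MULTILINE)


def drop_consecutive_duplicates_no_regex(text: str) -> str:
    """Same idea without a regex: keep the first line of each run."""
    lines = text.splitlines(keepends=True)
    return "".join(line for line, _run in groupby(lines))


sample = "A\nA\nA\nB\nB\nA\nA\n"
print(drop_consecutive_duplicates(sample), end="")  # prints A, B, A on separate lines
```

Both versions compare lines together with their newline, so a final line that lacks a trailing "\n" is never treated as a duplicate of the line before it.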

2 Comments

I used your code and got this error: expected string or bytes-like object
Most likely 'response' is not a string. What is 'response'?
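
For reference, re.sub raises a TypeError when its third argument is neither str nor bytes (None, for example), and a str pattern cannot be applied to bytes either. Here is a hedged sketch of a guard, assuming the value is either already a str or UTF-8 encoded bytes. The helper name as_text and the sample data are hypothetical, since the comment does not say what 'response' actually holds.

```python
import re


def as_text(value, encoding="utf-8"):
    """Return value as a str suitable for re.sub with a str pattern."""
    if isinstance(value, str):
        return value
    if isinstance(value, (bytes, bytearray)):
        # Assumption: the bytes are text in the given encoding.
        return value.decode(encoding)
    # Fail early with a message that names the offending type.
    raise TypeError(f"cannot treat {type(value).__name__} as text")


raw = b"A\nA\nB\n"  # hypothetical input, e.g. bytes read from a file
cleaned = re.sub(r'(.*\n)(\1)+', r'\2', as_text(raw))
print(cleaned, end="")
```

If the value turns out to be some other object, the right fix depends on where it came from, so printing type(response) first is a reasonable starting point.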
