I am fairly new to regex and have been trying to write code that removes duplicate characters from a string.

The result should look like this: 'abcd1'

My code is as follows:

import re

text = 'aaaaabbbcccddd111'

while re.search(r'([a-z])(.*)\1', text):
    text = re.sub(r'([a-z])(.*)\1', r'\1\2', text)

print(text)

However, it only removes the duplicate a-z characters, not the repeated "1".

What should I include to make this work?

Thanks!

  • Use re.sub(r'(.)\1*', r'\1', text) Commented May 17, 2020 at 16:00
  • Add matching the digits to the character class ([a-z0-9])\1+ and replace with r'\1' regex101.com/r/sYO49h/1 Commented May 17, 2020 at 16:02

1 Answer

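Add digits to the character class in re.sub() and drop the second group from the replacement. Because everything between the first and last occurrence of a character is discarded, this relies on each character's repeats being adjacent, as they are here: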

import re

text = 'aaaaabbbcccddd111'
text = re.sub(r'([a-z0-9])(.*)\1', r'\1', text)
# or, equivalently:
text = re.sub(r'([a-z]|[0-9])(.*)\1', r'\1', text)

print(text)

gives:

abcd1
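
If the goal is instead to keep the first occurrence of every character wherever its duplicates sit, the smallest change to the loop from the question is probably to widen the character class so it also matches digits. A minimal sketch under that reading of the question; the helper name dedupe_keep_first is made up for illustration:

```python
import re


def dedupe_keep_first(text):
    """Keep the first occurrence of each a-z / 0-9 character (hypothetical helper)."""
    pattern = re.compile(r'([a-z0-9])(.*)\1')
    # Each substitution keeps group 1 and group 2 and drops the trailing
    # copy matched by the final \1, so repeat until nothing occurs twice.
    while pattern.search(text):
        text = pattern.sub(r'\1\2', text)
    return text


print(dedupe_keep_first('aaaaabbbcccddd111'))  # abcd1
```

Keeping \2 in the replacement is what preserves the characters between two copies; the loop is needed because a single pass removes only one copy per match.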

For strings where the same character appears in several separate runs, collapse each run of consecutive repeats instead:

re.sub(r'(.)\1+', r'\1', 'abcddbca123321')
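
It is not clear from the question which result is wanted for 'abcddbca123321', so the sketch below shows two readings side by side: collapsing only adjacent repeats with the pattern above, and keeping each character once in order of first appearance. The second uses dict.fromkeys rather than a regex, which is a swapped-in technique and not part of the original answer; it assumes Python 3.7+, where dicts preserve insertion order.

```python
import re

sample = 'abcddbca123321'

# Reading 1: collapse each run of identical neighbours to one character.
collapsed = re.sub(r'(.)\1+', r'\1', sample)

# Reading 2: keep only the first occurrence of each character, in order.
unique = ''.join(dict.fromkeys(sample))

print(collapsed)  # abcdbca12321
print(unique)     # abcd123
```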
1 Comment

Thanks so much for your help! The code works, but not for this text: 'abcddbca123321'
