Python Regex - remove all duplicate (character and digit) [duplicate]

Question

I am fairly new to regex and have been trying to write code that removes duplicate characters.

The result should look like this: 'abcd1'

My code is as follows:

import re

text = 'aaaaabbbcccddd111'

while re.search(r'([a-z])(.*)\1', text):
    text = re.sub(r'([a-z])(.*)\1', r'\1\2', text)

print(text)

However, it only removes the duplicate a-z characters, not the duplicate "1"s.

What should I include to make this work?

Thanks!

Add the digits to the character class, ([a-z0-9])\1+, and replace with r'\1': regex101.com/r/sYO49h/1
– The fourth bird, Commented May 17, 2020 at 16:02
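The suggestion in this comment can be tried directly in Python. This is only a minimal sketch: the function name collapse_runs is made up for illustration, and it assumes that only immediately repeated characters need to be collapsed.

```python
import re

def collapse_runs(text):
    # ([a-z0-9]) captures one lowercase letter or digit;
    # \1+ matches one or more immediate repeats of that same character.
    return re.sub(r'([a-z0-9])\1+', r'\1', text)

print(collapse_runs('aaaaabbbcccddd111'))  # abcd1
```

Because the character class now includes 0-9, the run of "1"s is collapsed the same way as the letters.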

Joshua Varghese · Accepted Answer · 2020-05-19 12:20:43Z

Include the digits in the character class passed to re.sub(), and leave the second group out of the replacement:

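import re

text = 'aaaaabbbcccddd111'

# Option 1: a single character class covering letters and digits
result = re.sub(r'([a-z0-9])(.*)\1', r'\1', text)
# Option 2: the same match written as an alternation of two classes
result = re.sub(r'([a-z]|[0-9])(.*)\1', r'\1', text)

print(result)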

gives:

'abcd1'

For strings where a character appears in more than one run, collapse only the consecutive repeats:

re.sub(r'(.)\1+', r'\1', 'abcddbca123321')
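To make the difference between the two patterns concrete, here is a small sketch that runs both on the two inputs from this thread. The helper names span_collapse and run_collapse are hypothetical. The greedy (.*) in the first pattern deletes everything between the first and last occurrence of a character, while (.)\1+ only collapses adjacent repeats.

```python
import re

def span_collapse(text):
    # Keeps the captured character and drops everything up to and
    # including its last occurrence later in the string.
    return re.sub(r'([a-z0-9])(.*)\1', r'\1', text)

def run_collapse(text):
    # Collapses each run of one repeated character to a single character.
    return re.sub(r'(.)\1+', r'\1', text)

print(span_collapse('aaaaabbbcccddd111'))  # abcd1
print(run_collapse('aaaaabbbcccddd111'))   # abcd1
print(span_collapse('abcddbca123321'))     # a1
print(run_collapse('abcddbca123321'))      # abcdbca12321
```

Neither pattern reduces 'abcddbca123321' to one copy of each character, so which one fits depends on what "duplicate" is meant to cover.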

edited May 19, 2020 at 12:20

answered May 17, 2020 at 16:02

1 Comment

Tuan Anh Hoang Over a year ago

Thanks so much for your help! The code works, but not for this text: 'abcddbca123321'
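The comment does not say what output is expected for 'abcddbca123321'. Assuming the goal is to keep only the first occurrence of each character, the sketch below shows two possible approaches. Both function names are made up for illustration. The first keeps the loop from the question but uses (.) and a lazy (.*?). The second avoids regex and relies on dict preserving insertion order (Python 3.7+).

```python
import re

def dedupe_regex(text):
    # (.) captures any character, (.*?) lazily matches up to the next
    # occurrence of that character, and the replacement drops that
    # later occurrence. Repeat until no character appears twice.
    pattern = re.compile(r'(.)(.*?)\1')
    while pattern.search(text):
        text = pattern.sub(r'\1\2', text)
    return text

def dedupe_ordered(text):
    # dict.fromkeys keeps one key per character, in first-seen order.
    return ''.join(dict.fromkeys(text))

print(dedupe_regex('abcddbca123321'))    # abcd123
print(dedupe_ordered('abcddbca123321'))  # abcd123
```

Note that . does not match a newline by default, so the regex version is only meant for single-line input like the strings in this thread.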
