I want to replace every run of a character that repeats more than once. I used Python's re.sub with the regex data=re.sub('(.)\1+','##',data), but nothing happened.
Here is my text:

※※※※※※※※※※※※※※※※※Chapter One※※※※※※※※※※※※※※※※※※

This is the begining...

2 Answers

You need to use a raw string here. In a normal string literal, \1 is interpreted as an octal escape, so Python replaces it with the character whose code point is 1 before the regex engine ever sees it.

>>> '\1'
'\x01'
>>> chr(0o1)    # octal literal; written 01 in Python 2 only
'\x01'
>>> '\101'
'A'
>>> chr(0o101)  # octal 101 == decimal 65
'A'

Use a raw string to fix this:

>>> '(.)\1+'
'(.)\x01+'
>>> r'(.)\1+'  #Note the `r`
'(.)\\1+'
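To make the difference concrete, here is a small sketch (assuming Python 3) that compares the three spellings of the pattern. The variable names are just for illustration:

```python
# Without the r prefix, Python turns \1 into the single character U+0001.
non_raw = '(.)\1+'
# With the r prefix, the backslash and the digit both survive.
raw = r'(.)\1+'
# Doubling the backslash in a normal string gives the same result as the raw string.
escaped = '(.)\\1+'

print(len(non_raw))          # 5 characters: ( . ) \x01 +
print(len(raw))              # 6 characters: ( . ) \ 1 +
print('\x01' in non_raw)     # True
print(raw == escaped)        # True
```

So the raw string is only a convenience: what matters is that the regex engine receives a literal backslash followed by 1.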

1 Comment

\1 is interpreted as octal 1, which is then represented as \x01 (hexadecimal). Try \141 for size.

Use a raw string, so the regex engine interprets backslashes instead of the Python parser. Just put an r in front of the string:

data=re.sub(r'(.)\1+', '##', data)
            ^ this r is the important bit

Otherwise, \1 is interpreted as character value 1 instead of a backreference.
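As a quick check, here is a minimal sketch (assuming Python 3) that runs both versions on a shortened copy of the text from the question. The names broken and fixed are only for illustration:

```python
import re

# Shortened version of the sample text from the question.
data = "※※※Chapter One※※※\n\nThis is the begining..."

# Non-raw pattern: looks for any character followed by U+0001, so nothing matches.
broken = re.sub('(.)\1+', '##', data)

# Raw pattern: \1 is a backreference, so each run of a repeated character is replaced.
fixed = re.sub(r'(.)\1+', '##', data)

print(broken == data)   # True, the text is unchanged
print(repr(fixed))      # '##Chapter One##\n\nThis is the begining##'
```

Note that the two newlines are left alone, because . does not match a newline unless you pass re.DOTALL.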
