Why does my python regex not work?

Question

I want to replace every character that occurs more than once in a row. I used Python's re.sub, and my call looks like this: data=re.sub('(.)\1+','##',data). But nothing happened...
Here is my text:

Text

※※※※※※※※※※※※※※※※※Chapter One※※※※※※※※※※※※※※※※※※

This is the begining...

Ashwini Chaudhary · Accepted Answer · 2014-01-10 11:23:16Z

3

You need to use a raw string here. In a normal string literal, \1 is interpreted as an octal escape, so it is replaced by the character whose code point is that octal number:

>>> '\1'
'\x01'
>>> chr(0o1)  # octal literal; Python 3 spells it 0o1 rather than 01
'\x01'
>>> '\101'
'A'
>>> chr(0o101)  # octal 101 is decimal 65, i.e. 'A'
'A'

Use a raw string to fix this:

>>> '(.)\1+'
'(.)\x01+'
>>> r'(.)\1+'  #Note the `r`
'(.)\\1+'
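To make the difference concrete, here is a minimal sketch comparing the three spellings of the pattern. The variable names are only illustrative; the point is that the raw string and the doubled-backslash string are the same six characters, while the plain string has already lost the backslash before re ever sees it.

```python
# Three ways of writing the pattern as a Python string literal.
plain = '(.)\1+'      # '\1' becomes the single character U+0001
raw = r'(.)\1+'       # backslash and '1' are kept as two characters
escaped = '(.)\\1+'   # same content as the raw string, written without r

print(len(plain))      # 5 characters: ( . ) \x01 +
print(len(raw))        # 6 characters: ( . ) \ 1 +
print(raw == escaped)  # True
print(plain[3] == chr(0o1))  # True: the fourth character is octal 1
```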

edited Jan 10, 2014 at 11:23

answered Jan 10, 2014 at 8:35


1 Comment

Martijn Pieters Over a year ago

\1 is interpreted as octal 1, which is then represented as \x01 (hexadecimal). Try \141 for size.

user2357112 · Accepted Answer · 2014-01-10 08:35:39Z

1

Use a raw string, so the regex engine interprets backslashes instead of the Python parser. Just put an r in front of the string:

data=re.sub(r'(.)\1+', '##', data)
            ^ this r is the important bit

Otherwise, \1 is interpreted as character value 1 instead of a backreference.
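As a rough end-to-end check, the sketch below runs both versions of the call on a shortened copy of the text from the question (the run of ※ characters is abbreviated here, and the variable names are assumptions for illustration). Note that `.` does not match a newline by default, so the blank line between the two lines of text is left alone.

```python
import re

# Shortened copy of the sample text from the question.
text = "※※※※Chapter One※※※※\n\nThis is the begining..."

# Without the r prefix the pattern means "any character followed by
# one or more U+0001 characters", which never occurs in the text.
broken = re.sub('(.)\1+', '##', text)

# With the r prefix, \1 reaches the regex engine as a backreference
# to group 1, so every run of a repeated character is replaced.
fixed = re.sub(r'(.)\1+', '##', text)

print(broken == text)  # True: nothing was replaced
print(fixed)
```

With the raw string, each run of ※ and the trailing "..." collapse to "##", while "Chapter One" and the rest of the sentence are untouched because they contain no consecutive repeats.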

answered Jan 10, 2014 at 8:35

