Python unescaping string in regex replacements

Question

The output of the code below:

import re

rpl = 'This is a nicely escaped newline \\n'
my_string = 'I hope this apple is replaced with a nicely escaped string'
reg = re.compile('apple')
reg.sub(rpl, my_string)

..is:

'I hope this This is a nicely escaped newline \n is replaced with a nicely escaped string'

..so when printed:

I hope this This is a nicely escaped newline

is replaced with a nicely escaped string

So Python is unescaping the replacement string when it substitutes it for 'apple' in the other string? For now I've just done:

reg.sub( rpl.replace('\\','\\\\'), my_string )

Is this safe? Is there a way to stop Python from doing that?

When you say 'the output of the code below ... is', does that mean you're using print to determine it? Or a REPL?
– Brian Cain, Commented Aug 26, 2012 at 4:04
@BrianCain, Sorry for being vague. That's what the string looks like.
– mowwwalker, Commented Aug 26, 2012 at 4:11

Donal Fellows · Accepted Answer · 2012-08-26 06:52:30Z

4

From help(re.sub) [emphasis mine]:

sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl. repl can be either a string or a callable; if a string, backslash escapes in it are processed. If it is a callable, it's passed the match object and must return a replacement string to be used.

One way to get around this is to pass a lambda:

>>> reg.sub(rpl, my_string )
'I hope this This is a nicely escaped newline \n is replaced with a nicely escaped string'
>>> reg.sub(lambda x: rpl, my_string )
'I hope this This is a nicely escaped newline \\n is replaced with a nicely escaped string'
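If a bare lambda looks cryptic, the same idea can be written as a small named function with a comment. This is only a sketch: the name literal_replacement is made up for illustration and is not part of the re module.

```python
import re

rpl = 'This is a nicely escaped newline \\n'
my_string = 'I hope this apple is replaced with a nicely escaped string'
reg = re.compile('apple')

def literal_replacement(match):
    # Hypothetical helper: re.sub inserts a callable's return value as-is,
    # without processing backslash escapes in it.
    return rpl

result = reg.sub(literal_replacement, my_string)
print(result)
# I hope this This is a nicely escaped newline \n is replaced with a nicely escaped string
```

The printed line contains a literal backslash followed by n, not a line break.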

edited Aug 26, 2012 at 6:52

Donal Fellows


answered Aug 26, 2012 at 4:08

DSM



6 Comments

mowwwalker Over a year ago

Weird, wonder why it does that. Thanks for explaining! I've ended up doing rpl.encode('escape_string'), as it makes the code very understandable

nneonneo Over a year ago

@Walkerneo: Replacement patterns are unescaped, but callables are expected to return the exact string that they want to replace (as it's implied they would do any necessary processing already). Hence, the output from a callable replacement is not unescaped.

mowwwalker Over a year ago

@nneonneo, Thank you, I understand that, but it does make the code look a bit more cryptic. Someone reading it probably won't see the use in using a lambda expression that just returns a string.

DSM Over a year ago

@Walkerneo: if only there were a way to leave a short message in the code for the reader which would explain it.. :^) More seriously, string_escape (not escape_string) seems like a perfectly viable approach.

BrenBarn Over a year ago

The reason backslash escapes are processed is that the replacement isn't just a plain string but a regex replacement pattern. It can, for instance, contain backreferences like \1 to include groups from the match. Since at least some escapes have to be processed, it makes sense to process them all.
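A small sketch of the backreference point, using a made-up pattern and input (neither comes from the thread). A string replacement has \1 and \2 expanded to the captured groups, while a callable's return value is inserted untouched:

```python
import re

pattern = re.compile(r'(\w+)@(\w+)')
text = 'alice@example'

# String replacement: \2 and \1 are expanded to the captured groups.
swapped = pattern.sub(r'\2 at \1', text)

# Callable replacement: the returned string is used verbatim.
verbatim = pattern.sub(lambda m: r'\2 at \1', text)

print(swapped)   # example at alice
print(verbatim)  # \2 at \1
```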


nneonneo · Accepted Answer · 2012-08-26 04:14:58Z

0

All regex patterns used for Python's re module are unescaped, including both search and replacement patterns. This is why the r modifier is generally used with regex patterns in Python, as it reduces the amount of "backwhacking" necessary to write usable patterns.

The r modifier appears before a string constant and basically makes all \ characters (except those before string delimiters) verbatim. So, r'\\' == '\\\\', and r'\n' == '\\n'.
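A quick way to check those equalities is to compare lengths and values directly. The variable names here are only illustrative:

```python
newline = '\n'        # regular literal: one character, an actual newline
raw_newline = r'\n'   # raw literal: two characters, a backslash and the letter n

print(len(newline), len(raw_newline))   # 1 2
print(raw_newline == '\\n')             # True
print(r'\\' == '\\\\')                  # True: both are two backslashes
```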

Writing your example as

import re

rpl = r'This is a nicely escaped newline \\n'
my_string = 'I hope this apple is replaced with a nicely escaped string'
reg = re.compile(r'apple')
reg.sub(rpl, my_string)

works as expected.

answered Aug 26, 2012 at 4:14

nneonneo


1 Comment

mowwwalker Over a year ago

The example in the question was a bit contrived and I wouldn't be working with string literals.
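For a replacement that arrives at runtime rather than as a literal, the question's own workaround of doubling every backslash should be enough. As far as I know, the re documentation notes that only backslashes need escaping in a replacement string, and that re.escape is not meant for this purpose. The string_escape codec mentioned in the comments exists only in Python 2. A minimal sketch, where escape_replacement is a hypothetical helper name and the sample strings are invented:

```python
import re

def escape_replacement(text):
    # Hypothetical helper: double each backslash so re.sub's escape
    # processing turns it back into a single literal backslash.
    return text.replace('\\', '\\\\')

# Imagine this value came from user input or a file, not a literal.
rpl = 'C:\\new\\table \\1'

result = re.sub('apple', escape_replacement(rpl), 'an apple a day')
print(result)  # an C:\new\table \1 a day
```

Without the escaping step, the \n and \t in this value would become a newline and a tab, and \1 would be treated as a group reference.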
