1

The output of the code below:

import re

rpl = 'This is a nicely escaped newline \\n'
my_string = 'I hope this apple is replaced with a nicely escaped string'
reg = re.compile('apple')
reg.sub( rpl, my_string )

..is:

'I hope this This is a nicely escaped newline \n is replaced with a nicely escaped string'

..so when printed:

I hope this This is a nicely escaped newline

is replaced with a nicely escaped string

So Python is unescaping the replacement string when it substitutes it for 'apple' in the other string? For now I've just done:

reg.sub( rpl.replace('\\','\\\\'), my_string )

Is this safe? Is there a way to stop Python from doing that?

  • When you say 'the output of the code below ... is', does that mean you're using print to determine it? Or a REPL? Commented Aug 26, 2012 at 4:04
  • @BrianCain, Sorry for being vague. That's what the string looks like. Commented Aug 26, 2012 at 4:11

2 Answers

4

From help(re.sub) [emphasis mine]:

sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl. repl can be either a string or a callable; if a string, backslash escapes in it are processed. If it is a callable, it's passed the match object and must return a replacement string to be used.
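
To make "processed" concrete, here is a small sketch, assuming Python 3. The sample strings are made up for illustration and are not from the question:

```python
import re

# Backslash + n in the replacement is turned into a real newline character.
out = re.sub('apple', r'a\nb', 'x apple y')
print(repr(out))

# Backreferences such as \1 and \2 are expanded from the match groups.
swapped = re.sub(r'(\w+) (\w+)', r'\2 \1', 'hello world')
print(repr(swapped))

# A doubled backslash in the replacement produces one literal backslash.
literal = re.sub('apple', r'C:\\temp', 'path: apple')
print(repr(literal))
```

So the replacement argument behaves like a small template language rather than a plain string.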

One way to get around this is to pass a lambda:

>>> reg.sub(rpl, my_string )
'I hope this This is a nicely escaped newline \n is replaced with a nicely escaped string'
>>> reg.sub(lambda x: rpl, my_string )
'I hope this This is a nicely escaped newline \\n is replaced with a nicely escaped string'
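
If a bare lambda looks cryptic, a named helper can carry the explanation. A minimal sketch, assuming Python 3; sub_literal is a hypothetical name, not something the re module provides:

```python
import re

def sub_literal(pattern, replacement, text):
    """Replace every match of `pattern` with `replacement`, taken verbatim."""
    # The return value of a callable is inserted as-is, with no escape processing.
    return re.sub(pattern, lambda match: replacement, text)

rpl = 'This is a nicely escaped newline \\n'
my_string = 'I hope this apple is replaced with a nicely escaped string'

via_callable = sub_literal('apple', rpl, my_string)

# The workaround from the question: double every backslash, so the
# escape processing in re.sub collapses each pair back into one.
via_doubling = re.sub('apple', rpl.replace('\\', '\\\\'), my_string)

print(via_callable == via_doubling)
```

Both routes keep the backslash and the n as two separate characters. The note on re.escape in the re documentation says that only backslashes should be escaped in a replacement string, which is exactly what the doubling does, so that workaround should be safe.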

6 Comments

Weird, wonder why it does that. Thanks for explaining! I've ended up doing rpl.encode('escape_string'), as it makes the code very understandable
@Walkerneo: Replacement patterns are unescaped, but callables are expected to return the exact string that they want to replace (as it's implied they would do any necessary processing already). Hence, the output from a callable replacement is not unescaped.
@nneonneo, Thank you, I understand that, but it does make the code look a bit more cryptic. Someone reading it probably won't see the use in using a lambda expression that just returns a string.
@Walkerneo: if only there were a way to leave a short message in the code for the reader which would explain it.. :^) More seriously, string_escape (not escape_string) seems like a perfectly viable approach.
The reason backslash escapes are processed is that the replacement isn't just a plain string but a regex replacement pattern. It can, for instance, contain backreferences like \1 to include groups from the match. Since at least some escapes have to be processed, it makes sense to process them all.
0

Python's re module processes backslash escapes in every pattern string it receives, both search patterns and replacement patterns. This is why the r prefix is generally used with regex patterns in Python, as it reduces the amount of "backwhacking" necessary to write usable patterns.

The r prefix goes before a string literal and makes Python keep every \ character verbatim. (A backslash before a quote still stops that quote from ending the literal, but the backslash itself stays in the string.) So, r'\\' == '\\\\', and r'\n' == '\\n'.
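
Those two equalities can be checked directly. A quick sketch, assuming Python 3; the variable names are only for illustration:

```python
# Compare raw and regular literals character by character.
raw_pair = r'\\'        # raw: two backslash characters
raw_newline = r'\n'     # raw: a backslash followed by the letter n
plain_newline = '\n'    # regular: a single newline character

print(len(raw_pair), len(raw_newline), len(plain_newline))  # 2 2 1
print(raw_pair == '\\\\', raw_newline == '\\n')             # True True
```

The prefix only changes how the literal is parsed. Once the string exists, re treats it the same as any other string with those characters.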

Writing your example as

import re

# Raw literal: the replacement holds two backslashes followed by n.
rpl = r'This is a nicely escaped newline \\n'
my_string = 'I hope this apple is replaced with a nicely escaped string'
reg = re.compile(r'apple')
reg.sub( rpl, my_string )

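works as expected: re.sub collapses the two backslashes in the replacement into one, so the result contains a literal backslash followed by n instead of a newline.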

1 Comment

The example in the question was a bit contrived and I wouldn't be working with string literals.
