39

I ran across something once upon a time and wondered if it was a Python "bug" or at least a misfeature. I'm curious if anyone knows of any justifications for this behavior. I thought of it just now reading "Code Like a Pythonista," which has been enjoyable so far. I'm only familiar with the 2.x line of Python.

Raw strings are strings that are prefixed with an r. This is great because I can use backslashes in regular expressions and I don't need to double everything everywhere. It's also handy for writing throwaway scripts on Windows, so I can use backslashes there also. (I know I can also use forward slashes, but throwaway scripts often contain content cut&pasted from elsewhere in Windows.)

So great! Unless, of course, you really want your string to end with a backslash. There's no way to do that in a 'raw' string.

In [9]: r'\n'
Out[9]: '\\n'

In [10]: r'abc\n'
Out[10]: 'abc\\n'

In [11]: r'abc\'
------------------------------------------------
   File "<ipython console>", line 1
     r'abc\'
           ^
SyntaxError: EOL while scanning string literal


In [12]: r'abc\\'
Out[12]: 'abc\\\\'

So one backslash before the closing quote is an error, but two backslashes gives you two backslashes! Certainly I'm not the only one that is bothered by this?

Thoughts on why 'raw' strings are 'raw, except for backslash-quote'? I mean, if I wanted to embed a single quote in there I'd just use double quotes around the string, and vice versa. If I wanted both, I'd just triple quote. If I really wanted three quotes in a row in a raw string, well, I guess I'd have to deal, but is this considered "proper behavior"?

This is particularly problematic with folder names in Windows, where the backslash is the path delimiter.


4 Answers

27

It's a FAQ.

And in response to "you really want your string to end with a backslash. There's no way to do that in a 'raw' string.": the FAQ shows how to work around it.

>>> r'ab\c' '\\' == 'ab\\c\\'
True
>>>
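As a rough sketch of the same workaround applied to the Windows-path case from the question (written for Python 3; the path itself is made up for illustration): adjacent string literals are joined at compile time, so a raw literal can be followed by an ordinary one that supplies only the trailing backslash.

```python
# Hypothetical path, for illustration only.
# The raw literal holds everything except the final backslash;
# the ordinary literal '\\' supplies that one character.
path = r'C:\some\dir' '\\'

# Same value as the fully escaped ordinary literal.
print(path == 'C:\\some\\dir\\')  # True
print(path)                       # C:\some\dir\
```

No `+` is needed between the two literals, so nothing is concatenated at run time.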

5 Comments

Certainly seems like a misfeature.
@DS: Your suggested alternative design for raw strings is ...?
Didn't know it was a FAQ, but probably should have assumed as much. ;) Not speaking for @DS, but my alternative design is "no escape processing." You know, kinda like what it says on the tin?
Looks like the FAQ has moved to a new location. I think I could edit your answer if I had enough rep, but I do not.
Seems like Python parses raw strings like regular strings, then "un-does" the escapes? Very poor behavior. That said, I think this could be "fixed" without breaking any existing code.
4

Raw strings are meant mostly for readably writing the patterns for regular expressions, which never need a trailing backslash; it's an accident that they may come in handy for Windows (where you could use forward slashes in most cases anyway -- the Microsoft C library which underlies Python accepts either form!). It's not considered acceptable to make it (nearly) impossible to write a regular expression pattern containing both single and double quotes, just to reinforce the accident in question.

("Nearly" because triple-quoting would almost always help... but it could be a little bit of a pain sometimes).

So, yes, raw strings were designed to behave that way (forbidding odd numbers of trailing backslashes), and it is considered perfectly "proper behavior" for them to respect the design decisions Guido made when he invented them;-).
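To make the regular-expression case concrete, here is a minimal sketch (Python 3; the pattern and sample text are invented for illustration). Inside a raw literal, a backslash before the quote keeps the literal open, and the backslash stays in the resulting string. The `re` module then reads `\'` as a plain quote character, so the leftover backslash does no harm in a pattern.

```python
import re

# The backslash before ' stops the quote from closing the literal,
# and it remains part of the string value.
print(len(r'\''))  # 2

# A single-quoted raw pattern that matches words wrapped in either quote type.
pattern = r'["\']\w+["\']'
text = '''say "hello" or 'goodbye' '''
print(re.findall(pattern, text))
```

This only helps when the string is consumed by something that treats the extra backslash as an escape, which is the point several commenters object to.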

4 Comments

Yes- I addressed the reason why I'm using back slashes in my OP. Thanks though; my point was exactly that triple quoting would get over any problem with using quote characters in regular expressions. Indeed, I've wanted to have a trailing backslash but never a regexp with several different types of quote characters.
This continues to boggle my mind why this is a thing. The stated reason that "it's the only way to have both single and double quotes in the string" falls flat because you always need to have a backslash before this necessary quote mark and that backslash persists in the compiled string. There is no way that I can see to create a string containing only single and double quotes short of triple-quoting.
I wish I could upvote this more. I think the behavior is oddly inconsistent, but this answer gives some hints as to WHY the behavior is oddly inconsistent.
Wait, raw string processing is built-in while regexps must be imported -- I'm not buying this. Python fails here and a fix would be most welcome.
3

Another way to work around this is:

 >>> print(r"Raw \with\ trailing backslash\ "[:-1])
 Raw \with\ trailing backslash\

Updated for Python 3 and removed unnecessary slash at the end which implied an escape.

Note that personally I doubt I would use the above. I guess maybe if it was a huge string with more than just a path. For the above I'd prefer non-raw and double up the slashes.
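For comparison, a small sketch (Python 3, hypothetical path) putting the slicing trick next to the doubled-backslash form, plus one more option that is my own assumption about what reads best for paths: letting `ntpath.join` add the trailing separator. `ntpath` is the Windows flavour of `os.path` and can be imported on any platform.

```python
import ntpath  # Windows path rules, importable on any platform

# Raw literal with a throwaway trailing space, then slice the space off.
sliced = r"C:\some\dir\ "[:-1]

# Ordinary literal with every backslash doubled.
doubled = "C:\\some\\dir\\"

# Joining with an empty last component appends the separator.
joined = ntpath.join(r"C:\some\dir", "")

print(sliced == doubled == joined)  # True
```

All three produce the same string; which one is least ugly is a matter of taste.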

1 Comment

Oh joy, a "raw" string where we're escaping the escape character -- which is why most folks want raw strings in the first place! Python is FUBAR here.
-1

Thoughts on why 'raw' strings are 'raw, except for backslash-quote'? I mean, if I wanted to embed a single quote in there I'd just use double quotes around the string, and vice versa.

But that would then raise the question as to why raw strings are 'raw, except for embedded quotes?'

You have to have some escape mechanism, otherwise you can never use the outer quote characters inside the string at all. And then you need an escape mechanism for the escape mechanism.
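A quick way to see the rule in action is to hand source text to the parser and check what it accepts. This is only a sketch for Python 3; the helper name `parses` is made up for this example.

```python
import ast

def parses(source):
    """Return True if `source` is a valid Python literal expression."""
    try:
        ast.literal_eval(source)
        return True
    except SyntaxError:
        return False

bs = "\\"  # a single backslash character

odd = "r'abc" + bs + "'"        # source text: r'abc\'
even = "r'abc" + bs * 2 + "'"   # source text: r'abc\\'
escaped = "r'a" + bs + "'b'"    # source text: r'a\'b'

print(parses(odd))    # False: the backslash-quote pair does not close the literal
print(parses(even))   # True: the value keeps both backslashes
print(parses(escaped))                 # True
print(len(ast.literal_eval(escaped)))  # 4: a, backslash, quote, b
```

So the backslash works as an escape for the tokenizer, which needs to find the closing quote, but is never removed from the value.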

9 Comments

The rule "you can't use the surrounding quote character in the string" seems an easy one to follow and very pragmatic. In the exceptionally rare case that you need all four of single quote, double quote, tripled single quote, and tripled double quote, I think it would not be too out of line to say that those cannot all appear in one continuous raw string. When I want a raw string, I don't want escapes, so it seems stupid to have one escape at one location in a raw string that then causes an error.
@dash-tom-bang That rule prevents you from using that character at all. Any rule that doesn't have that restriction is better than any rule that does.
If the alternative is that you can't do something else that you might want to do (e.g. "have a trailing backslash") then the answer is not so black and white. "It's a raw string except..." violates the desire to "do the obvious thing"; exceptions to rules should be avoided when possible.
@dash-tom-bang Your point eludes me. You haven't addressed the issue. There has to be a way to represent every character in a quoted string. Without an escape mechanism you can't represent the quotes.
Say you want to represent a newline in a raw string? r'\n' gives you a string with two characters. If you leave the 'n' out you get an error. If you add another backslash you again have two characters, backslash-backslash. My point is that the error is an inconsistency for the sake of "there's no reason to ever do this, I have a complete understanding of any application that will ever be written." (To get a newline you need to create a multiline raw string.)