The backslash character in Regex for Python [duplicate]

Question

In the Python documentation for Regex, the author mentions:

regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This conflicts with Python’s usage of the same character for the same purpose in string literals.

He then goes on to give an example of matching \section in a regex:

to match a literal backslash, one has to write '\\\\' as the RE string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal. In REs that feature backslashes repeatedly, this leads to lots of repeated backslashes and makes the resulting strings difficult to understand.
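The two layers of escaping described in that quote can be checked directly. This is a minimal sketch using only the standard re module; the sample text "see \section{Intro}" is made up for illustration.

```python
import re

# Made-up sample text: it contains ONE literal backslash before "section".
text = "see \\section{Intro}"

# Python layer: the literal "\\\\section" becomes the 9 characters \\section,
# i.e. two backslashes followed by "section".
pattern = "\\\\section"
assert len(pattern) == 9

# Regex layer: \\ in the pattern matches a single literal backslash.
match = re.search(pattern, text)
print(match.group())  # \section

# Escaping only once fails: "\\section" reaches the regex engine as \section,
# which it reads as \s (any whitespace) followed by "ection".
print(re.search("\\section", text))  # None
```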

He then says that the solution to this "backslash plague" is to begin a string with r to turn it into a raw string.
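With a raw string, Python leaves the backslashes alone, so only the regex-level doubling remains. A short sketch; the sample text is made up for illustration.

```python
import re

text = "see \\section{Intro}"  # made-up text containing one literal backslash

# r"\\section" and "\\\\section" produce exactly the same 9-character string.
assert r"\\section" == "\\\\section"

# So the raw-string pattern also matches the single literal backslash.
found = re.search(r"\\section", text).group()
print(found)  # \section
```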

Later though, he gives this example of using Regex:

import re
p = re.compile('\d+')
p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')

which results in:

['12', '11', '10']
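For comparison, here is the same example written with a raw string, the form the quoted docs recommend for patterns containing backslashes. It is a minimal sketch and produces the same result as shown above.

```python
import re

# Raw-string version of the pattern: it does not rely on Python passing "\d" through.
p = re.compile(r"\d+")
result = p.findall("12 drummers drumming, 11 pipers piping, 10 lords a-leaping")
print(result)     # ['12', '11', '10']
print(p.pattern)  # \d+
```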

I am confused as to why we did not need to include an r in this case before '\d+'. I thought, based on the previous explanations of backslash, that we'd need to tell Python that the backslash in this string is not the backslash that it knows.

You shouldn't have used '\d'; it should be '\\d'. Remember, \ is an escape character in Python strings, and the only reason that worked is that \d isn't a recognized escape, so it treated the \ like an ordinary character. But it's reckless and prone to breaking in the future.
– Tom Karzes, Commented Apr 10, 2020 at 16:54
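The "prone to breaking" point can be observed: since Python 3.6, an unrecognized escape such as \d in a normal string literal triggers a warning when the code is compiled (DeprecationWarning up to Python 3.11, SyntaxWarning from 3.12). The sketch below compiles the literal from source text so that the warning can be captured; the exact message wording varies by version, so it is not shown.

```python
import warnings

# Python source text for the literal "\d" (quote, backslash, d, quote).
source = '"\\d"'

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeats
    value = eval(compile(source, "<demo>", "eval"))

# The value is still backslash + "d"...
print(value == "\\d")  # True
# ...but the compiler flags the escape sequence as invalid.
for w in caught:
    print(w.category.__name__, w.message)
```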
Note that OP did not write that code, but just noticed the inconsistency.
– Arne, Commented Apr 10, 2020 at 17:02
You can also check p.pattern, which gives '\\d+', showing that in this case the escape gives the intended result; that's not true for all escape sequences, though. Best practice is to use raw strings for all regexes.
– ggorlen, Commented Apr 10, 2020 at 17:32
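The caveat about other escape sequences is easiest to see with \b: it is a recognized Python escape (backspace), so without r the regex engine never sees its word-boundary sequence. A minimal sketch with made-up sample text:

```python
import re

text = "cat concatenate"  # made-up sample text

# "\b" is a recognized Python escape: it becomes the backspace character.
assert "\b" == "\x08"
# In a raw string the backslash survives, giving the regex word boundary \b.
assert r"\b" == "\\b"

with_raw = re.findall(r"\bcat\b", text)    # word boundaries around "cat"
without_raw = re.findall("\bcat\b", text)  # looks for backspace + cat + backspace
print(with_raw)     # ['cat']
print(without_raw)  # []
```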

Moberg · Accepted Answer · 2020-04-10 16:58:39Z


Python only recognizes some sequences starting with \ as escape sequences. For example, \d is not a known escape sequence, so in this particular case there is no need to escape the backslash to keep it there.
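One way to see which escapes Python recognizes is to evaluate a few literals from source text and check their length: a recognized escape collapses to one character, while an unrecognized one keeps both the backslash and the letter. A minimal sketch; literal_length is a hypothetical helper, and warnings are silenced only because newer Python versions warn about the unrecognized escapes.

```python
import warnings

def literal_length(body):
    """Hypothetical helper: length of the value of the Python literal "<body>"."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # hide invalid-escape warnings for \d, \w, \s
        return len(eval(compile('"' + body + '"', "<demo>", "eval")))

lengths = {body: literal_length(body) for body in ["\\n", "\\t", "\\b", "\\d", "\\w", "\\s"]}
for body, n in lengths.items():
    print(body, n, "recognized" if n == 1 else "kept as-is")

# Prints:
# \n 1 recognized
# \t 1 recognized
# \b 1 recognized
# \d 2 kept as-is
# \w 2 kept as-is
# \s 2 kept as-is
```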

(In Python 3.6) "\d" and "\\d" are equivalent:

>>> "\d" == "\\d"
True
>>> r"\d" == "\\d"
True

Here is a list of all the recognized escape sequences: https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
