In the Python documentation for Regex, the author mentions:
regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This conflicts with Python’s usage of the same character for the same purpose in string literals.
He then goes on to give an example of matching \section in a regex:
to match a literal backslash, one has to write '\\\\' as the RE string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal. In REs that feature backslashes repeatedly, this leads to lots of repeated backslashes and makes the resulting strings difficult to understand.
He then says that the solution to this "backslash plague" is to begin a string with r to turn it into a raw string.
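For reference, here is a small sketch of that \section case as I understand it. The sample text and the variable names are made up for illustration; only the two pattern spellings come from the documentation.

```python
import re

# Hypothetical sample text containing a literal backslash followed by "section".
text = r"see \section for details"

# Regular string literal: the regex needs \\ to match one backslash, and each of
# those must be doubled again for Python, so four backslashes appear in the source.
plain = "\\\\section"

# Raw string literal: what is typed is exactly what the regex engine receives.
raw = r"\\section"

same_pattern = plain == raw                   # both literals spell the same pattern
match_plain = re.search(plain, text).group()  # the matched text, one backslash + "section"
match_raw = re.search(raw, text).group()
print(same_pattern, match_plain, match_raw)
```

Both spellings produce the same pattern; the raw string just avoids the second round of doubling.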
Later though, he gives this example of using Regex:
p = re.compile('\d+')
p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
which results in:
['12', '11', '10']
I am confused as to why we did not need to include an r in this case before '\d+'. I thought, based on the previous explanations of backslash, that we'd need to tell Python that the backslash in this string is not the backslash that it knows.
Instead of '\d', it should be '\\d'. Remember, \ is an escape character in Python strings, and the only reason that worked is that \d isn't a recognized escape, so Python treated the \ like an ordinary character. Relying on that is reckless and prone to breaking in the future; newer Python versions already warn about unrecognized escapes like this one.

You can check p.pattern, which gives '\\d+', showing that in this case the escape gives the intended result, but that's not true for all escape sequences. Best practice is to use raw strings for all regexes.
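To make the difference concrete, here is a minimal sketch. The \b example and the "a cat sat" string are my own illustration, not from the documentation. It shows an escape that Python does recognize, where leaving out the r silently changes the pattern.

```python
import re

# "\\d+" and r"\d+" are the same three characters: backslash, d, plus.
escaped = "\\d+"
raw = r"\d+"
patterns_equal = escaped == raw

numbers = re.compile(raw).findall(
    "12 drummers drumming, 11 pipers piping, 10 lords a-leaping"
)

# "\b" IS a recognized string escape (backspace), so the r matters here.
backspace_len = len("\b")   # one character: backspace
raw_b_len = len(r"\b")      # two characters: backslash and b

word = "a cat sat"
no_raw = re.findall("\bcat\b", word)     # searches for backspace characters
with_raw = re.findall(r"\bcat\b", word)  # searches for word boundaries

print(patterns_equal, numbers, no_raw, with_raw)
```

With \d the two spellings happen to agree, but with \b the non-raw version looks for literal backspace characters and finds nothing, which is why always using raw strings for regexes is the safer habit.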