I want to apply patterns like these, for example:

    rules = {
        '\s': '_',
        '.(?P<word>\w)': '\1',
        'text1': 'text2',
        #etc
    }

using re.sub().

There are some examples of this approach, but they don't work with regex special characters.

  • FYI, I guess you mean (?P<word>\w), but since you are using \1, maybe you just want .(\w). Commented Mar 17, 2016 at 11:09
  • Yes, thanks. It's just an example, so in this case it doesn't matter whether it's .(\w) or (?P<word>\w). Commented Mar 17, 2016 at 11:14
  • You probably need to use the r'<string here>' syntax, or escape your special characters. For example r'\s' or '\\s'... Commented Mar 17, 2016 at 11:20

4 Answers

I use raw strings when writing regexes in Python; they save you from having to escape special characters: https://docs.python.org/2/library/re.html
Try:

    rules = {
        r"\s": r"_",
        r"text1": r"text2",
        #etc
    }
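As a quick check, a raw string is just another spelling of the same string value. The `r` prefix only changes how Python reads the literal, so `re.sub` receives the two characters `\` and `s` either way. A minimal sketch:

```python
import re

# The raw literal and the manually escaped literal are the same 2-character string
assert r"\s" == "\\s"
assert len(r"\s") == 2

# re.sub then sees the regex escape \s, which matches any whitespace character
print(re.sub(r"\s", r"_", "a b\tc"))  # a_b_c
```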

You should use raw strings like so:

    rules = {
        r'\s': r'_',
        r'.(?P<word>\w)': r'\1',
        r'text1': r'text2',
        #etc
    }

That way Python does not interpret the backslashes itself, so you don't have to escape them.
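In the question's dict, the escape that actually breaks things is the replacement `'\1'`. In a normal string literal, `\1` is an octal escape, so Python turns it into the control character `'\x01'` before `re.sub` ever sees it. (`'\s'` happens to survive because Python leaves unrecognized escape sequences in place, although recent Python 3 versions warn about it.) A small sketch comparing the two spellings with the question's `.(?P<word>\w)` pattern:

```python
import re

# '\1' in a normal literal is an octal escape: a single control character, chr(1)
assert '\1' == '\x01'
# r'\1' keeps the backslash, so re.sub reads it as a reference to group 1
assert r'\1' == '\\1'

text = "xa"
# Non-raw replacement: the match "xa" is replaced by chr(1), not the captured letter
print(repr(re.sub(r'.(?P<word>\w)', '\1', text)))   # '\x01'
# Raw replacement: the match "xa" is replaced by group 1, the letter "a"
print(repr(re.sub(r'.(?P<word>\w)', r'\1', text)))  # 'a'
```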

Here is why it happens (direct quote from the docs):

Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.
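Concretely, matching one literal backslash takes the two-character regex `\\`. Written as a normal string literal that is `'\\\\'`; written as a raw literal it is `r'\\'`. A small check:

```python
import re

text = "C:\\temp"  # the actual text is C:\temp, containing one backslash

# Both literals spell the same 2-character regex: backslash, backslash
assert '\\\\' == r'\\'
assert len(r'\\') == 2

# That regex matches exactly one backslash in the text
print(re.findall('\\\\', text))  # ['\\']
print(re.findall(r'\\', text))   # ['\\']
```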

And how to solve it (another quote from the docs):

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
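The difference is easy to check interactively. The last two assertions also show why both spellings happen to work for `\n` specifically: the `re` module interprets the escape itself when it receives the raw two-character form.

```python
import re

# Raw literal: the backslash and 'n' are kept as two separate characters
assert r"\n" == "\\n"
assert len(r"\n") == 2

# Normal literal: \n is converted to a single newline character
assert len("\n") == 1
assert "\n" == chr(10)

# re understands the two-character escape \n as "newline" too,
# so both patterns find the newline in this text
assert re.search(r"\n", "a\nb") is not None
assert re.search("\n", "a\nb") is not None
```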

You certainly need raw strings when declaring Python regexes, and your examples have some other issues, but what you are really asking is how to run the regex replacements.

I suggest using an OrderedDict so that the replacements are performed in a strict order: the order in which they were added to the dictionary. The code then looks like this:

    import re
    from collections import OrderedDict

    # Regex -> replacement pairs, applied in the order they are added
    rules = OrderedDict()
    rules[r'\s'] = '-'
    rules[r'.(\w)'] = r'\1'
    rules['text1'] = 'text2'

    s = "nnoo  mmoorree  tteexxtt11"             # a test string
    for pattern, replacement in rules.items():  # iterate in insertion order
        s = re.sub(pattern, replacement, s)     # perform the search & replace
    print(s)                                    # demo printing

See the IDEONE demo
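Since Python 3.7, a plain `dict` is guaranteed to preserve insertion order, so the same loop works without `OrderedDict` on modern Python. Below is a minimal sketch under that assumption. `apply_rules` is a hypothetical helper name, and the rules are illustrative ones chosen to show why order matters: each rule runs on the output of the previous one, so a later rule can rewrite an earlier replacement. Compiling the patterns is optional (`re` caches recently used patterns), but it makes reuse across many strings explicit.

```python
import re

def apply_rules(rules, text):
    """Apply each (compiled pattern, replacement) pair in definition order (hypothetical helper)."""
    for pattern, replacement in rules.items():
        text = pattern.sub(replacement, text)
    return text

# Plain dict: insertion order is preserved on Python 3.7+
rules = {
    re.compile(r'\s+'): '_',         # runs of whitespace -> one underscore
    re.compile(r'text1'): 'text2',   # runs second
    re.compile(r'text2'): 'text3',   # runs last, so it also sees text2 produced above
}

print(apply_rules(rules, "text1  text2"))  # text3_text3
```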

Use raw string notation to avoid having to escape your special characters:

    rules = {
        r'\s': r'_',
        r'.(?P<word>\w)': r'\1',
        r'text1': r'text2',
        #etc
    }

Directly from the regular expression module (re) documentation:

Raw string notation (r"text") keeps regular expressions sane. Without it, every backslash ('\') in a regular expression would have to be prefixed with another one to escape it. For example, the two following lines of code are functionally identical:

    >>> re.match(r"\W(.)\1\W", " ff ")
    <_sre.SRE_Match object at ...>
    >>> re.match("\\W(.)\\1\\W", " ff ")
    <_sre.SRE_Match object at ...>

When one wants to match a literal backslash, it must be escaped in the regular expression. With raw string notation, this means r"\\". Without raw string notation, one must use "\\\\", making the following lines of code functionally identical:

    >>> re.match(r"\\", r"\\")
    <_sre.SRE_Match object at ...>
    >>> re.match("\\\\", r"\\")
    <_sre.SRE_Match object at ...>
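The docs' first pair of calls is equivalent because both literals produce exactly the same pattern text, so `re` receives identical input either way. A small check of that, plus what the `\1` backreference in the pattern does:

```python
import re

raw = r"\W(.)\1\W"
escaped = "\\W(.)\\1\\W"

# Same characters, so re compiles the same regex from either literal
assert raw == escaped

# \1 is a backreference: the character captured by (.) must appear twice in a row
m = re.match(raw, " ff ")
print(m.group(1))  # f
assert re.match(raw, " fg ") is None
```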
