Best way to replace multiple characters in a string?

Question

I need to replace some characters as follows: & ➔ \&, # ➔ \#, ...

I coded as follows, but I guess there should be some better way. Any hints?

strs = strs.replace('&', '\&')
strs = strs.replace('#', '\#')
...

See stackoverflow.com/questions/3367809/… and stackoverflow.com/questions/3411006/… (Tim McNamara, commented Aug 5, 2010 at 4:35)

nitin3685 · Accepted Answer · 2019-10-30 12:29:07Z

761

Replacing two characters

I timed all the methods in the current answers along with one extra.

With an input string of abc&def#ghi and replacing & -> \& and # -> \#, the fastest way was to chain together the replacements like this: text.replace('&', '\&').replace('#', '\#').

Timings for each function:

a) 1000000 loops, best of 3: 1.47 μs per loop
b) 1000000 loops, best of 3: 1.51 μs per loop
c) 100000 loops, best of 3: 12.3 μs per loop
d) 100000 loops, best of 3: 12 μs per loop
e) 100000 loops, best of 3: 3.27 μs per loop
f) 1000000 loops, best of 3: 0.817 μs per loop
g) 100000 loops, best of 3: 3.64 μs per loop
h) 1000000 loops, best of 3: 0.927 μs per loop
i) 1000000 loops, best of 3: 0.814 μs per loop

Here are the functions:

def a(text):
    chars = "&#"
    for c in chars:
        text = text.replace(c, "\\" + c)


def b(text):
    for ch in ['&','#']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)


import re
def c(text):
    rx = re.compile('([&#])')
    text = rx.sub(r'\\\1', text)


RX = re.compile('([&#])')
def d(text):
    text = RX.sub(r'\\\1', text)


def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('&#')
def e(text):
    esc(text)


def f(text):
    text = text.replace('&', '\&').replace('#', '\#')


def g(text):
    replacements = {"&": "\&", "#": "\#"}
    text = "".join([replacements.get(c, c) for c in text])


def h(text):
    text = text.replace('&', r'\&')
    text = text.replace('#', r'\#')


def i(text):
    text = text.replace('&', r'\&').replace('#', r'\#')

Timed like this:

python -mtimeit -s"import time_functions" "time_functions.a('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.b('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.c('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.d('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.e('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.f('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.g('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.h('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.i('abc&def#ghi')"
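
The same kind of measurement can also be made from inside Python with the standard timeit module. This is only a minimal sketch: the names chained and looped are hypothetical stand-ins for variants i and b above, changed to return their result so the output can be checked, and the loop count is an arbitrary choice.

```python
import timeit

def chained(text):
    # Same idea as variant i above, but returning the result.
    return text.replace('&', r'\&').replace('#', r'\#')

def looped(text):
    # Same idea as variant b above, but returning the result.
    for ch in ['&', '#']:
        if ch in text:
            text = text.replace(ch, '\\' + ch)
    return text

sample = 'abc&def#ghi'
loops = 10_000  # arbitrary; raise it for steadier numbers

for func in (chained, looped):
    # repeat() returns one total per run; the minimum is the least noisy.
    best = min(timeit.repeat(lambda: func(sample), number=loops, repeat=3))
    print(f'{func.__name__}: {best / loops * 1e6:.3f} us per loop')
```

The absolute numbers depend on the machine and Python version, so only the ratio between variants measured in the same run is meaningful.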

Replacing 17 characters

Here's similar code to do the same but with more characters to escape (\`*_{}[]()>#+-.!$):

def a(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        text = text.replace(c, "\\" + c)


def b(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)


import re
def c(text):
    rx = re.compile('([&#])')
    text = rx.sub(r'\\\1', text)


RX = re.compile('([\\`*_{}[]()>#+-.!$])')
def d(text):
    text = RX.sub(r'\\\1', text)


def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('\\`*_{}[]()>#+-.!$')
def e(text):
    esc(text)


def f(text):
    text = text.replace('\\', '\\\\').replace('`', '\`').replace('*', '\*').replace('_', '\_').replace('{', '\{').replace('}', '\}').replace('[', '\[').replace(']', '\]').replace('(', '\(').replace(')', '\)').replace('>', '\>').replace('#', '\#').replace('+', '\+').replace('-', '\-').replace('.', '\.').replace('!', '\!').replace('$', '\$')


def g(text):
    replacements = {
        "\\": "\\\\",
        "`": "\`",
        "*": "\*",
        "_": "\_",
        "{": "\{",
        "}": "\}",
        "[": "\[",
        "]": "\]",
        "(": "\(",
        ")": "\)",
        ">": "\>",
        "#": "\#",
        "+": "\+",
        "-": "\-",
        ".": "\.",
        "!": "\!",
        "$": "\$",
    }
    text = "".join([replacements.get(c, c) for c in text])


def h(text):
    text = text.replace('\\', r'\\')
    text = text.replace('`', r'\`')
    text = text.replace('*', r'\*')
    text = text.replace('_', r'\_')
    text = text.replace('{', r'\{')
    text = text.replace('}', r'\}')
    text = text.replace('[', r'\[')
    text = text.replace(']', r'\]')
    text = text.replace('(', r'\(')
    text = text.replace(')', r'\)')
    text = text.replace('>', r'\>')
    text = text.replace('#', r'\#')
    text = text.replace('+', r'\+')
    text = text.replace('-', r'\-')
    text = text.replace('.', r'\.')
    text = text.replace('!', r'\!')
    text = text.replace('$', r'\$')


def i(text):
    text = text.replace('\\', r'\\').replace('`', r'\`').replace('*', r'\*').replace('_', r'\_').replace('{', r'\{').replace('}', r'\}').replace('[', r'\[').replace(']', r'\]').replace('(', r'\(').replace(')', r'\)').replace('>', r'\>').replace('#', r'\#').replace('+', r'\+').replace('-', r'\-').replace('.', r'\.').replace('!', r'\!').replace('$', r'\$')
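
Characters such as ], \ and - have a special meaning inside a regex character class, so a 17-character class is easy to get wrong when written by hand. As a hedged sketch (ESC_CHARS, ESC_RX and escape_with_regex are hypothetical names, not part of the timed code), the class can be built with re.escape instead:

```python
import re

ESC_CHARS = "\\`*_{}[]()>#+-.!$"  # the 17 characters escaped in this section

# re.escape() backslash-escapes the characters that are special in a pattern,
# so ']', '\' and '-' can safely sit inside the [...] class.
ESC_RX = re.compile('[' + re.escape(ESC_CHARS) + ']')

def escape_with_regex(text):
    # In the replacement, \\ is a literal backslash and \g<0> is the whole match.
    return ESC_RX.sub(r'\\\g<0>', text)

print(escape_with_regex('## *Something* and [another] thing'))
```

This version was not part of the timings in this answer, so nothing is claimed here about its speed.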

Here are the results for the same input string abc&def#ghi:

a) 100000 loops, best of 3: 6.72 μs per loop
b) 100000 loops, best of 3: 2.64 μs per loop
c) 100000 loops, best of 3: 11.9 μs per loop
d) 100000 loops, best of 3: 4.92 μs per loop
e) 100000 loops, best of 3: 2.96 μs per loop
f) 100000 loops, best of 3: 4.29 μs per loop
g) 100000 loops, best of 3: 4.68 μs per loop
h) 100000 loops, best of 3: 4.73 μs per loop
i) 100000 loops, best of 3: 4.24 μs per loop

And with a longer input string (## *Something* and [another] thing in a longer sentence with {more} things to replace$):

a) 100000 loops, best of 3: 7.59 μs per loop
b) 100000 loops, best of 3: 6.54 μs per loop
c) 100000 loops, best of 3: 16.9 μs per loop
d) 100000 loops, best of 3: 7.29 μs per loop
e) 100000 loops, best of 3: 12.2 μs per loop
f) 100000 loops, best of 3: 5.38 μs per loop
g) 10000 loops, best of 3: 21.7 μs per loop
h) 100000 loops, best of 3: 5.7 μs per loop
i) 100000 loops, best of 3: 5.13 μs per loop

Adding a couple of variants:

def ab(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        text = text.replace(ch,"\\"+ch)


def ba(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        if c in text:
            text = text.replace(c, "\\" + c)

With the shorter input:

ab) 100000 loops, best of 3: 7.05 μs per loop
ba) 100000 loops, best of 3: 2.4 μs per loop

With the longer input:

ab) 100000 loops, best of 3: 7.71 μs per loop
ba) 100000 loops, best of 3: 6.08 μs per loop

So I'm going to use ba for readability and speed.

Addendum

Prompted by haccks in the comments, one difference between ab and ba is the if c in text: check. Let's test them against two more variants:

def ab_with_check(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)

def ba_without_check(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        text = text.replace(c, "\\" + c)

Times are in μs per loop on Python 2.7.14 and 3.6.3. They were measured on a different machine from the earlier set, so they cannot be compared directly with the earlier results.

╭────────────╥──────┬───────────────┬──────┬──────────────────╮
│ Py, input  ║  ab  │ ab_with_check │  ba  │ ba_without_check │
╞════════════╬══════╪═══════════════╪══════╪══════════════════╡
│ Py2, short ║ 8.81 │    4.22       │ 3.45 │    8.01          │
│ Py3, short ║ 5.54 │    1.34       │ 1.46 │    5.34          │
├────────────╫──────┼───────────────┼──────┼──────────────────┤
│ Py2, long  ║ 9.3  │    7.15       │ 6.85 │    8.55          │
│ Py3, long  ║ 7.43 │    4.38       │ 4.41 │    7.02          │
└────────────╨──────┴───────────────┴──────┴──────────────────┘

We can conclude that:

Those with the check are up to 4x faster than those without the check
ab_with_check is slightly in the lead on Python 3, but ba (with check) has a greater lead on Python 2
However, the biggest lesson here is Python 3 is up to 3x faster than Python 2! There's not a huge difference between the slowest on Python 3 and fastest on Python 2!
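
The timed functions above discard their result, which is fine for benchmarking but not for real use. A minimal sketch of the checked loop as a reusable function follows; the name escape_chars and its default argument are assumptions made for illustration.

```python
def escape_chars(text, chars="\\`*_{}[]()>#+-.!$"):
    # The backslash comes first in chars: if it were handled later, the
    # backslashes added for earlier characters would be escaped again.
    for c in chars:
        if c in text:  # skip the replace() call when there is nothing to do
            text = text.replace(c, "\\" + c)
    return text

print(escape_chars("abc&def#ghi"))
```

Note that & is not in this character set, so only the # in the sample string is escaped.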

edited Oct 30, 2019 at 12:29

nitin3685

answered Nov 23, 2014 at 7:37

Hugo

12 Comments

haccks Over a year ago

Is if c in text: necessary in ba?

Hugo Over a year ago

@haccks It's not necessary, but it's 2-3x quicker with it. Short string, with: 1.45 usec per loop, and without: 5.3 usec per loop, Long string, with: 4.38 usec per loop and without: 7.03 usec per loop. (Note these aren't directly comparable with the results above, because it's a different machine etc.)

haccks Over a year ago

@Hugo; I think this difference in time is because replace is called only when c is found in text in the case of ba, while it is called in every iteration in ab.

Hugo Over a year ago

@haccks Thanks, I've updated my answer with further timings: adding the check is better for both, but the biggest lesson is Python 3 is up to 3x faster!

The Wanderer Over a year ago

@Hugo: i.pinimg.com/originals/ed/55/82/…

tommy.carstensen · Accepted Answer · 2018-02-10 02:59:49Z

96

Here is a python3 method using str.translate and str.maketrans:

s = "abc&def#ghi"
print(s.translate(str.maketrans({'&': r'\&', '#': r'\#'})))

The printed string is abc\&def\#ghi.
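
When the rule is always "prepend a backslash", the translation table can be generated from a string of characters rather than typed out. A minimal sketch, with make_escaper as a hypothetical helper name:

```python
def make_escaper(chars):
    # Map each character to a backslash followed by itself.
    table = str.maketrans({c: "\\" + c for c in chars})
    # The table is built once and reused for every call.
    return lambda text: text.translate(table)

escape = make_escaper("&#")
print(escape("abc&def#ghi"))  # abc\&def\#ghi
```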

answered Feb 10, 2018 at 2:59

tommy.carstensen

9 Comments

Changaco Over a year ago

This is a good answer, but in practice doing one .translate() appears to be slower than three chained .replace() (using CPython 3.6.4).

tommy.carstensen Over a year ago

@Changaco Thanks for timing it 👍 In practice I would use replace() myself, but I added this answer for the sake of completeness.

Graipher Over a year ago

For large strings and many replacements this should be faster, though some testing would be nice...

adavid Over a year ago

This method allows "clobbering replacements" that the chained versions cannot perform, e.g. replacing "a" with "b" and "b" with "a".

Jolbas Dec 17, 2024 at 7:45

@parity3 Since 3.6, an invalid escape sequence generates a DeprecationWarning. Since 3.12, it is a SyntaxWarning. In an unspecified future version it will be a SyntaxError. docs.python.org/3.12/whatsnew/3.12.html#other-language-changes
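
As a small illustration of the two warning-free spellings of a backslash followed by & (the variable names are only for this example):

```python
raw = r'\&'      # raw string: the backslash is kept literally
escaped = '\\&'  # regular string: the backslash itself is escaped
print(raw == escaped, len(raw))  # True 2
```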

ghostdog74 · Accepted Answer · 2010-08-05 05:54:54Z

83

>>> string = "abc&def#ghi"
>>> for ch in ['&', '#']:
...     if ch in string:
...         string = string.replace(ch, "\\" + ch)
...
>>> print(string)
abc\&def\#ghi

answered Aug 5, 2010 at 5:54

ghostdog74

7 Comments

Riet Over a year ago

The double backslash escapes the backslash, otherwise python would interpret "\" as a literal quotation character within a still-open string.

user1502657 Over a year ago

@MattSom replace() doesn't modify the original string, but returns a copy. So you need the assignment for the code to have any effect.

Mick_ Over a year ago

why are you using string as a variable name?

kdubs Over a year ago

string is okay as a variable name. "str" is the type in python. you can use it as a variable name, but it would be confusing to the reader.

lorenzo Over a year ago

Do you really need the if? It looks like a duplication of what the replace will be doing anyway.

thefourtheye · Accepted Answer · 2014-05-09 11:46:12Z

51

Simply chain the replace calls like this:

strs = "abc&def#ghi"
print(strs.replace('&', r'\&').replace('#', r'\#'))
# abc\&def\#ghi

If there are many replacements, you can do it generically with a mapping:

strs, replacements = "abc&def#ghi", {"&": r"\&", "#": r"\#"}
print("".join([replacements.get(c, c) for c in strs]))
# abc\&def\#ghi

answered May 9, 2014 at 11:46

thefourtheye

Sebastialonso · Accepted Answer · 2019-03-29 23:36:39Z

23

Late to the party, but I lost a lot of time with this issue until I found my answer.

Short and sweet: translate is superior to replace. If you're more interested in functionality than in time optimization, do not use replace.

Also use translate if you don't know if the set of characters to be replaced overlaps the set of characters used to replace.

Case in point:

Using replace, you would naively expect the snippet "1234".replace("1", "2").replace("2", "3").replace("3", "4") to return "2344", but it in fact returns "4444".

Translation seems to perform what OP originally desired.
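
The difference can be seen side by side in a short sketch (variable names chosen only for this example):

```python
chained = "1234".replace("1", "2").replace("2", "3").replace("3", "4")
# Each replace() sees the output of the previous one, so the digits cascade.
print(chained)  # 4444

# translate() looks up every original character exactly once.
table = str.maketrans({"1": "2", "2": "3", "3": "4"})
translated = "1234".translate(table)
print(translated)  # 2344
```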

answered Mar 29, 2019 at 23:36

Sebastialonso

kennytm · Accepted Answer · 2010-08-05 04:39:07Z

18

Are you always going to prepend a backslash? If so, try

import re
rx = re.compile('([&#])')
#                  ^^ fill in the characters here.
strs = rx.sub('\\\\\\1', strs)

It may not be the most efficient method but I think it is the easiest.

answered Aug 5, 2010 at 4:39

kennytm

krm · Accepted Answer · 2021-12-02 19:02:59Z

11

For Python 3.8 and above, one can use assignment expressions

[text := text.replace(s, f"\\{s}") for s in "&#" if s in text];

Although I am quite unsure whether this would be considered an "appropriate use" of assignment expressions as described in PEP 572, it looks clean and reads quite well (to my eyes). The semicolon at the end suppresses output if you run this in a REPL.

This would be "appropriate" if you wanted all intermediate strings as well. For example, (removing all lowercase vowels):

text = "Lorem ipsum dolor sit amet"
intermediates = [text := text.replace(i, "") for i in "aeiou" if i in text]

['Lorem ipsum dolor sit met',
 'Lorm ipsum dolor sit mt',
 'Lorm psum dolor st mt',
 'Lrm psum dlr st mt',
 'Lrm psm dlr st mt']

On the plus side, it does seem (unexpectedly?) faster than some of the faster methods in the accepted answer, and seems to perform nicely with both increasing strings length and an increasing number of substitutions.

The code for the above comparison is below. I am using random strings to make my life a bit simpler, and the characters to replace are chosen randomly from the string itself. (Note: I am using ipython's %timeit magic here, so run this in ipython/jupyter).

import random, string

def make_txt(length):
    "makes a random string of a given length"
    return "".join(random.choices(string.printable, k=length))

def get_substring(s, num):
    "gets a substring"
    return "".join(random.choices(s, k=num))

def a(text, replace): # one of the better performing approaches from the accepted answer
    for i in replace:
        if i in text:
            text = text.replace(i, "")

def b(text, replace):
    # a list comprehension, so the replacements run; a bare generator expression would never be consumed
    [text := text.replace(i, "") for i in replace if i in text]


def compare(strlen, replace_length):
    "use ipython / jupyter for the %timeit functionality"

    times_a, times_b = [], []

    for i in range(*strlen):
        el = make_txt(i)
        et = get_substring(el, replace_length)

        res_a = %timeit -n 1000 -o a(el, et) # ipython magic

        el = make_txt(i)
        et = get_substring(el, replace_length)
        
        res_b = %timeit -n 1000 -o b(el, et) # ipython magic

        times_a.append(res_a.average * 1e6)
        times_b.append(res_b.average * 1e6)
        
    return times_a, times_b

#----run
t2 = compare((2*2, 1000, 50), 2)
t10 = compare((2*10, 1000, 50), 10)

edited Dec 2, 2021 at 19:02

answered Oct 23, 2020 at 13:21

krm

4 Comments

Peyman Over a year ago

Actually, my code showed that stacked replace functions are faster. i.sstatic.net/6sel0.png

krm Over a year ago

Interesting. Which Python version is this? And can you share the comparison code?

Peyman Over a year ago

It's Python 3.8. Here it is. Should be run in Jupyter notebook. gist.github.com/kiasar/0c1bfcff7646a78b15268a5345d3faaa

krm Over a year ago

Ah! The difference seems to be the use of a loop for replacing items in a list instead of manually chaining each item to be replaced. I guess if the items to be replaced never change, then chaining multiple replace calls (your way) is the fastest way to do this. @Peyman

Victor Olex · Accepted Answer · 2011-02-16 05:17:34Z

8

You may consider writing a generic escape function:

def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])

>>> esc = mk_esc('&#')
>>> print(esc('Learn & be #1'))
Learn \& be \#1

This way you can make your function configurable with a list of characters that should be escaped.

answered Feb 16, 2011 at 5:17

Victor Olex

parity3 · Accepted Answer · 2016-01-29 22:53:46Z

3

FYI, this is of little or no use to the OP but it may be of use to other readers (please do not downvote, I'm aware of this).

As a somewhat ridiculous but interesting exercise, wanted to see if I could use python functional programming to replace multiple chars. I'm pretty sure this does NOT beat just calling replace() twice. And if performance was an issue, you could easily beat this in rust, C, julia, perl, java, javascript and maybe even awk. It uses an external 'helpers' package called pytoolz, accelerated via cython (cytoolz, it's a pypi package).

from functools import partial
from cytoolz.functoolz import compose
from cytoolz.itertoolz import chain,sliding_window
from itertools import starmap,imap,ifilter
from operator import itemgetter,contains
text='&hello#hi&yo&'
char_index_iter=compose(partial(imap, itemgetter(0)), partial(ifilter, compose(partial(contains, '#&'), itemgetter(1))), enumerate)
print '\\'.join(imap(text.__getitem__, starmap(slice, sliding_window(2, chain((0,), char_index_iter(text), (len(text),))))))

I'm not even going to explain this because no one would bother using this to accomplish multiple replace. Nevertheless, I felt somewhat accomplished in doing this and thought it might inspire other readers or win a code obfuscation contest.

answered Jan 29, 2016 at 22:53

parity3

2 Comments

Craig Andrews Over a year ago

"functional programming" doesn't mean "using as many functions as possible", you know.

Craig Andrews Over a year ago

This is a perfectly good, pure functional multi-char replacer: gist.github.com/anonymous/4577424f586173fc6b91a215ea2ce89e No allocations, no mutations, no side effects. Readable, too.

jewishmoses · Accepted Answer · 2020-03-19 05:05:24Z

3

How about this?

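def replace_all(replacements, text):
    # Apply each replacement in turn, in the dict's iteration order.
    for key, value in replacements.items():
        text = text.replace(key, value)
    return text

then

print(replace_all({"&": r"\&", "#": r"\#"}, "&#"))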

output

\&\#

CasualCoder3 · Accepted Answer · 2018-01-29 12:08:42Z

2

Using reduce, which is available in python2.7 and python3.*, you can easily replace multiple substrings in a clean and pythonic way.

from functools import reduce  # required on python3.*; also works on python2.7

# Let's define a helper method to make it easy to use
def replacer(text, replacements):
    return reduce(
        lambda text, ptuple: text.replace(ptuple[0], ptuple[1]),
        replacements, text
    )

if __name__ == '__main__':
    uncleaned_str = "abc&def#ghi"
    cleaned_str = replacer(uncleaned_str, [("&", r"\&"), ("#", r"\#")])
    print(cleaned_str) # "abc\&def\#ghi"

In python2.7 you don't have to import reduce but in python3.* you have to import it from the functools module.

answered Jan 29, 2018 at 12:08

CasualCoder3

1 Comment

Jean Monet Over a year ago

To add the 'if' condition (variant ba mentioned by Hugo): lambda text, ptuple: text.replace(ptuple[0], ptuple[1]) if ptuple[0] in text else text
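
Written out in full, the suggestion in this comment might look like the sketch below. The name replacer_with_check and the inner step function are hypothetical; the logic is the same conditional replace as in the lambda above.

```python
from functools import reduce

def replacer_with_check(text, replacements):
    def step(current, pair):
        old, new = pair
        # Only call replace() when the substring is actually present.
        return current.replace(old, new) if old in current else current
    # reduce() feeds the result of each step into the next one.
    return reduce(step, replacements, text)

print(replacer_with_check("abc&def#ghi", [("&", r"\&"), ("#", r"\#")]))
```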

Ahmed4end · Accepted Answer · 2020-07-06 16:04:12Z

2

An advanced way using regex:

import re
text = "hello ,world!"
replaces = {"hello": "hi", "world": " 2020", "!": "."}
# re.escape() keeps keys containing regex metacharacters literal
pattern = "|".join(re.escape(key) for key in replaces)
result = re.sub(pattern, lambda match: replaces[match.group(0)], text)
print(result)
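
The same single-pass idea can be wrapped in a function. This is a sketch under stated assumptions: multi_replace is a hypothetical name, the mapping is assumed to be non-empty, and the longest-first ordering is an addition that matters only when one key is a prefix of another.

```python
import re

def multi_replace(text, replacements):
    # Try longer keys first so that a key which is a prefix of another
    # key cannot shadow it; re.escape() makes each key match literally.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Every match is replaced in one left-to-right pass over the text.
    return pattern.sub(lambda m: replacements[m.group(0)], text)

print(multi_replace("hello ,world!", {"hello": "hi", "world": " 2020", "!": "."}))
```

Because the text is scanned once, the output of one replacement is never fed into another, unlike a chain of replace calls.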

answered Jul 6, 2020 at 16:04

Ahmed4end

jonesy · Accepted Answer · 2011-02-16 03:22:02Z

1

>>> a = '&#'
>>> print(a.replace('&', r'\&'))
\&#
>>> print(a.replace('#', r'\#'))
&\#
>>>

You want to use a 'raw' string (denoted by the 'r' prefixing the replacement string), since raw strings do not treat the backslash specially.

answered Feb 16, 2011 at 3:22

jonesy

Tiago Wutzke de Oliveira · Accepted Answer · 2020-02-25 14:25:03Z

1

Maybe a simple loop for chars to replace:

a = '&#'

to_replace = ['&', '#']

for char in to_replace:
    a = a.replace(char, "\\"+char)

print(a)

>>> \&\#

answered Feb 25, 2020 at 14:25

Tiago Wutzke de Oliveira

Crawsome · Accepted Answer · 2020-12-30 20:47:10Z

0

This will help someone looking for a simple solution.

def replacemany(our_str, to_be_replaced:tuple, replace_with:str):
    for nextchar in to_be_replaced:
        our_str = our_str.replace(nextchar, replace_with)
    return our_str

os = 'the rain in spain falls mainly on the plain ttttttttt sssssssssss nnnnnnnnnn'
tbr = ('a','t','s','n')
rw = ''

print(replacemany(os,tbr,rw))

Output:

he ri i pi fll mily o he pli

answered Dec 30, 2020 at 20:47

Crawsome

Arpan Saini · Accepted Answer · 2022-11-01 06:05:48Z

0

The example below uses the or condition (|): it deletes every ' and , from the given string. Pass as many characters as you want, separated by |.

import re
test = re.sub("('|,)","",str(jsonAtrList))

answered Nov 1, 2022 at 6:05

Arpan Saini

Daniela F. Lopez Astorquiza · Accepted Answer · 2024-10-02 13:08:53Z

0

Multiple replace with .translate() function

# Original string
original_string = "12#4526&18##0"

translation_dict = {'&': r'\&',
                    '#': r'\#'}

# Create a translation table using the dictionary
translation_table = str.maketrans(translation_dict)

# Translate the string
modified_string = original_string.translate(translation_table)

print(modified_string)  # Output: 12\#4526\&18\#\#0

answered Oct 2, 2024 at 13:08

Daniela F. Lopez Astorquiza

1 Comment

d-stroyer Over a year ago

This answer has already been proposed; it looks like a duplicate.
