Removing a list of characters in string

Question

I want to remove characters in a string in python:

string.replace(',', '').replace("!", '').replace(":", '').replace(";", '')...

But I have many characters I have to remove. I thought about a list

list = [',', '!', '.', ';'...]

But how can I use the list to replace the characters in the string?

See stackoverflow.com/questions/1919096/… for various solutions and a nice comparison.
– Martijn de Milliano, Commented Apr 4, 2012 at 18:32
It's a pity that Python (which is said to come with batteries included) does not handle this use case out of the box. PHP's function str_replace does it - you can pass an array as the first argument and a string as the second (php.net/manual/pl/function.str-replace.php).
– JustAC0der, Commented Jan 13, 2017 at 23:09

georg · Accepted Answer · 2015-08-27 17:09:10Z

If you're using Python 2 and your inputs are byte strings (str, not unicode), the fastest method by far is str.translate:

>>> chars_to_remove = ['.', '!', '?']
>>> subj = 'A.B!C?'
>>> subj.translate(None, ''.join(chars_to_remove))
'ABC'

Otherwise, there are the following options to consider:

A. Iterate the subject char by char, omit unwanted characters and join the resulting list:

>>> sc = set(chars_to_remove)
>>> ''.join([c for c in subj if c not in sc])
'ABC'

(Note that the generator version ''.join(c for c ...) will be less efficient).
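A rough way to check this claim on your own interpreter is a quick timeit comparison. This is only a sketch: the helper names join_list and join_gen and the sample input are made up here, and the ratio will vary with the Python version and machine. The usual explanation is that CPython's str.join collects a generator into a sequence before joining, so the generator form adds overhead instead of saving memory.

```python
import timeit

chars_to_remove = ['.', '!', '?']
subj = 'A.B!C?' * 1000
sc = set(chars_to_remove)

def join_list():
    # list comprehension: join receives a ready-made list
    return ''.join([c for c in subj if c not in sc])

def join_gen():
    # generator expression: join has to collect the items first
    return ''.join(c for c in subj if c not in sc)

# Both variants must produce the same string.
assert join_list() == join_gen() == 'ABC' * 1000

for f in (join_list, join_gen):
    print('{0:.3f} {1}'.format(timeit.timeit(f, number=200), f.__name__))
```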

B. Create a regular expression on the fly and re.sub with an empty string:

>>> import re
>>> rx = '[' + re.escape(''.join(chars_to_remove)) + ']'
>>> re.sub(rx, '', subj)
'ABC'

(re.escape ensures that characters like ^ or ] won't break the regular expression).
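To illustrate why the escaping matters, here is a small sketch with deliberately awkward input. The sample characters are chosen for illustration only; each of them has a special meaning inside a [...] character class.

```python
import re

# Hypothetical sample data: every character here is special inside [...]
chars_to_remove = ['^', ']', '-', '\\']
subj = 'A^B]C-D\\E'

# re.escape turns each of them into a literal before the class is built.
rx = '[' + re.escape(''.join(chars_to_remove)) + ']'
cleaned = re.sub(rx, '', subj)
print(cleaned)  # ABCDE
```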

C. Use the mapping variant of translate:

>>> chars_to_remove = [u'δ', u'Γ', u'ж']
>>> subj = u'AжBδCΓ'
>>> dd = {ord(c):None for c in chars_to_remove}
>>> subj.translate(dd)
u'ABC'

Full testing code and timings:

#coding=utf8

import re

def remove_chars_iter(subj, chars):
    sc = set(chars)
    return ''.join([c for c in subj if c not in sc])

def remove_chars_re(subj, chars):
    return re.sub('[' + re.escape(''.join(chars)) + ']', '', subj)

def remove_chars_re_unicode(subj, chars):
    return re.sub(u'(?u)[' + re.escape(''.join(chars)) + ']', '', subj)

def remove_chars_translate_bytes(subj, chars):
    return subj.translate(None, ''.join(chars))

def remove_chars_translate_unicode(subj, chars):
    d = {ord(c):None for c in chars}
    return subj.translate(d)

import timeit, sys

def profile(f):
    assert f(subj, chars_to_remove) == test
    t = timeit.timeit(lambda: f(subj, chars_to_remove), number=1000)
    print ('{0:.3f} {1}'.format(t, f.__name__))

print (sys.version)
PYTHON2 = sys.version_info[0] == 2

print ('\n"plain" string:\n')

chars_to_remove = ['.', '!', '?']
subj = 'A.B!C?' * 1000
test = 'ABC' * 1000

profile(remove_chars_iter)
profile(remove_chars_re)

if PYTHON2:
    profile(remove_chars_translate_bytes)
else:
    profile(remove_chars_translate_unicode)

print ('\nunicode string:\n')

if PYTHON2:
    chars_to_remove = [u'δ', u'Γ', u'ж']
    subj = u'AжBδCΓ'
else:
    chars_to_remove = ['δ', 'Γ', 'ж']
    subj = 'AжBδCΓ'

subj = subj * 1000
test = 'ABC' * 1000

profile(remove_chars_iter)

if PYTHON2:
    profile(remove_chars_re_unicode)
else:
    profile(remove_chars_re)

profile(remove_chars_translate_unicode)

Results:

2.7.5 (default, Mar  9 2014, 22:15:05) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)]

"plain" string:

0.637 remove_chars_iter
0.649 remove_chars_re
0.010 remove_chars_translate_bytes

unicode string:

0.866 remove_chars_iter
0.680 remove_chars_re_unicode
1.373 remove_chars_translate_unicode

---

3.4.2 (v3.4.2:ab2c023a9432, Oct  5 2014, 20:42:22) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]

"plain" string:

0.512 remove_chars_iter
0.574 remove_chars_re
0.765 remove_chars_translate_unicode

unicode string:

0.817 remove_chars_iter
0.686 remove_chars_re
0.876 remove_chars_translate_unicode

(As a side note, the figure for remove_chars_translate_bytes might give us a clue why the industry was reluctant to adopt Unicode for such a long time).
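For what it's worth, the two-argument form still exists in Python 3, but on bytes (and bytearray) rather than str. A minimal sketch, assuming the data is ASCII or that working at the byte level is acceptable; the sample values are illustrative and no timings are claimed here:

```python
chars_to_remove = [b'.', b'!', b'?']
subj = b'A.B!C?'

# With table=None no byte is remapped; the second argument lists bytes to delete.
cleaned = subj.translate(None, b''.join(chars_to_remove))
print(cleaned)  # b'ABC'

# Round trip for text: encode, strip at the byte level, decode again.
text = 'A.B!C?'
round_trip = text.encode('ascii').translate(None, b'.!?').decode('ascii')
print(round_trip)  # ABC
```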

The second method raises an error TypeError: translate() takes exactly one argument (2 given). Apparently it takes dict as an argument.
@antonavy - the 2nd solution does work - but only if the string is not unicode (for which a different translate() is needed)

Noah · Accepted Answer · 2013-12-05 22:04:47Z

121

In Python 2, you can use str.translate():

s.translate(None, ",!.;")

Example:

>>> s = "asjo,fdjk;djaso,oio!kod.kjods;dkps"
>>> s.translate(None, ",!.;")
'asjofdjkdjasooiokodkjodsdkps'

answered Apr 4, 2012 at 18:31 by Sven Marnach; edited Dec 5, 2013 at 22:04 by Noah

6 Comments

Sven Marnach Over a year ago

@thg435: Nobody asked for this, but anyway: s.translate(dict.fromkeys(map(ord, u",!.;")))

hobs Over a year ago

This (and @PraveenGollakota's) simultaneous answer is exactly what @Laura asked for and should be the preferred answer(s).

Gank Over a year ago

Why does Python 3 raise TypeError: translate() takes exactly one argument (2 given)?

Sven Marnach Over a year ago

@Gank: The unicode.translate() method has different parameters than the str.translate() method. Use the variant in the above comment for Unicode objects.

Jun Over a year ago

@SvenMarnach what is map(ord, u",!.;"))? and does u stand for unicode?
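To unpack that one-liner: ord gives the integer code point of a character, map(ord, ...) applies it to every character of the string, and dict.fromkeys builds a dict with those integers as keys and None as every value, which translate reads as "delete". The u prefix marks a unicode literal in Python 2; Python 3 accepts it but it is redundant there. A short Python 3 sketch (the variable names are just for illustration):

```python
chars = ",!.;"

codes = list(map(ord, chars))
print(codes)  # [44, 33, 46, 59]

table = dict.fromkeys(codes)  # every value defaults to None, meaning "delete"
print(table)  # {44: None, 33: None, 46: None, 59: None}

s = "asjo,fdjk;djaso,oio!kod.kjods;dkps"
result = s.translate(table)
print(result)  # asjofdjkdjasooiokodkjodsdkps
```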

Dekel · Accepted Answer · 2016-09-11 12:33:47Z

47

If you are using python3 and looking for the translate solution - the function was changed and now takes 1 parameter instead of 2.

That parameter is a table (can be a dictionary) where each key is the Unicode ordinal (int) of the character to find and the value is the replacement: a Unicode ordinal, a string, or None to delete the character.

Here is a usage example:

>>> chars_to_remove = [',', '!', '.', ';']
>>> s = "This is, my! str,ing."
>>> s.translate({ord(x): '' for x in chars_to_remove})
'This is my string'
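As a complement, the following sketch shows the different kinds of values a Python 3 translation table can hold. The particular mappings are invented for illustration and are not part of the answer above.

```python
s = "This is, my! str,ing."

table = {
    ord(','): None,      # None deletes the character
    ord('!'): '',        # an empty string has the same effect
    ord('.'): '...',     # a string may be longer than one character
    ord('T'): ord('t'),  # an integer is read as the code point of the replacement
}
result = s.translate(table)
print(result)  # this is my string...
```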

answered Sep 11, 2016 at 12:33

Dekel

Praveen Gollakota · Accepted Answer · 2012-04-04 18:31:32Z

37

You can use the translate method (the two-argument form shown here works on Python 2 str only).

s.translate(None, '!.;,')

answered Apr 4, 2012 at 18:31

Praveen Gollakota

ninjagecko · Accepted Answer · 2012-04-04 18:53:30Z

18

''.join(c for c in myString if c not in badTokens)

answered Apr 4, 2012 at 18:53

ninjagecko

1 Comment

Wolf Over a year ago

Useful in similar cases not based on chars and strings +1

aIKid · Accepted Answer · 2013-10-23 07:47:04Z

12

Why not a simple loop?

for i in replace_list:
    string = string.replace(i, '')

Also, avoid naming lists 'list'. It shadows the built-in list type.
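To see what goes wrong, here is a small sketch. The function name demo is made up, and the shadowing is confined to a function so the rest of the script keeps the real built-in.

```python
def demo():
    list = [',', '!', '.', ';']   # shadows the built-in inside this function
    try:
        return list('abc')        # now calls the list object, not the type
    except TypeError as exc:
        return str(exc)

print(demo())        # 'list' object is not callable
print(list('abc'))   # ['a', 'b', 'c'] - the built-in is untouched outside demo()
```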

answered Oct 23, 2013 at 7:47

aIKid

alan · Accepted Answer · 2012-04-04 18:58:50Z

9

Another approach using regex:

''.join(re.split(r'[.;!?,]', s))

edited Apr 4, 2012 at 18:58

answered Apr 4, 2012 at 18:51

alan

krystan honour · Accepted Answer · 2021-06-01 00:24:54Z

6

You could use something like this:

def replace_all(text, dic):
    # dic maps each substring to its replacement; use .iteritems() on Python 2
    for i, j in dic.items():
        text = text.replace(i, j)
    return text

This code is not my own and comes from here; it's a great article that discusses this in depth.

edited Jun 1, 2021 at 0:24

answered Apr 4, 2012 at 18:31

krystan honour

Dhia · Accepted Answer · 2016-12-22 21:40:55Z

5

A simple way:

import re
text = 'this is string !    >><< (foo---> bar) @-tuna-#   sandwich-%-is-$-* good'

# condense runs of whitespace into a single space
text = ' '.join(text.split())

# replace each space with a dash
text = text.replace(" ", "-")

# remove any character that matches the regex
text = re.sub('[!@#$%^&*()_+<>]', '', text)

output:

this-is-string---foo----bar--tuna--sandwich--is---good

answered Dec 22, 2016 at 19:55 by perfecto25; edited Dec 22, 2016 at 21:40 by Dhia

Biplob Das · Accepted Answer · 2020-05-26 08:01:40Z

5

Remove *%,&@! from below string:

s = "this is my string,  and i will * remove * these ** %% "
new_string = s.translate(s.maketrans('','','*%,&@!'))
print(new_string)

# output: this is my string  and i will  remove  these

answered May 26, 2020 at 8:01

Biplob Das

1 Comment

Aybid Over a year ago

Explanation: 1. maketrans(x, y, z): the third parameter lists the characters to delete. x and y are '' here, so no character is remapped; only the characters in z are removed. 2. translate(): returns a string where each character is mapped to its corresponding entry in the translation table (here built by maketrans).
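A quick way to see what maketrans actually builds is to print the table. The sample characters below are arbitrary and only serve as an illustration:

```python
# x and y must have equal length: each character of x maps to the one at the
# same position in y. Characters in z map to None, i.e. they are deleted.
table = str.maketrans('ab', 'xy', ',!')
print(table)  # {97: 120, 98: 121, 44: None, 33: None}

translated = 'a,b!c'.translate(table)
print(translated)  # xyc

# With empty x and y only the deletions remain.
delete_only = str.maketrans('', '', '*%')
print(delete_only)  # {42: None, 37: None}
```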

Linlin林林 · Accepted Answer · 2021-12-03 10:05:15Z

5

In Python 3.8, this works for me:

s.translate(s.maketrans(dict.fromkeys(',!.;', '')))

edited Dec 3, 2021 at 10:05

answered Dec 3, 2021 at 9:52

Linlin林林

2 Comments

confiq Over a year ago

str.translate() is from python2 and there is no need to use maketrans() func.

2e0byo Over a year ago

@confiq on the contrary: Python 3 has a str.translate() which takes char ords, and thus does require something like str.maketrans() (or e.g. a dict comprehension calling ord, as in the other Python 3 answer). This answer won't work without the maketrans() (try it).
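For anyone who does want to try it, a minimal Python 3 sketch (the variable names are illustrative):

```python
s = "a,b!c"

by_char = dict.fromkeys(',!', '')    # keys are 1-character strings
unchanged = s.translate(by_char)
print(unchanged)                     # a,b!c  (translate looks up integers, finds nothing)

by_ord = str.maketrans(by_char)      # maketrans converts the keys to ordinals
print(by_ord)                        # {44: '', 33: ''}
cleaned = s.translate(by_ord)
print(cleaned)                       # abc
```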

Community · Accepted Answer · 2017-05-23 11:33:24Z

4

Also an interesting topic on removing accents from a Unicode string by converting each character to its standard unaccented form:

What is the best way to remove accents in a python unicode string?

code extract from the topic:

import unicodedata

def remove_accents(input_str):
    nkfd_form = unicodedata.normalize('NFKD', input_str)
    return u"".join([c for c in nkfd_form if not unicodedata.combining(c)])
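To make the mechanism visible, this sketch prints the intermediate NFKD form for one sample word (the word is an arbitrary choice). The normalization splits the precomposed letter into a base letter plus a combining mark, and the filter then drops the mark.

```python
import unicodedata

word = 'caf\u00e9'  # 'café' with a precomposed é (U+00E9)

decomposed = unicodedata.normalize('NFKD', word)
print([hex(ord(c)) for c in decomposed])
# ['0x63', '0x61', '0x66', '0x65', '0x301'] -> 'e' followed by COMBINING ACUTE ACCENT

stripped = ''.join(c for c in decomposed if not unicodedata.combining(c))
print(stripped)  # cafe
```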

answered May 31, 2014 at 7:23 by Sylvain; edited May 23, 2017 at 11:33 by Community Bot

rioted · Accepted Answer · 2015-03-24 17:20:34Z

4

Perhaps a more functional way to achieve what you wish (the join is needed on Python 3, where filter returns an iterator):

>>> subj = 'A.B!C?'
>>> unwanted = set([',', '!', '.', ';', '?'])
>>> ''.join(filter(lambda x: x not in unwanted, subj))
'ABC'

Please note that for this particular purpose it's quite an overkill, but once you need more complex conditions, filter comes in handy.
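As an example of such a condition, here is a sketch that filters with a predicate instead of a fixed list. The sample string and the choice of predicates are assumptions made for illustration.

```python
import string

subj = 'A.B!C? 42'

# Keep only letters and digits; filter() returns an iterator on Python 3,
# so the result is joined back into a string.
letters_and_digits = ''.join(filter(str.isalnum, subj))
print(letters_and_digits)  # ABC42

# An arbitrary predicate: drop ASCII punctuation but keep spaces.
punct = set(string.punctuation)
no_punct = ''.join(filter(lambda ch: ch not in punct, subj))
print(no_punct)  # ABC 42
```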

answered Mar 24, 2015 at 17:20

rioted

1 Comment

rioted Over a year ago

Also note that this can just as easily be done with list comprehensions, which is way more pythonic in my opinion.

Akshay Hazari · Accepted Answer · 2015-11-03 04:13:28Z

2

How about this - a one-liner (on Python 3, reduce has to be imported from functools first):

from functools import reduce
reduce(lambda x, y: x.replace(y, ""), [',', '!', '.', ';'], ";Test , ,  !Stri!ng ..")

answered Nov 3, 2015 at 4:13

Akshay Hazari

Hiskel Kelemework · Accepted Answer · 2016-11-13 15:45:53Z

I think this is simple enough and will do:

chars = [",", "!", ";", ":"]  # the list goes on...

theString = "dlkaj;lkdjf'adklfaj;lsd'fa'dfj;alkdjf"  # an example string
newString = ""  # the string free of unwanted characters
for i in range(len(theString)):
    if theString[i] in chars:
        newString += ""  # concatenate an empty string
    else:
        newString += theString[i]

This is one way to do it. But if you are tired of keeping a list of characters to remove, you can instead filter on the ordinal of each character as you iterate through the string. For ASCII characters the ordinal is the ASCII code: '0' is 48 and lowercase 'z' is 122. Note that this range also contains some punctuation (codes 58-64 and 91-96, such as ';'), so it is only a rough filter:

theString = "lkdsjf;alkd8a'asdjf;lkaheoialkdjf;ad"
newString = ""
for i in range(len(theString)):
    if ord(theString[i]) < 48 or ord(theString[i]) > 122:  # ord() => ASCII code
        newString += ""
    else:
        newString += theString[i]
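To show what the ordinal range does and does not catch, the sketch below runs the same example string through the 48..122 test and through an explicit whitelist. The whitelist (ASCII letters and digits) is an assumption about which characters are wanted, not something the answer specifies.

```python
import string

the_string = "lkdsjf;alkd8a'asdjf;lkaheoialkdjf;ad"

# The 48..122 range keeps digits and letters, but also : ; < = > ? @ [ \ ] ^ _ `
in_range = ''.join(c for c in the_string if 48 <= ord(c) <= 122)
print(in_range)  # lkdsjf;alkd8aasdjf;lkaheoialkdjf;ad

# An explicit whitelist removes the semicolons as well.
allowed = set(string.ascii_letters + string.digits)
whitelisted = ''.join(c for c in the_string if c in allowed)
print(whitelisted)  # lkdsjfalkd8aasdjflkaheoialkdjfad
```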

tcpiper · Accepted Answer · 2013-10-28 14:23:44Z

1

These days I am diving into Scheme, and now I think I am good at recursion and eval. HAHAHA. Just sharing some new ways:

First, eval it:

print(eval('string%s' % (''.join(['.replace("%s","")' % i for i in replace_list]))))

Second, recurse it:

def repn(string, replace_list):
    if replace_list == []:
        return string
    else:
        # note: pop() empties the caller's replace_list as a side effect
        return repn(string.replace(replace_list.pop(), ""), replace_list)

print(repn(string, replace_list))

Hey, don't downvote. I just want to share some new ideas.

answered Oct 28, 2013 at 14:23

tcpiper

Sheikh Ahmad Shah · Accepted Answer · 2015-05-26 04:07:53Z

1

I am thinking about a solution for this. First I would turn the input string into a list of characters. Then I would replace the unwanted items of the list with empty strings. Finally, using join, I would turn the list back into a string. The code can be like this:

def the_replacer(text):
    test = []
    for m in range(len(text)):
        test.append(text[m])
        if test[m] == ',' \
        or test[m] == '!' \
        or test[m] == '.' \
        or test[m] == '\'' \
        or test[m] == ';':  # ... extend with further characters as needed
            test[m] = ''
    return ''.join(test)

This would remove any of the listed characters from the string. What do you think about that?

edited May 26, 2015 at 4:07

answered May 26, 2015 at 4:00

Sheikh Ahmad Shah

pylang · Accepted Answer · 2018-02-09 01:31:07Z

1

Here is a more_itertools approach:

import more_itertools as mit


s = "A.B!C?D_E@F#"
blacklist = ".!?_@#"

"".join(mit.flatten(mit.split_at(s, pred=lambda x: x in set(blacklist))))
# 'ABCDEF'

Here we split upon items found in the blacklist, flatten the results and join the string.

answered Feb 9, 2018 at 1:31

pylang

John Forbes · Accepted Answer · 2020-05-08 03:38:41Z

1

Python 3, single line list comprehension implementation.

from string import ascii_lowercase # 'abcdefghijklmnopqrstuvwxyz'
def remove_chars(input_string, removable):
    return ''.join([c for c in input_string if c not in removable])

print(remove_chars(input_string="Stack Overflow", removable=ascii_lowercase))
# output: S O

answered May 8, 2020 at 3:38

John Forbes

Shaida Muhammad · Accepted Answer · 2022-02-02 15:53:58Z

1

Why not utilize this simple function:

def remove_characters(text, chars_list):
    # 'text' rather than 'str', so the built-in str is not shadowed
    for char in chars_list:
        text = text.replace(char, '')

    return text

Use function:

print(remove_characters('A.B!C?', ['.', '!', '?']))

Output:

ABC

answered Feb 2, 2022 at 15:53

Shaida Muhammad
