Remove characters except digits from string using Python?

Question

How can I remove all characters except digits from a string?

@JG: I have a gtk.Entry() and I want to multiply the float entered into it.
– Jan Tojnar, Commented Oct 3, 2009 at 5:38
@JanTojnar use the re.sub method as per answer two and explicitly list which chars to keep, e.g. re.sub(r"[^0123456789\.]", "", "poo123.4and5fish")
– Roger Heathcote, Commented Dec 30, 2012 at 16:26
If you only want to check if the string is all digits, see stackoverflow.com/questions/1323364.
– Karl Knechtel, Commented Aug 1, 2022 at 20:12

Tim Tisdall · Accepted Answer · 2023-07-04 18:47:18Z

300

Use re.sub, like so:

>>> import re
>>> re.sub(r'\D', '', 'aas30dsa20')
'3020'

\D matches any non-digit character, so the code above replaces every non-digit character with the empty string.
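
One detail worth checking before relying on \D: with Python 3 str patterns, "digit" means any Unicode decimal digit, so digits from other scripts are kept. The sketch below uses a made-up sample string and compares the default behaviour with the re.ASCII flag and an explicit [^0-9] class.

```python
import re

# Sample input ending in an Arabic-Indic digit (U+0663), chosen for illustration.
text = "aas30dsa20\u0663"

# Default (Unicode) matching: \D leaves every Unicode decimal digit in place.
unicode_digits = re.sub(r"\D", "", text)

# re.ASCII restricts \d and \D to the ASCII range 0-9.
ascii_digits = re.sub(r"\D", "", text, flags=re.ASCII)

# An explicit character class has the same effect without a flag.
explicit_class = re.sub(r"[^0-9]", "", text)

print(repr(unicode_digits))
print(ascii_digits, explicit_class)
```

If the result is going to be passed to code that expects only 0-9, the ASCII-restricted forms are probably the safer choice.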

Or you can use filter, like so (in Python 2):

>>> filter(str.isdigit, 'aas30dsa20')
'3020'

In Python 3, filter returns an iterator instead of a string, so join the result:

>>> ''.join(filter(str.isdigit, 'aas30dsa20'))
'3020'

edited Jul 4, 2023 at 18:47

Tim Tisdall

answered Sep 20, 2009 at 12:18

João Silva

7 Comments

f0b0s Over a year ago

re is evil in such simple task, second one is the best I think, cause 'is...' methods are the fastest for strings.

SilentGhost Over a year ago

your filter example is limited to py2k

SilentGhost Over a year ago

@f0b0s-iu9-info: did you time it? On my machine (py3k) re is twice as fast as filter with isdigit; a generator with isdigit is halfway between them

asmaier Over a year ago

For Python 3.6 it should be re.sub("\\D", "", "aas30dsa20") . Otherwise one gets a DeprecationWarning: invalid escape sequence \D .

Mitch McMabers Over a year ago

@asmaier Simply use r for raw string: re.sub(r"\D+", "", "aas30dsa20")

rescdsk · Accepted Answer · 2014-10-22 21:17:22Z

119

In Python 2.*, by far the fastest approach is the .translate method:

>>> x='aaa12333bb445bb54b5b52'
>>> import string
>>> all=string.maketrans('','')
>>> nodigs=all.translate(all, string.digits)
>>> x.translate(all, nodigs)
'1233344554552'
>>>

string.maketrans makes a translation table (a string of length 256) which in this case is the same as ''.join(chr(x) for x in range(256)) (just faster to make;-). .translate applies the translation table (which here is irrelevant since all essentially means identity) AND deletes characters present in the second argument -- the key part.

.translate works very differently on Unicode strings (and strings in Python 3 -- I do wish questions specified which major-release of Python is of interest!) -- not quite this simple, not quite this fast, though still quite usable.
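
In Python 3 the table-plus-delete form described above still exists on bytes objects. A minimal sketch, assuming the input is available as ASCII bytes rather than str:

```python
import string

# Every byte value that is not an ASCII digit; these are the bytes to delete.
digits = string.digits.encode("ascii")
non_digits = bytes(b for b in range(256) if b not in digits)

data = b"aaa12333bb445bb54b5b52"

# Passing None as the table means "no remapping"; the delete argument does the work.
result = data.translate(None, non_digits)
print(result)
```

For str input this would need an encode/decode round trip, which may cancel out any speed benefit.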

Back to 2.*, the performance difference is impressive...:

$ python -mtimeit -s'import string; all=string.maketrans("", ""); nodig=all.translate(all, string.digits); x="aaa12333bb445bb54b5b52"' 'x.translate(all, nodig)'
1000000 loops, best of 3: 1.04 usec per loop
$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 7.9 usec per loop

Speeding things up by 7-8 times is hardly peanuts, so the translate method is well worth knowing and using. The other popular non-RE approach...:

$ python -mtimeit -s'x="aaa12333bb445bb54b5b52"' '"".join(i for i in x if i.isdigit())'
100000 loops, best of 3: 11.5 usec per loop

is 50% slower than RE, so the .translate approach beats it by over an order of magnitude.

In Python 3, or for Unicode, you need to pass .translate a mapping (with ordinals, not characters directly, as keys) that returns None for what you want to delete. Here's a convenient way to express this for deletion of "everything but" a few characters:

import string

class Del:
  def __init__(self, keep=string.digits):
    self.comp = dict((ord(c),c) for c in keep)
  def __getitem__(self, k):
    return self.comp.get(k)

DD = Del()

x='aaa12333bb445bb54b5b52'
x.translate(DD)

also emits '1233344554552'. However, putting this in xx.py we have...:

$ python3.1 -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 8.43 usec per loop
$ python3.1 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
10000 loops, best of 3: 24.3 usec per loop

...which shows that the performance advantage disappears for this kind of "deletion" task and becomes a performance decrease.
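
These timings come from old interpreters and one machine, so they are worth re-measuring. The sketch below uses the timeit module to compare a few of the approaches from this page on whatever Python is at hand; the candidate names and the loop count are arbitrary choices for illustration.

```python
import re
import timeit

x = "aaa12333bb445bb54b5b52"
non_digit = re.compile(r"\D")

# Each candidate is a zero-argument callable returning the digits of x.
candidates = {
    "re.sub": lambda: non_digit.sub("", x),
    "join + isdigit": lambda: "".join(c for c in x if c.isdigit()),
    "join + filter": lambda: "".join(filter(str.isdigit, x)),
}

loops = 20_000
for name, func in candidates.items():
    seconds = timeit.timeit(func, number=loops)
    print(f"{name}: {seconds / loops * 1e6:.2f} usec per loop")
```

No particular ranking is implied here; the numbers will vary with the Python version and the input string.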

edited Oct 22, 2014 at 21:17

rescdsk

answered Sep 20, 2009 at 16:37

Alex Martelli

10 Comments

Alex Martelli Over a year ago

@sunqiang, yes, absolutely -- there's a reason Py3k has gone to Unicode as THE text string type, instead of byte strings as in Py2 -- same reason Java and C# have always had the same "string means unicode" meme... some overhead, maybe, but MUCH better support for just about anything but English!-).

Tom Dalling Over a year ago

x.translate(None, string.digits) actually results in 'aaabbbbbb', which is the opposite of what is intended.

Chris Johnson Over a year ago

Echoing comments from Tom Dalling, your first example keeps all the undesirable characters -- does the opposite of what you said.

Nick T Over a year ago

@RyanB.Lynch et al, the fault was with a later editor and two other users that approved said edit, which, in fact, is totally wrong. Reverted.

Andy Hayden Over a year ago

overriding all builtin... not sure about that!

f0b0s · Accepted Answer · 2009-09-20 12:24:18Z

81

s=''.join(i for i in s if i.isdigit())

Another generator variant.
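
One caveat when using str.isdigit as the test: in Python 3 it also returns True for characters such as superscript digits, which int() will not accept. The sketch below contrasts it with a plain range check; keep_ascii_digits is a hypothetical helper name, not part of the answer.

```python
def keep_ascii_digits(text):
    """Keep only the characters 0-9 (hypothetical helper, not from the answer)."""
    return "".join(ch for ch in text if "0" <= ch <= "9")

sample = "x\u00b2 = 16"  # contains SUPERSCRIPT TWO (U+00B2)

# str.isdigit accepts the superscript, so it survives the filter.
with_isdigit = "".join(ch for ch in sample if ch.isdigit())

# The range check keeps only ASCII digits.
ascii_only = keep_ascii_digits(sample)

print(repr(with_isdigit))
print(repr(ascii_only))
```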

answered Sep 20, 2009 at 12:24

f0b0s

5 Comments

Barath Ravikumar Over a year ago

Killed it..+1 Would have been even better if lamda was used

Eugene Chabanov Over a year ago

If you want to include any custom characters, for example include negatives or decimals - do this: s = ''.join(i for i in s if i.isdigit() or i in '-./\\')

Igor Atsberger Over a year ago

Fantastic solution without any imports

George Hayward Over a year ago

Just love this b/c it requires no imports!!

Kedar Joshi Over a year ago

I would say this is the best solution so far !

freiksenet · Accepted Answer · 2009-09-20 17:15:48Z

19

You can use filter:

filter(lambda x: x.isdigit(), "dasdasd2313dsa")

On python3.0 you have to join this (kinda ugly :( )

''.join(filter(lambda x: x.isdigit(), "dasdasd2313dsa"))

edited Sep 20, 2009 at 17:15

answered Sep 20, 2009 at 12:24

freiksenet

2 Comments

SilentGhost Over a year ago

only in py2k, in py3k it returns a generator

Luiz C. Over a year ago

convert str to list to make sure it works on both py2 and py3: ''.join(filter(lambda x: x.isdigit(), list("dasdasd2313dsa")))

Aminah Nuraini · Accepted Answer · 2016-08-30 19:03:44Z

16

You can easily do it using Regex

>>> import re
>>> re.sub(r"\D", "", "£70,000")
'70000'

answered Aug 30, 2016 at 19:03

Aminah Nuraini

2 Comments

Iorek Over a year ago

By far the easiest way

jww Over a year ago

How is this different than João Silva's answer, which was provided 7 years earlier?

SilentGhost · Accepted Answer · 2009-09-20 12:23:17Z

14

along the lines of bayer's answer:

''.join(i for i in s if i.isdigit())

answered Sep 20, 2009 at 12:23

SilentGhost

1 Comment

Oli Over a year ago

No, this won't work for negative numbers because - is not a digit.

Roger Heathcote · Accepted Answer · 2012-12-30 16:31:51Z

11

The op mentions in the comments that he wants to keep the decimal place. This can be done with the re.sub method (as per the second and IMHO best answer) by explicitly listing the characters to keep e.g.

>>> import re
>>> re.sub(r"[^0123456789\.]", "", "poo123.4and5fish")
'123.45'
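
If the goal is a float, the cleaned string still has to be validated, since an input with two periods would leave something float() rejects. A minimal sketch of one way to do that, assuming a period is the only decimal separator; extract_float is a hypothetical helper name.

```python
import re

def extract_float(text):
    """Hypothetical helper: keep digits and periods, then convert to float."""
    cleaned = re.sub(r"[^0-9.]", "", text)
    # Reject inputs that would leave more than one decimal point.
    if cleaned.count(".") > 1:
        raise ValueError(f"more than one decimal point in {text!r}")
    # float() itself raises ValueError if nothing usable is left.
    return float(cleaned)

print(extract_float("poo123.4and5fish"))
```

Calling extract_float("poo123.4and.5fish") raises ValueError instead of silently guessing which period was meant.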

answered Dec 30, 2012 at 16:31

Roger Heathcote

2 Comments

Jan Tojnar Over a year ago

What about "poo123.4and.5fish"?

Roger Heathcote Over a year ago

In my code I check for the number of periods in the input string and raise an error if that is more than 1.

dboy · Accepted Answer · 2021-04-22 13:09:05Z

7

Try:

import re

string = '1abcd2XYZ3'
string_without_letters = re.sub(r'[a-z]', '', string.lower())

this should give '123' as the value of string_without_letters.

edited Apr 22, 2021 at 13:09

dboy

answered Dec 15, 2020 at 18:45

João

3 Comments

Muneeb Ahmad Khurram Over a year ago

so [a-z] means all lowercase letters or for uppercase we have to [A-Z]?

João Over a year ago

[a-z] will work for both lower and uppercases :)

Muneeb Ahmad Khurram Over a year ago

yes, because I just noticed the string.lower() is your best friend.

Gilles 'SO- stop being evil' · Accepted Answer · 2013-03-04 13:26:03Z

6

x.translate(None, string.digits)

will delete all digits from the string (Python 2 only). To delete letters and keep the digits, do this:

x.translate(None, string.letters)
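
In Python 3, str.translate takes a single table argument, so the two calls above need str.maketrans with its third (delete) argument. A short sketch with a made-up sample string; note that anything not listed, such as spaces and punctuation, is left in place.

```python
import string

x = "abc 12-3 XYZ"

# Three-argument str.maketrans: the third argument lists characters to delete.
drop_letters = str.maketrans("", "", string.ascii_letters)
drop_digits = str.maketrans("", "", string.digits)

print(repr(x.translate(drop_letters)))
print(repr(x.translate(drop_digits)))
```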

edited Mar 4, 2013 at 13:26

Gilles 'SO- stop being evil'

answered Mar 4, 2013 at 13:00

Terje Molnes

2 Comments

Bobort Over a year ago

I get a TypeError: translate() takes exactly one argument (2 given). Why this question was upvoted in its current state is quite frustrating.

ZaxR Over a year ago

translate changed from python 2 to 3. The syntax using this method in python 3 is x.translate(str.maketrans('', '', string.digits)) and x.translate(str.maketrans('', '', string.ascii_letters)) . Neither of these strips white space. I wouldn't really recommend this approach anymore...

bayer · Accepted Answer · 2009-09-20 12:21:49Z

5

Use a generator expression:

>>> s = "foo200bar"
>>> new_s = "".join(i for i in s if i in "0123456789")

answered Sep 20, 2009 at 12:21

bayer

2 Comments

shxfee Over a year ago

instead do ''.join(n for n in foo if n.isdigit())

Anselmo Blanco Dominguez Over a year ago

With a small modification, "".join([i for i in s if i in "0123456789"]) , bayer's solution is faster than using "isdigit". It performs in 15% less time. Of all the solutions presented on this page, the quickest is @rescdsk 's. However, when it is not a loop, it is better to stick with the quickest "one line" solution.

rescdsk · Accepted Answer · 2014-10-22 21:09:42Z

A fast version for Python 3:

# xx3.py
from collections import defaultdict
import string
_NoneType = type(None)

def keeper(keep):
    table = defaultdict(_NoneType)
    table.update({ord(c): c for c in keep})
    return table

digit_keeper = keeper(string.digits)

Here's a performance comparison vs. regex:

$ python3.3 -mtimeit -s'import xx3; x="aaa12333bb445bb54b5b52"' 'x.translate(xx3.digit_keeper)'
1000000 loops, best of 3: 1.02 usec per loop
$ python3.3 -mtimeit -s'import re; r = re.compile(r"\D"); x="aaa12333bb445bb54b5b52"' 'r.sub("", x)'
100000 loops, best of 3: 3.43 usec per loop

So it's a little bit more than 3 times faster than regex, for me. It's also faster than class Del above, because defaultdict does all its lookups in C, rather than (slow) Python. Here's that version on my same system, for comparison.

$ python3.3 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
100000 loops, best of 3: 13.6 usec per loop
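
A side effect of the defaultdict table is that it stores a None entry for every missing character it is asked about, so it grows with the variety of the input. A possible variation, sketched here without any timing claim, is a dict subclass whose __missing__ returns None without storing anything; KeepOnly is a hypothetical name.

```python
import string

class KeepOnly(dict):
    """Hypothetical translation table: maps kept characters to themselves."""

    def __init__(self, keep):
        super().__init__((ord(c), c) for c in keep)

    def __missing__(self, key):
        # Returning None tells str.translate to delete the character;
        # nothing is stored, so the table does not grow.
        return None

digit_keeper = KeepOnly(string.digits)
x = "aaa12333bb445bb54b5b52"
print(x.translate(digit_keeper))
print(len(digit_keeper))
```

Whether this is faster or slower than the defaultdict version would need to be measured on the interpreter in use.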

Gant · Accepted Answer · 2009-09-20 12:23:03Z

2

Ugly but works:

>>> s
'aaa12333bb445bb54b5b52'
>>> a = ''.join(filter(lambda x : x.isdigit(), s))
>>> a
'1233344554552'
>>>

answered Sep 20, 2009 at 12:23

Gant

2 Comments

Gant Over a year ago

@SilentGhost it's my misunderstanding. had it corrected thanks :)

Bobort Over a year ago

Actually, with this method, I don't think you need to use "join." filter(lambda x: x.isdigit(), s) worked fine for me. ...oh, it's because I'm using Python 2.7.

AnilReddy · Accepted Answer · 2018-07-16 20:32:11Z

2

$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'

100000 loops, best of 3: 2.48 usec per loop

$ python -mtimeit -s'import re; x="aaa12333bab445bb54b5b52"' '"".join(re.findall("[a-z]+",x))'

100000 loops, best of 3: 2.02 usec per loop

$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'

100000 loops, best of 3: 2.37 usec per loop

$ python -mtimeit -s'import re; x="aaa12333bab445bb54b5b52"' '"".join(re.findall("[a-z]+",x))'

100000 loops, best of 3: 1.97 usec per loop

I observed that join is faster than sub.
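
Note that the findall pattern [a-z]+ in the commands above collects letters, not digits, so the two commands being timed do not compute the same thing. For a like-for-like comparison, findall would have to collect runs of digits, as in this sketch (no timing result is claimed):

```python
import re

x = "aaa12333bab445bb54b5b52"

# Delete every non-digit character.
via_sub = re.sub(r"\D", "", x)

# Collect runs of digits and join them; same result as the line above.
via_findall = "".join(re.findall(r"\d+", x))

print(via_sub, via_findall)
```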

edited Jul 16, 2018 at 20:32

answered Jul 16, 2018 at 19:21

AnilReddy

3 Comments

Jan Tojnar Over a year ago

Why are you repeating the two methods twice? And could you describe how is your answer different from the accepted one?

AnilReddy Over a year ago

Both results the same output. But, I just wanna show that join is faster the sub method in the results.

Jan Tojnar Over a year ago

They do not, your code does the opposite. And also you have four measurements but only two methods.

alfredo · Accepted Answer · 2019-05-17 21:12:59Z

2

You can read each character. If it is a digit, include it in the answer. The str.isdigit() method tells you whether a character is a digit.

your_input = '12kjkh2nnk34l34'
your_output = ''.join(c for c in your_input if c.isdigit())
print(your_output) # '1223434'

edited May 17, 2019 at 21:12

answered May 17, 2019 at 20:54

alfredo

1 Comment

chevybow Over a year ago

how is this different from the answer by f0b0s? You should edit that answer instead if you have more information to bring

Faisal Fida · Accepted Answer · 2022-09-02 03:40:23Z

2

You can use join + filter + lambda:

''.join(filter(lambda s: s.isdigit(), "20 years ago, 2 months ago, 2 days ago"))

Output: '2022'

answered Sep 2, 2022 at 3:40

Faisal Fida

Josh · Accepted Answer · 2018-01-24 11:03:05Z

0

Not a one liner but very simple:

buffer = ""
some_str = "aas30dsa20"

for char in some_str:
    if char.isdigit():  # keep digits only
        buffer += char

print( buffer )

answered Jan 24, 2018 at 11:03

Josh

chb · Accepted Answer · 2019-05-18 18:06:51Z

0

I used this. 'letters' should contain all the letters that you want to get rid of:

Output = Input.translate({ord(i): None for i in 'letters'})

Example:

Input = "I would like 20 dollars for that suit"
Output = Input.translate({ord(i): None for i in 'abcdefghijklmnopqrstuvwxzy'})
print(Output)

Output: the capital I, the spaces and 20 remain, because only the lowercase letters listed are deleted.

edited May 18, 2019 at 18:06

chb

answered May 18, 2019 at 17:26

Gustav

Kokul Jose · Accepted Answer · 2020-10-18 22:58:34Z

0

my_string = "sdfsdfsdfsfsdf353dsg345435sdfs525436654.dgg("
# Keep a character only if it is one of the ten ASCII digits.
my_string = ''.join((ch if ch in '0123456789' else '') for ch in my_string)
print("output: " + my_string)

output: 353345435525436654

answered Oct 18, 2020 at 22:58

Kokul Jose

1 Comment

Muneeb Ahmad Khurram Over a year ago

add this, as well for decimal point numbers, if ch in '0123456789.' else '' so that a . is also added.

David · Accepted Answer · 2022-09-30 11:00:06Z

0

Another one:

import re

re.sub('[^0-9]', '', 'ABC123 456')

Result:

'123456'

answered Sep 30, 2022 at 11:00

David
