210

How can I remove all characters except numbers from string?

4
  • @Jan Tojnar: Can you give an example ? Commented Sep 21, 2009 at 22:40
  • @JG: I have gtk.Entry() and i want multiply float entered into it. Commented Oct 3, 2009 at 5:38
  • 2
    @JanTojnar use re.sub method as per answer two and explicitly list which chars to keep e.g. re.sub("[^0123456789\.]","","poo123.4and5fish") Commented Dec 30, 2012 at 16:26
  • If you only want to check if the string is all digits, see stackoverflow.com/questions/1323364. Commented Aug 1, 2022 at 20:12

19 Answers

300

Use re.sub, like so:

>>> import re
>>> re.sub(r'\D', '', 'aas30dsa20')
'3020'

\D matches any non-digit character, so the code above replaces every non-digit character with the empty string.

Or you can use filter, like so (in Python 2):

>>> filter(str.isdigit, 'aas30dsa20')
'3020'

In Python 3, filter returns an iterator instead of a string, so join the result:

>>> ''.join(filter(str.isdigit, 'aas30dsa20'))
'3020'
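
One subtlety worth knowing: str.isdigit and \D do not agree on every Unicode character. The sketch below illustrates this with a superscript two; the helper names and the sample string are my own assumptions, not part of the answer above.

```python
import re

def digits_via_regex(text):
    # In a str pattern, \d matches Unicode decimal digits only.
    return re.sub(r'\D', '', text)

def digits_via_isdigit(text):
    # str.isdigit() also accepts digit-like characters such as superscripts.
    return ''.join(filter(str.isdigit, text))

sample = 'x\u00b23'  # 'x', SUPERSCRIPT TWO, '3'
print(ascii(digits_via_regex(sample)))    # the superscript is dropped
print(ascii(digits_via_isdigit(sample)))  # the superscript is kept
```

If the result is going to be passed to int(), str.isdecimal is probably the stricter test you want, since int() rejects the superscript.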

7 Comments

re is evil for such a simple task; the second one is the best, I think, because 'is...' methods are the fastest for strings.
your filter example is limited to py2k
@f0b0s-iu9-info: did you time it? On my machine (py3k) re is twice as fast as filter with isdigit; a generator with isdigit is halfway between them.
For Python 3.6 it should be re.sub("\\D", "", "aas30dsa20") . Otherwise one gets a DeprecationWarning: invalid escape sequence \D .
@asmaier Simply use r for raw string: re.sub(r"\D+", "", "aas30dsa20")
119

In Python 2.*, by far the fastest approach is the .translate method:

>>> x='aaa12333bb445bb54b5b52'
>>> import string
>>> all=string.maketrans('','')
>>> nodigs=all.translate(all, string.digits)
>>> x.translate(all, nodigs)
'1233344554552'
>>> 

string.maketrans makes a translation table (a string of length 256) which in this case is the same as ''.join(chr(x) for x in range(256)) (just faster to make;-). .translate applies the translation table (which here is irrelevant since all essentially means identity) AND deletes characters present in the second argument -- the key part.

.translate works very differently on Unicode strings (and strings in Python 3 -- I do wish questions specified which major-release of Python is of interest!) -- not quite this simple, not quite this fast, though still quite usable.
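
For what it's worth, Python 3 bytes objects keep the old "table plus delete set" form of .translate, so a rough equivalent of the 2.* code can be sketched as below. This assumes only ASCII digits matter; the names ALL_BYTES, NON_DIGITS and keep_ascii_digits are made up for illustration.

```python
import string

# All 256 byte values, minus the ASCII digits: these are the bytes to delete.
ALL_BYTES = bytes(range(256))
NON_DIGITS = ALL_BYTES.translate(None, string.digits.encode('ascii'))

def keep_ascii_digits(text):
    # Assumption: only ASCII digits are wanted. Non-ASCII characters become
    # UTF-8 bytes >= 0x80, all of which are in NON_DIGITS and get deleted.
    raw = text.encode('utf-8')
    return raw.translate(None, NON_DIGITS).decode('ascii')

print(keep_ascii_digits('aaa12333bb445bb54b5b52'))  # prints 1233344554552
```

The encode/decode round trip has its own cost, so whether this beats a regex on your data is something to measure rather than assume.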

Back to 2.*, the performance difference is impressive...:

$ python -mtimeit -s'import string; all=string.maketrans("", ""); nodig=all.translate(all, string.digits); x="aaa12333bb445bb54b5b52"' 'x.translate(all, nodig)'
1000000 loops, best of 3: 1.04 usec per loop
$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 7.9 usec per loop

Speeding things up by 7-8 times is hardly peanuts, so the translate method is well worth knowing and using. The other popular non-RE approach...:

$ python -mtimeit -s'x="aaa12333bb445bb54b5b52"' '"".join(i for i in x if i.isdigit())'
100000 loops, best of 3: 11.5 usec per loop

is 50% slower than RE, so the .translate approach beats it by over an order of magnitude.

In Python 3, or for Unicode, you need to pass .translate a mapping (with ordinals, not characters directly, as keys) that returns None for what you want to delete. Here's a convenient way to express this for deletion of "everything but" a few characters:

import string

class Del:
  def __init__(self, keep=string.digits):
    self.comp = dict((ord(c),c) for c in keep)
  def __getitem__(self, k):
    return self.comp.get(k)

DD = Del()

x='aaa12333bb445bb54b5b52'
x.translate(DD)

also emits '1233344554552'. However, putting this in xx.py we have...:

$ python3.1 -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 8.43 usec per loop
$ python3.1 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
10000 loops, best of 3: 24.3 usec per loop

...which shows that the performance advantage disappears for this kind of "deletion" task and becomes a performance decrease.
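
Timings like these vary a lot between Python versions and machines, so it may be worth re-running them yourself. Here is a minimal harness using the timeit module from inside Python; the dictionary of candidates and the loop count are arbitrary choices of mine.

```python
import re
import timeit

x = 'aaa12333bb445bb54b5b52'
non_digit = re.compile(r'\D')

# Each candidate is a zero-argument callable returning the digits of x.
candidates = {
    're.sub': lambda: non_digit.sub('', x),
    'generator + isdigit': lambda: ''.join(c for c in x if c.isdigit()),
    'filter + isdigit': lambda: ''.join(filter(str.isdigit, x)),
}

# Check that all candidates agree before comparing their speed.
results = {name: func() for name, func in candidates.items()}

loops = 10000
for name, func in candidates.items():
    seconds = timeit.timeit(func, number=loops)
    print(f'{name}: {seconds / loops * 1e6:.2f} usec per call')
```

The lambda wrappers add a small constant overhead to every candidate, so treat the numbers as a relative comparison only.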

10 Comments

@sunqiang, yes, absolutely -- there's a reason Py3k has gone to Unicode as THE text string type, instead of byte strings as in Py2 -- same reason Java and C# have always had the same "string means unicode" meme... some overhead, maybe, but MUCH better support for just about anything but English!-).
x.translate(None, string.digits) actually results in 'aaabbbbbb', which is the opposite of what is intended.
Echoing comments from Tom Dalling, your first example keeps all the undesirable characters -- does the opposite of what you said.
@RyanB.Lynch et al, the fault was with a later editor and two other users that approved said edit, which, in fact, is totally wrong. Reverted.
overriding all builtin... not sure about that!
81
s=''.join(i for i in s if i.isdigit())

Another generator variant.
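
If you also need to keep a few extra characters (a minus sign or a decimal point, say), the same generator can be wrapped in a small helper. This is only a sketch; keep_chars and its extra parameter are hypothetical names.

```python
def keep_chars(text, extra=''):
    # 'extra' is a hypothetical parameter: additional characters to keep.
    allowed = set(extra)
    return ''.join(c for c in text if c.isdigit() or c in allowed)

print(keep_chars('aas30dsa20'))          # prints 3020
print(keep_chars('temp: -12.5C', '-.'))  # prints -12.5
```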

5 Comments

Killed it..+1 Would have been even better if lambda was used
If you want to include any custom characters, for example include negatives or decimals - do this: s = ''.join(i for i in s if i.isdigit() or i in '-./\\')
Fantastic solution without any imports
Just love this b/c it requires no imports!!
I would say this is the best solution so far !
19

You can use filter:

filter(lambda x: x.isdigit(), "dasdasd2313dsa")

In Python 3 you have to join the result (kinda ugly :( ):

''.join(filter(lambda x: x.isdigit(), "dasdasd2313dsa"))

2 Comments

only in py2k, in py3k it returns a generator
convert str to list to make sure it works on both py2 and py3: ''.join(filter(lambda x: x.isdigit(), list("dasdasd2313dsa")))
16

You can easily do it using Regex

>>> import re
>>> re.sub(r"\D", "", "£70,000")
'70000'

2 Comments

By far the easiest way
How is this different than João Silva's answer, which was provided 7 years earlier?
14

along the lines of bayer's answer:

''.join(i for i in s if i.isdigit())

1 Comment

No, this won't work for negative numbers because - is not a digit.
11

The OP mentions in the comments that they want to keep the decimal point. This can be done with the re.sub method (as per the second and IMHO best answer) by explicitly listing the characters to keep, e.g.

>>> import re
>>> re.sub(r"[^0123456789.]", "", "poo123.4and5fish")
'123.45'
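
One caveat: input with more than one period, such as "poo123.4and.5fish", would come out as '123.4.5', which is not a valid number. A possible guard, sketched under the assumption that raising an error is acceptable for your use case (extract_decimal is a made-up name):

```python
import re

def extract_decimal(text):
    # Keep only ASCII digits and periods.
    cleaned = re.sub(r'[^0-9.]', '', text)
    # Reject results that cannot be a single decimal number.
    if cleaned.count('.') > 1:
        raise ValueError(f'more than one decimal point in {text!r}')
    return cleaned

print(extract_decimal('poo123.4and5fish'))  # prints 123.45
```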

2 Comments

What about "poo123.4and.5fish"?
In my code I check for the number of periods in the input string and raise an error if that is more than 1.
7

Try:

import re

string = '1abcd2XYZ3'
string_without_letters = re.sub(r'[a-z]', '', string.lower())

this should give:

123
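
Note that this removes letters only, so punctuation and spaces survive. A small sketch of the difference, using re.IGNORECASE instead of lowercasing the input first (the sample text is my own):

```python
import re

text = '1abcd-2XYZ 3'

# Remove ASCII letters regardless of case; everything else is kept.
without_letters = re.sub(r'[a-z]', '', text, flags=re.IGNORECASE)

# Remove everything that is not a digit.
digits_only = re.sub(r'\D', '', text)

print(without_letters)  # prints 1-2 3
print(digits_only)      # prints 123
```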

3 Comments

so [a-z] means all lowercase letters or for uppercase we have to [A-Z]?
[a-z] will work for both lower and uppercases :)
yes, because I just noticed the string.lower() is your best friend.
6
x.translate(None, string.digits)

deletes all digits from the string (Python 2 only). To delete letters and keep the digits, do this:

x.translate(None, string.letters)
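
In Python 3 the two-argument form raises a TypeError. A rough equivalent builds the table with str.maketrans, whose third argument lists the characters to delete; the variable names here are just for illustration.

```python
import string

x = 'aaa12333bb445bb54b5b52'

# The third argument to str.maketrans lists characters to delete.
drop_digits = str.maketrans('', '', string.digits)
drop_letters = str.maketrans('', '', string.ascii_letters)

print(x.translate(drop_digits))   # prints aaabbbbbb
print(x.translate(drop_letters))  # prints 1233344554552
```

This deletes only the characters you list, so whitespace and punctuation in the input are left alone.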

2 Comments

I get a TypeError: translate() takes exactly one argument (2 given). Why this question was upvoted in its current state is quite frustrating.
translate changed from python 2 to 3. The syntax using this method in python 3 is x.translate(str.maketrans('', '', string.digits)) and x.translate(str.maketrans('', '', string.ascii_letters)) . Neither of these strips white space. I wouldn't really recommend this approach anymore...
5

Use a generator expression:

>>> s = "foo200bar"
>>> new_s = "".join(i for i in s if i in "0123456789")

2 Comments

instead do ''.join(n for n in foo if n.isdigit())
With a small modification, "".join([i for i in s if i in "0123456789"]) , bayer's solution is faster than using "isdigit". It performs in 15% less time. Of all the solutions presented on this page, the quickest is @rescdsk 's. However, when it is not a loop, it is better to stick with the quickest "one line" solution.
5

A fast version for Python 3:

# xx3.py
from collections import defaultdict
import string
_NoneType = type(None)

def keeper(keep):
    table = defaultdict(_NoneType)
    table.update({ord(c): c for c in keep})
    return table

digit_keeper = keeper(string.digits)

Here's a performance comparison vs. regex:

$ python3.3 -mtimeit -s'import xx3; x="aaa12333bb445bb54b5b52"' 'x.translate(xx3.digit_keeper)'
1000000 loops, best of 3: 1.02 usec per loop
$ python3.3 -mtimeit -s'import re; r = re.compile(r"\D"); x="aaa12333bb445bb54b5b52"' 'r.sub("", x)'
100000 loops, best of 3: 3.43 usec per loop

So it's a little bit more than 3 times faster than regex, for me. It's also faster than class Del above, because defaultdict does all its lookups in C, rather than (slow) Python. Here's that version on my same system, for comparison.

$ python3.3 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
100000 loops, best of 3: 13.6 usec per loop
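
For completeness, a self-contained usage sketch of the keeper table, repeating the definition so it runs on its own. The trick is that calling type(None)() returns None, so any code point missing from the table maps to "delete".

```python
from collections import defaultdict
import string

def keeper(keep):
    # Missing keys call type(None)(), which returns None, so characters
    # that are not in 'keep' are deleted by str.translate.
    table = defaultdict(type(None))
    table.update({ord(c): c for c in keep})
    return table

digit_keeper = keeper(string.digits)
print('aaa12333bb445bb54b5b52'.translate(digit_keeper))  # prints 1233344554552
```

One thing to keep in mind: a defaultdict stores every missing key it is asked for, so the table grows by one entry per distinct deleted character it has seen.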

2

Ugly but works:

>>> s
'aaa12333bb445bb54b5b52'
>>> a = ''.join(filter(lambda x : x.isdigit(), s))
>>> a
'1233344554552'
>>>

2 Comments

@SilentGhost it's my misunderstanding. had it corrected thanks :)
Actually, with this method, I don't think you need to use "join." filter(lambda x: x.isdigit(), s) worked fine for me. ...oh, it's because I'm using Python 2.7.
2
$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'

100000 loops, best of 3: 2.48 usec per loop

$ python -mtimeit -s'import re; x="aaa12333bab445bb54b5b52"' '"".join(re.findall("[a-z]+",x))'

100000 loops, best of 3: 2.02 usec per loop

$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'

100000 loops, best of 3: 2.37 usec per loop

$ python -mtimeit -s'import re; x="aaa12333bab445bb54b5b52"' '"".join(re.findall("[a-z]+",x))'

100000 loops, best of 3: 1.97 usec per loop

I had observed that join is faster than sub.
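
Note that the findall pattern "[a-z]+" collects the letters, not the digits, so the two commands above do not produce the same string. A like-for-like comparison would use \d+ instead; the sketch below only checks the outputs and does not measure speed.

```python
import re

x = 'aaa12333bb445bb54b5b52'

via_sub = re.sub(r'\D', '', x)                    # delete non-digits
via_findall = ''.join(re.findall(r'\d+', x))      # collect runs of digits
letters_only = ''.join(re.findall(r'[a-z]+', x))  # what "[a-z]+" collects

print(via_sub)       # prints 1233344554552
print(via_findall)   # prints 1233344554552
print(letters_only)  # prints aaabbbbbb
```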

3 Comments

Why are you repeating the two methods twice? And could you describe how is your answer different from the accepted one?
Both give the same output, but I just want to show that join is faster than the sub method.
They do not, your code does the opposite. And also you have four measurements but only two methods.
2

You can read each character and, if it is a digit, include it in the answer. The str.isdigit() method tells you whether a character is a digit.

your_input = '12kjkh2nnk34l34'
your_output = ''.join(c for c in your_input if c.isdigit())
print(your_output) # '1223434'

1 Comment

how is this different from the answer by f0b0s? You should edit that answer instead if you have more information to bring
2

You can use join + filter + lambda:

''.join(filter(lambda s: s.isdigit(), "20 years ago, 2 months ago, 2 days ago"))

Output: '2022'

0

Not a one liner but very simple:

buffer = ""
some_str = "aas30dsa20"

for char in some_str:
    if char.isdigit():
        buffer += char

print( buffer )
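
A variant of the explicit loop that collects the digits in a list and joins once at the end, which is the usual idiom when the input may be long. The variable names are arbitrary.

```python
some_str = "aas30dsa20"

kept = []
for char in some_str:
    # Keep only the characters that are digits.
    if char.isdigit():
        kept.append(char)

result = "".join(kept)
print(result)  # prints 3020
```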

0

I used this. 'letters' should contain all the letters that you want to get rid of:

Output = Input.translate({ord(i): None for i in 'letters'})

Example:

import string

Input = "I would like 20 dollars for that suit"
# Delete upper- and lowercase letters as well as spaces
Output = Input.translate({ord(i): None for i in string.ascii_letters + " "})
print(Output)

Output: 20

0
my_string="sdfsdfsdfsfsdf353dsg345435sdfs525436654.dgg(" 
my_string=''.join((ch if ch in '0123456789' else '') for ch in my_string)
print("output: " + my_string)

output: 353345435525436654

1 Comment

add this, as well for decimal point numbers, if ch in '0123456789.' else '' so that a . is also added.
0

Another one:

import re

re.sub('[^0-9]', '', 'ABC123 456')

Result:

'123456'
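
Worth noting: [^0-9] and \D are not quite interchangeable for Unicode text, because in a str pattern \d also matches non-ASCII decimal digits. A small sketch of the difference (the sample string with an Arabic-Indic digit is my own):

```python
import re

sample = 'ABC123 \u0663456'  # contains ARABIC-INDIC DIGIT THREE

ascii_only = re.sub(r'[^0-9]', '', sample)              # keeps 0-9 only
unicode_digits = re.sub(r'\D', '', sample)              # keeps any decimal digit
ascii_flag = re.sub(r'\D', '', sample, flags=re.ASCII)  # \D limited to ASCII

print(ascii_only)             # prints 123456
print(ascii(unicode_digits))  # shows the extra non-ASCII digit
print(ascii_flag)             # prints 123456
```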
