1

I would like to extract only the numbers contained in a string. Can isdigit() and split() be combined for this purpose, or is there a simpler/faster way?

Example:

m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']

Output:

numbers = [122, 35, 1052]
text = ['How to extract only number', 'The number must be extracted', 'must be extracted']

My code:

text = []
numbers = []
temp_numbers = []
for i in range(len(m)):
    text.append([word for word in m[i].split() if not word.isdigit()])
    temp_numbers.append([int(word) for word in m[i].split() if word.isdigit()])
for i in range(len(m)):
    text[i] = ' '.join(text[i])
for elem in temp_numbers:
    numbers.extend(elem)

print(text)
print(numbers)
2
  • You could omit ==True and ==False and factor out the common for word in m[i].split() if word.isdigit(), but other than that this looks as simple as it can get. Commented Aug 29, 2022 at 16:03
  • This has been addressed here: stackoverflow.com/questions/19715303/… Commented Aug 29, 2022 at 16:07

3 Answers

2

Import the re (regular expression) module from the standard library:

import re

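If you want to extract all numbers:

numbers = []
texts = []
for string in m:
    # findall() returns strings, so convert each match to int
    numbers.extend(int(n) for n in re.findall(r"\d+", string))
    # remove the numbers, then collapse the leftover whitespace
    texts.append(" ".join(re.sub(r"\d+", "", string).split()))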

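If you want to extract only the first number in each string:

numbers = []
texts = []
for string in m:
    found = re.findall(r"\d+", string)
    if found:  # skip strings without digits instead of raising IndexError
        numbers.append(int(found[0]))
    texts.append(" ".join(re.sub(r"\d+", "", string).split()))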
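The regex approach can also be wrapped in a single function. This is only a minimal sketch, assuming the goal is a flat list of integers plus the cleaned-up text; the names split_numbers and NUMBER are illustrative and not part of the answer above.

```python
import re

# Hypothetical helper: the pattern is compiled once and reused for both steps.
NUMBER = re.compile(r"\d+")

def split_numbers(strings):
    """Return (numbers, texts) for a list of strings."""
    numbers, texts = [], []
    for s in strings:
        # Every run of digits becomes one integer.
        numbers.extend(int(n) for n in NUMBER.findall(s))
        # Replace the numbers with a space, then normalise the whitespace.
        texts.append(" ".join(NUMBER.sub(" ", s).split()))
    return numbers, texts

m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']
numbers, texts = split_numbers(m)
print(numbers)
print(texts)
```

Compiling the pattern is optional here; re caches recently used patterns, so the main benefit is that the pattern is written in one place.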
1 Comment

Why not use the same pattern twice?
1

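Since m is a list, you can loop over each string, check every character with isdigit(), and collect the digits. Note that this joins all the digits found in a string into a single number, so it assumes each string contains exactly one number.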

For loop solution:

m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']

numbers = []

for string in m:
    # Presuming m only contains strings
    temp_num = ""

    for char in string:
        if char.isdigit():
            temp_num += char

    # Skip strings without digits; int("") would raise ValueError
    if temp_num:
        numbers.append(int(temp_num))

List comprehension solution (note that this appends each digit as a separate element, giving [1, 2, 2, 3, 5, 1, 0, 5, 2]):

m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']

numbers = [int(char) for string in m for char in string if char.isdigit()]
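To keep multi-digit numbers whole without a regex, one option is to group consecutive digit characters with itertools.groupby. The following is a sketch of that alternative, not the method from the answer; the variable names digits, is_digit and group are illustrative.

```python
from itertools import groupby

m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']

# Per-character comprehension: one list element per digit.
digits = [int(char) for string in m for char in string if char.isdigit()]

# groupby() splits each string into runs of digit and non-digit characters;
# joining a digit run gives back the full number.
numbers = [
    int("".join(group))
    for string in m
    for is_digit, group in groupby(string, key=str.isdigit)
    if is_digit
]

print(digits)
print(numbers)
```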

Hope this helps. Also, if you only need the values of an iterable (e.g. a list), use for varname in iterable; it is faster and cleaner than indexing with range(len(iterable)).

If you need both index and the value, use for index, varname in enumerate(iterable).
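As a rough illustration of this iteration advice, the loop from the question could be written without range(len(m)). The names sentence and words are illustrative choices, and this is only one possible arrangement.

```python
m = ['How to extract only number 122', 'The number 35 must be extracted', '1052 must be extracted']

text = []
numbers = []
for sentence in m:  # iterate over the values directly
    words = sentence.split()  # split once and reuse the result
    text.append(' '.join(w for w in words if not w.isdigit()))
    numbers.extend(int(w) for w in words if w.isdigit())

# enumerate() supplies the index only when it is actually needed
for index, sentence in enumerate(m):
    print(index, sentence)

print(text)
print(numbers)
```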

0
nums_list = []
m = ["How to extract only number 122", "The number 35 must be extracted", "1052 must be extracted"]
for i in m:
    new_l = i.split(" ")
    for j in new_l:
        if j.isdigit():
            nums_list.append(int(j))
print(nums_list)

Output:

[122, 35, 1052]
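One limitation of split() combined with isdigit() is that it only finds numbers that stand alone as whitespace-separated words. The sample string below is made up for illustration and is not from the question; it shows how attached punctuation changes the result compared with a regex.

```python
import re

# Hypothetical input where the numbers are followed by punctuation.
s = "Room 12, floor 3."

# "12," and "3." are not purely digits, so isdigit() rejects them.
by_split = [int(w) for w in s.split() if w.isdigit()]

# A regex matches the digit runs regardless of neighbouring characters.
by_regex = [int(n) for n in re.findall(r"\d+", s)]

print(by_split)
print(by_regex)
```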
