
I have this string and call re.findall on it:

import re

st = "12345 hai how  r u @3456? Awer12345 7890"
re.findall(r'([0-9]+)', st)

This returns every run of digits, which is not what I want:

['12345', '3456', '12345', '7890']

I want to get:

['12345','7890']

I only want the tokens that are purely numeric. A token must not contain any other characters, such as letters or special characters.


5 Answers


No need to use a regular expression:

[i for i in st.split(" ") if i.isdigit()]

I think this is much more readable than using a regex.
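As a minimal sketch, the same idea can be wrapped in a small helper. The function name numeric_tokens is hypothetical, and the sketch assumes that splitting on any whitespace (not only a single space) is acceptable. Note that str.isdigit() also accepts some non-ASCII digit characters, so it may need tightening if only 0-9 should count.

```python
def numeric_tokens(text):
    """Return the whitespace-separated tokens that consist only of digits."""
    # split() with no argument splits on any run of whitespace
    # and never yields empty strings.
    return [token for token in text.split() if token.isdigit()]


st = "12345 hai how  r u @3456? Awer12345 7890"
print(numeric_tokens(st))  # ['12345', '7890']
```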


Corey's solution is really the right way to go here, but since the question did ask for regex, here is a regex solution that I think is simpler than the others:

re.findall(r'(?<!\S)\d+(?!\S)', st)

And an explanation:

(?<!\S)   # Fail if the previous character (if one exists) isn't whitespace
\d+       # Match one or more digits
(?!\S)    # Fail if the next character (if one exists) isn't whitespace

Some examples:

>>> re.findall(r'(?<!\S)\d+(?!\S)', '12345 hai how  r u @3456? Awer12345 7890')
['12345', '7890']
>>> re.findall(r'(?<!\S)\d+(?!\S)', '12345 hai how r u @3456? Awer12345 7890123ER%345 234 456 789')
['12345', '234', '456', '789']
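If the pattern is reused, one option is to compile it once with re.VERBOSE so the explanation lives next to the regex. This is only a sketch: the name STANDALONE_NUMBER is made up for illustration, and the conversion to int is an extra step that the question did not ask for.

```python
import re

# Hypothetical name; the pattern itself is the one shown above.
STANDALONE_NUMBER = re.compile(r"""
    (?<!\S)   # not preceded by a non-whitespace character
    \d+       # one or more digits
    (?!\S)    # not followed by a non-whitespace character
""", re.VERBOSE)

st = "12345 hai how  r u @3456? Awer12345 7890"
numbers = [int(m) for m in STANDALONE_NUMBER.findall(st)]
print(numbers)  # [12345, 7890]
```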

In [21]: re.findall(r'(?:^|\s)(\d+)(?=$|\s)', st)
Out[21]: ['12345', '7890']

Here,

  • (?:^|\s) is a non-capturing group that matches the start of the string or a whitespace character.
  • (\d+) is a capture group that matches one or more digits.
  • (?=$|\s) is a lookahead assertion that matches the end of the string or a whitespace character, without consuming it.
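The lookahead in the last group matters when two numbers are separated by a single space. A small comparison, using a consuming variant of the same pattern written only for illustration, shows the difference:

```python
import re

text = "123 456"

# Consuming variant: the trailing group eats the space, so '456' has no
# leading whitespace left to match against.
consuming = re.findall(r'(?:^|\s)(\d+)(?:$|\s)', text)

# Lookahead variant: the space is only checked, not consumed.
lookahead = re.findall(r'(?:^|\s)(\d+)(?=$|\s)', text)

print(consuming)  # ['123']
print(lookahead)  # ['123', '456']
```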

5 Comments

same problem as with @Ademiban's solution: This will not find '456' in '123 456' because the inner space is "consumed" by the first match.
@JanPöschko: Good catch, thanks. Fixed (by converting the final group into a lookahead assertion).
@JanPöschko Thanks. It is not working for this "12345 hai how r u @3456? Awer12345 7890123ER%345 234 456 789"
@saravana: The updated version gives ['12345', '234', '456', '789']. Is this not what you expect?
@aix This result is expected. Thanks.

Use this pattern: (^|\s)[0-9]+(\s|$). The (^|\s) part means that the number must be at the start of the string or be preceded by a whitespace character. The (\s|$) part means that the number must be followed by a whitespace character or be at the end of the string.
As Jan Pöschko pointed out, 456 won't be found in 123 456. If your "bad" parts (@, Awer) are always prefixes, you can use the pattern (^|\s)[0-9]+ instead and everything will be OK. It matches every number that is preceded by whitespace or is at the start of the string. Hope this helps.
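One thing to watch when passing these patterns to re.findall: when a pattern contains capture groups, findall returns the groups rather than the whole match. The sketch below rewrites the outer groups as non-capturing and captures only the digits; that rewrite is an adaptation for illustration, not part of the patterns as given above.

```python
import re

st = "12345 hai how  r u @3456? Awer12345 7890"

# With the pattern as written, findall returns the two captured groups.
as_written = re.findall(r'(^|\s)[0-9]+(\s|$)', st)
print(as_written)  # [('', ' '), (' ', '')]

# Non-capturing outer groups, with the digits as the only capture group.
both_sides = re.findall(r'(?:^|\s)([0-9]+)(?:\s|$)', st)
print(both_sides)  # ['12345', '7890']

# Prefix-only variant: it also accepts numbers that are followed by letters.
prefix_only = re.findall(r'(?:^|\s)([0-9]+)', "12 34ab")
print(prefix_only)  # ['12', '34']
```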

1 Comment

This will not find '456' in '123 456' because the inner space is "consumed" by the first match.

Your expression finds all sequences of digits, regardless of what surrounds them. You need to include a specification of what comes before and after the sequence to get the behavior you want:

re.findall(r"[\D\b](\d+)[\D\b]", st)

will do what you want. In English, it says "match all sequences of one or more digits that are surrounded by a non-digit character or a word boundary".

1 Comment

This returns ['3456', '12345'] when he wants ['12345','7890']. I'm ever more convinced of my view that regexes are more trouble than they're worth unless they're trivial.
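The result reported in this comment can be reproduced with a short check. In Python's re module, \b inside a character class stands for the backspace character rather than a word boundary, so each side of the number has to be an actual non-digit character:

```python
import re

st = "12345 hai how  r u @3456? Awer12345 7890"

# Inside [...] the escape \b is a backspace, not a word boundary, so a number
# at the very start or end of the string has no neighbour to match against.
result = re.findall(r"[\D\b](\d+)[\D\b]", st)
print(result)  # ['3456', '12345']
```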
