How should I write this regex in Python?

Question

I have this string:

import re

st = "12345 hai how  r u @3456? Awer12345 7890"
re.findall('([0-9]+)',st)

It should not return:

['12345', '3456', '12345', '7890']

I should get:

['12345','7890']

I only want the tokens that are purely numeric. Digits that are part of a token containing any other characters, such as letters or special characters, should not be included.

Corey Farwell · Accepted Answer · 2012-02-09 18:17:11Z

11

No need to use a regular expression:

[i for i in st.split(" ") if i.isdigit()]

I think this is much more readable than using a regex.
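As a minimal sketch of the approach above, the comprehension can be wrapped in a small helper and checked against the sample string. The name `numeric_tokens` is made up for illustration, and the helper calls `split()` with no argument (an assumption that differs slightly from the `split(" ")` in the answer) so that tabs and newlines also separate tokens.

```python
def numeric_tokens(text):
    """Return the whitespace-separated tokens of text that consist only of digits."""
    # split() with no argument splits on any run of whitespace and never
    # yields empty strings, so the double space in the sample is harmless.
    return [token for token in text.split() if token.isdigit()]


st = "12345 hai how  r u @3456? Awer12345 7890"
print(numeric_tokens(st))  # ['12345', '7890']
```

Note that `str.isdigit()` also accepts some non-ASCII digit characters. If only 0-9 should count, adding a `token.isascii()` check (Python 3.7+) is one option.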

answered Feb 9, 2012 at 18:17


Andrew Clark · Accepted Answer · 2012-02-09 18:46:21Z

3

Corey's solution is really the right way to go here, but since the question did ask for regex, here is a regex solution that I think is simpler than the others:

re.findall(r'(?<!\S)\d+(?!\S)', st)

And an explanation:

(?<!\S)   # Fail if the previous character (if one exists) isn't whitespace
\d+       # Match one or more digits
(?!\S)    # Fail if the next character (if one exists) isn't whitespace

Some examples:

>>> re.findall(r'(?<!\S)\d+(?!\S)', '12345 hai how  r u @3456? Awer12345 7890')
['12345', '7890']
>>> re.findall(r'(?<!\S)\d+(?!\S)', '12345 hai how r u @3456? Awer12345 7890123ER%345 234 456 789')
['12345', '234', '456', '789']
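The annotated form above maps directly onto Python's `re.VERBOSE` flag, which ignores whitespace in the pattern and allows `#` comments. The following is only a sketch; the constant name `STANDALONE_NUMBER` is hypothetical. It also checks the adjacent-numbers case `'123 456'`, which works because lookarounds do not consume the whitespace between the two numbers.

```python
import re

# Hypothetical name; the pattern is the one from the answer, laid out verbosely.
STANDALONE_NUMBER = re.compile(r"""
    (?<!\S)   # not preceded by a non-whitespace character
    \d+       # one or more digits
    (?!\S)    # not followed by a non-whitespace character
""", re.VERBOSE)

st = "12345 hai how  r u @3456? Awer12345 7890"
print(STANDALONE_NUMBER.findall(st))         # ['12345', '7890']
print(STANDALONE_NUMBER.findall("123 456"))  # ['123', '456']
```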

answered Feb 9, 2012 at 18:46


NPE · Accepted Answer · 2012-02-09 18:21:20Z

2

In [21]: re.findall(r'(?:^|\s)(\d+)(?=$|\s)', st)
Out[21]: ['12345', '7890']

Here,

(?:^|\s) is a non-capturing group that matches the start of the string or a whitespace character.
(\d+) is a capturing group that matches one or more digits.
(?=$|\s) is a lookahead assertion that matches the end of the string or a whitespace character, without consuming it.
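Why the final group is a lookahead rather than an ordinary group can be seen with two adjacent numbers. The sketch below compares the pattern from this answer with a consuming variant; the consuming pattern is an illustrative reconstruction, not something quoted from the thread, and the variable names are made up.

```python
import re

# Hypothetical consuming variant: the trailing whitespace becomes part of the match.
consuming = re.compile(r'(?:^|\s)(\d+)(?:$|\s)')
# The pattern from the answer: the trailing whitespace is only looked at.
lookahead = re.compile(r'(?:^|\s)(\d+)(?=$|\s)')

text = '123 456'
print(consuming.findall(text))  # ['123']
print(lookahead.findall(text))  # ['123', '456']
```

In the consuming variant the space after `123` is used up by the first match, so nothing is left to satisfy `(?:^|\s)` in front of `456`.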

edited Feb 9, 2012 at 18:21

answered Feb 9, 2012 at 18:16


5 Comments

Jan Pöschko Over a year ago

same problem as with @Ademiban's solution: This will not find '456' in '123 456' because the inner space is "consumed" by the first match.

NPE Over a year ago

@JanPöschko: Good catch, thanks. Fixed (by converting the final group into a lookahead assertion).

Nava Over a year ago

@JanPöschko Thanks. It is not working for this "12345 hai how r u @3456? Awer12345 7890123ER%345 234 456 789"

NPE Over a year ago

@saravana: The updated version gives ['12345', '234', '456', '789']. Is this not what you expect?

Nava Over a year ago

@aix This result is expected. Thanks.

shift66 · Accepted Answer · 2012-02-09 18:25:19Z

2

Use this pattern: (^|\s)[0-9]+(\s|$). The (^|\s) part means the number must be at the start of the string or be preceded by a whitespace character, and (\s|$) means the number must be followed by whitespace or be at the end of the string.
As Jan Pöschko pointed out, 456 won't be found in 123 456. If your "bad" parts (@, Awer) are always prefixes, you can use the pattern (^|\s)[0-9]+ instead. It matches every number that is at the start of the string or is immediately preceded by whitespace. Hope this helps.
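One practical detail when using these patterns with `re.findall`: if a pattern contains capturing groups, `findall` returns the groups rather than the whole match. The sketch below shows this, then a variant of the prefix-only pattern that captures just the digits. The name `prefix_only` is hypothetical, and making the first group non-capturing is my adjustment, not part of the original answer.

```python
import re

st = "12345 hai how  r u @3456? Awer12345 7890"

# With two capturing groups, findall returns the groups, not the digits.
print(re.findall(r'(^|\s)[0-9]+(\s|$)', st))  # [('', ' '), (' ', '')]

# Capture only the digits; keep the leading alternation non-capturing.
prefix_only = re.compile(r'(?:^|\s)([0-9]+)')
print(prefix_only.findall(st))         # ['12345', '7890']
print(prefix_only.findall('123 456'))  # ['123', '456']
```

As the answer says, this only works when the unwanted characters are prefixes: a token such as `7890123ER%345` would still contribute `7890123`.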

edited Feb 9, 2012 at 18:25

answered Feb 9, 2012 at 18:15


1 Comment

Jan Pöschko Over a year ago

This will not find '456' in '123 456' because the inner space is "consumed" by the first match.

ColoradoEric · Accepted Answer · 2012-02-09 18:23:55Z

0

Your expression finds all sequences of digits, regardless of what surrounds them. You need to include a specification of what comes before and after the sequence to get the behavior you want:

re.findall(r"[\D\b](\d+)[\D\b]", st)

will do what you want. In English, it says "match all sequences of one or more digits that are surrounded by a non-digit character or a word boundary."

answered Feb 9, 2012 at 18:23


1 Comment

DSM Over a year ago

This returns ['3456', '12345'] when he wants ['12345','7890']. I'm ever more convinced of my view that regexes are more trouble than they're worth unless they're trivial.
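The result reported in the comment above can be reproduced, and it follows from how Python's `re` module treats `\b`: inside a character class it stands for the backspace character, not a word boundary. The sketch below is only a demonstration of that behavior. It also tries a real `\b` outside the class, which is my own addition to show that a word boundary alone would not give the requested output either.

```python
import re

st = "12345 hai how  r u @3456? Awer12345 7890"

# Inside [...], \b means the backspace character (\x08), not a word boundary.
print(re.findall(r'[\b]', 'a\x08b') == ['\x08'])  # True

# So the pattern needs a real non-digit character on each side of the digits.
print(re.findall(r'[\D\b](\d+)[\D\b]', st))  # ['3456', '12345']

# A genuine word boundary does not help either: '@' and '?' are non-word characters.
print(re.findall(r'\b\d+\b', st))  # ['12345', '3456', '7890']
```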
