Python REGEX How to extract particular numbers from variable

Question

I have the following problem:

a = ' 15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 '

I'd like a regex to extract only the numbers: 15159970, 15615115, 11224455, 55441123

What I have so far:

re.findall(r'(\d+\s)\(', a)

which only extracts the first 2 numbers: 15159970, 15615115

I also have a second variable, b = '15159970, 15615115, 11224455, 55441126'. I would like to compare the two variables and, if they differ, print("vars are different!").

Thanks!

The problem is the opening parenthesis in your expression: the pattern occurs only twice.
– Jan, Commented Nov 25, 2019 at 9:53
You can use the following regex to look for digits that are not preceded by a . or a (: (?<![(.])\b(\d+)\b
– Chris Doyle, Commented Nov 25, 2019 at 9:55
I was going with (\d+)(?:\s+|,|\n|$), which matches only digits followed by a newline, a space, a comma, or the end of the line. But I think what @ChrisDoyle suggests is better.
– naurel, Commented Nov 25, 2019 at 9:59
Does this answer your question? Regex to match text, but not if contained in brackets
– naurel, Commented Nov 25, 2019 at 10:09
No, "Regex to match text, but not if contained in brackets" does not answer this question: the criteria are not just skipping everything in brackets. Even if that were the only criterion, the answers in the linked thread do not work for all cases; my solution below does.
– Wiktor Stribiżew, Commented Nov 26, 2019 at 8:04

Wiktor Stribiżew · Accepted Answer · 2019-11-25 10:12:29Z

2

You may extract all chunks of digits not preceded with a digit or digit + dot and not followed with a dot + digit or a digit:

(?<!\d)(?<!\d\.)\d+(?!\.?\d)

See the regex demo

Details

(?<!\d) - a negative lookbehind that fails a location immediately preceded with a digit
(?<!\d\.) - a negative lookbehind that fails a location immediately preceded with a digit and a dot
\d+ - 1+ digits
(?!\.?\d) - a negative lookahead that fails a location immediately followed with a digit or a dot + a digit.

Python demo:

import re
a = ' 15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 '
print( re.findall(r'(?<!\d)(?<!\d\.)\d+(?!\.?\d)', a) )
# => ['15159970', '15615115', '11224455', '55441123']
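
To see what each lookaround contributes, here is a small sketch that runs the full pattern next to two reduced variants on the same input. The variable names (plain, lookbehind_only, full) are illustrative and not part of the original answer.

```python
import re

a = ' 15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 '

# Plain \d+ also picks up every component of the dates.
plain = re.findall(r'\d+', a)

# Lookbehinds only: date parts that follow a dot are skipped, but the day
# right after "(" still matches because nothing checks what comes after it.
lookbehind_only = re.findall(r'(?<!\d)(?<!\d\.)\d+', a)

# Full pattern from the answer: the lookahead also rejects digit chunks
# that are followed by a dot + digit.
full = re.findall(r'(?<!\d)(?<!\d\.)\d+(?!\.?\d)', a)

print(plain)
print(lookbehind_only)
print(full)
```

The (?<!\d) lookbehind matters once the lookahead is present: without it, the regex engine could backtrack and match the tail of a longer digit run.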

Another solution: only extract the digit chunks outside of parentheses.

See this Python demo:

import re
text = "15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 (28.11.2014 12:43:14)"
print( list(filter(None, re.findall(r'\([^()]+\)|(\d+)', text))) )
# => ['15159970', '15615115', '11224455', '55441123']

Here, \([^()]+\)|(\d+) matches

\([^()]+\) - (, any 1+ chars other than ( and ) and then )
| - or
(\d+) - matches and captures into Group 1 one or more digits (re.findall only includes captured substrings if there is a capturing group in the pattern).

Empty items appear in the result whenever the parenthesized alternative matches, because Group 1 does not participate in that match; thus, we need to remove them (either with list(filter(None, results)) or with [x for x in results if x]).
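
As a quick illustration of where the empty items come from, the sketch below prints the unfiltered re.findall result and then applies both clean-up variants. The names raw, via_filter and via_comprehension are made up for this example.

```python
import re

text = "15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 (28.11.2014 12:43:14)"

# Each "(...)" chunk is matched by the first alternative, so Group 1 is
# empty for that match and findall returns '' in its place.
raw = re.findall(r'\([^()]+\)|(\d+)', text)
print(raw)

# Two equivalent ways to drop the empty strings.
via_filter = list(filter(None, raw))
via_comprehension = [x for x in raw if x]
print(via_filter)
print(via_comprehension)
```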

edited Nov 25, 2019 at 10:12

answered Nov 25, 2019 at 9:53

Wiktor Stribiżew


5 Comments

cosmin Over a year ago

Can I also ditch the following parenthesis (28.11.2014 12:43:14) ? I would also like to compare 2 of these variables like, var a and another var b having b = 15167458 (25.05.2011 10:10:23), 15161211 (10.08.2012 12:15:22)

Wiktor Stribiżew Over a year ago

@cosmin Replace (?<!\d\.) with (?<!\d[.:]) and (?!\.?\d) with (?![.:]?\d). See this regex demo.
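
A minimal sketch of the adjusted pattern from this comment, applied to an input that also contains a time inside the parentheses. The sample string is the one used in the second demo of the answer; the variable names are illustrative.

```python
import re

text = "15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 (28.11.2014 12:43:14)"

# Same idea as the original pattern, but a colon is treated like a dot,
# so the hour, minute and second parts are rejected as well.
pattern = r'(?<!\d)(?<!\d[.:])\d+(?![.:]?\d)'
numbers = re.findall(pattern, text)
print(numbers)
```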

Wiktor Stribiżew Over a year ago

@cosmin I added a different solution that will also work for you and may be more flexible since you may add more context exceptions.

cosmin Over a year ago

Could you also help me with the compare of the 2 lists? I mean if one or another had something different ( a digit maybe) then a print would tell the difference otherwise the 2 were equal.@wiktor

Wiktor Stribiżew Over a year ago

@cosmin That sounds like a new question already. Anyway, it is not clear what you need to do and what two lists you refer to.
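
The comparison was not answered in this thread. The following is only one possible reading of the request, assuming both variables are strings in the formats shown in the question and that "different" means the extracted number lists are not equal. The helper extract_ids and the names only_in_a and only_in_b are hypothetical.

```python
import re

def extract_ids(text):
    """Return the digit chunks that lie outside parentheses (hypothetical helper)."""
    return [m for m in re.findall(r'\([^()]+\)|(\d+)', text) if m]

a = '15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123'
b = '15159970, 15615115, 11224455, 55441126'

ids_a = extract_ids(a)
ids_b = extract_ids(b)

# Numbers that appear in one list but not in the other.
only_in_a = sorted(set(ids_a) - set(ids_b))
only_in_b = sorted(set(ids_b) - set(ids_a))

# List equality is order-sensitive; compare sorted(ids_a) with sorted(ids_b)
# instead if the order of the numbers should not matter.
if ids_a != ids_b:
    print("vars are different!")
    print("only in a:", only_in_a)
    print("only in b:", only_in_b)
else:
    print("vars are equal")
```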
