
I have the following problem:

a = ' 15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 '

I'd like a regex to extract only the numbers: 15159970, 15615115, 11224455, 55441123

What I have so far:

re.findall(r'(\d+\s)\(', a)

which only extracts the first two numbers, each with a trailing space: '15159970 ', '15615115 '

I also have a second variable, b = 15159970, 15615115, 11224455, 55441126. I would like to compare the two and, if they differ, print("vars are different!").

Thanks!

  • The problem is the \( in your expression: digits followed by an opening parenthesis occur only twice. Commented Nov 25, 2019 at 9:53
  • You can use the following regex to look for digits that are not preceded by a . or a (: (?<![(.])\b(\d+)\b Commented Nov 25, 2019 at 9:55
  • I was going with (\d+)(?:\s+|,|\n|$), which matches only digits followed by whitespace, a newline, a comma, or the end of the string. But I think what @ChrisDoyle suggests is better. Commented Nov 25, 2019 at 9:59
  • Does this answer your question? Regex to match text, but not if contained in brackets Commented Nov 25, 2019 at 10:09
  • No, Regex to match text, but not if contained in brackets does not answer this question: the criteria are not just about skipping everything in brackets. Even if that were the only criterion, the answers in the linked thread do not work for all cases; my solution below does. Commented Nov 26, 2019 at 8:04
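A quick check (not part of the original thread) that runs the question's attempt and the two comment suggestions against the sample string; the variable names are illustrative. Both suggestions return the four numbers for this particular input, but they are not equivalent in general: the first one would also pick up the 12, 43 and 14 of a time such as (28.11.2014 12:43:14), because : is not in its lookbehind.

```python
import re

a = ' 15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 '

# The attempt from the question: digits plus one whitespace char, only if '(' follows
attempt = re.findall(r'(\d+\s)\(', a)

# Comment suggestion 1: digits not preceded by '(' or '.', between word boundaries
lookbehind = re.findall(r'(?<![(.])\b(\d+)\b', a)

# Comment suggestion 2: digits followed by whitespace (which already covers '\n'),
# a comma, or the end of the string
followed_by = re.findall(r'(\d+)(?:\s+|,|\n|$)', a)

print(attempt)      # ['15159970 ', '15615115 ']
print(lookbehind)   # ['15159970', '15615115', '11224455', '55441123']
print(followed_by)  # ['15159970', '15615115', '11224455', '55441123']
```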

1 Answer


You can extract all runs of digits that are not preceded by a digit or by a digit and a dot, and not followed by a digit or by a dot and a digit:

(?<!\d)(?<!\d\.)\d+(?!\.?\d)

See the regex demo

Details

  • (?<!\d) - a negative lookbehind that fails the match at any location immediately preceded by a digit
  • (?<!\d\.) - a negative lookbehind that fails the match at any location immediately preceded by a digit and a dot
  • \d+ - one or more digits
  • (?!\.?\d) - a negative lookahead that fails the match at any location immediately followed by a digit or by a dot and a digit.
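To see why the second lookbehind matters, here is a small check (not from the original answer) that drops (?<!\d\.) from the pattern. The year at the end of each date then leaks into the result, since it is preceded by a dot and followed by ), so neither the first lookbehind nor the lookahead rejects it.

```python
import re

a = ' 15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 '

full = r'(?<!\d)(?<!\d\.)\d+(?!\.?\d)'
# Same pattern with the (?<!\d\.) lookbehind removed
no_dot_lookbehind = r'(?<!\d)\d+(?!\.?\d)'

print(re.findall(full, a))
# => ['15159970', '15615115', '11224455', '55441123']
print(re.findall(no_dot_lookbehind, a))
# => ['15159970', '2015', '15615115', '1970', '11224455', '55441123']
```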

Python demo:

import re
a = ' 15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 '
print( re.findall(r'(?<!\d)(?<!\d\.)\d+(?!\.?\d)', a) )
# => ['15159970', '15615115', '11224455', '55441123']

Another solution: only extract the digit chunks outside of parentheses.

See this Python demo:

import re
text = "15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 (28.11.2014 12:43:14)"
print( list(filter(None, re.findall(r'\([^()]+\)|(\d+)', text))) )
# => ['15159970', '15615115', '11224455', '55441123']

Here, \([^()]+\)|(\d+) matches:

  • \([^()]+\) - a (, then one or more characters other than ( and ), then a )
  • | - or
  • (\d+) - one or more digits, captured into Group 1 (when a pattern contains a capturing group, re.findall returns only the captured substrings).

Whenever the first alternative (the parenthesized text) matches, Group 1 does not participate in the match, so re.findall returns an empty string for it. These empty items must be removed, either with list(filter(None, results)) or with [x for x in results if x].
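Because every unwanted context is just one more alternative placed before the capturing (\d+) branch, the approach extends naturally. A minimal sketch, assuming you also want to skip text in square brackets (that extra rule and the helper name digits_outside_brackets are hypothetical, for illustration only), using the list-comprehension form of the filtering:

```python
import re

# Hypothetical extension: skip [...] as well as (...).
# Each extra "exception" is one more alternative before the capturing (\d+) branch.
EXCEPTIONS_THEN_DIGITS = r'\([^()]+\)|\[[^\[\]]+\]|(\d+)'

def digits_outside_brackets(text):
    # re.findall returns Group 1 for every match; it is '' when an exception matched
    return [x for x in re.findall(EXCEPTIONS_THEN_DIGITS, text) if x]

text = "15159970 (30.12.2015), 15615115 [01.01.1970], 11224455, 55441123 (28.11.2014 12:43:14)"
print(digits_outside_brackets(text))
# => ['15159970', '15615115', '11224455', '55441123']
```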


5 Comments

Can I also skip parenthesized text like (28.11.2014 12:43:14)? I would also like to compare two of these variables, e.g. a and another variable b = 15167458 (25.05.2011 10:10:23), 15161211 (10.08.2012 12:15:22)
@cosmin Replace (?<!\d\.) with (?<!\d[.:]) and (?!\.?\d) with (?![.:]?\d). See this regex demo.
@cosmin I added a different solution that will also work for you and may be more flexible since you may add more context exceptions.
Could you also help me compare the two lists? I mean: if one of them has something different (a digit, say), a print should report the difference; otherwise the two are equal. @wiktor
@cosmin That sounds like a new question already. Anyway, it is not clear what you need to do and what two lists you refer to.
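For the follow-up questions in these comments, a sketch (not from the original answer) that applies the modified pattern suggested above, with : treated like ., to text containing a time, and then compares the extracted lists of a and b, printing "vars are different!" as the question asks. The helper name extract_ids is hypothetical, and comparing lists with != assumes order matters; compare set(...) values instead if it does not.

```python
import re

# Modified pattern from the comment above: ':' is handled like '.', so time parts are skipped
PATTERN = r'(?<!\d)(?<!\d[.:])\d+(?![.:]?\d)'

def extract_ids(text):
    """Return the standalone digit runs in text, in order (hypothetical helper)."""
    return re.findall(PATTERN, text)

a = '15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 (28.11.2014 12:43:14)'
b = '15159970, 15615115, 11224455, 55441126'

ids_a = extract_ids(a)
ids_b = extract_ids(b)
print(ids_a)  # ['15159970', '15615115', '11224455', '55441123']
print(ids_b)  # ['15159970', '15615115', '11224455', '55441126']

if ids_a != ids_b:
    print("vars are different!")
    # Report positions that disagree; zip stops at the shorter list,
    # so a length mismatch is reported separately
    for i, (x, y) in enumerate(zip(ids_a, ids_b)):
        if x != y:
            print(f"position {i}: {x} != {y}")
    if len(ids_a) != len(ids_b):
        print(f"lengths differ: {len(ids_a)} vs {len(ids_b)}")
```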
