
I have the following case:

Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 test 2

I am trying to use a regex to find only the integer values:

  1. 2.000
  2. 2,000
  3. 2

but not the other floats.
I have tried different things, for example:

re.search('(?<![0-9.])2(?![.,]?[1-9])(?=[.,]*[0]*)(?![1-9]),...)

but this returns true for:

  1. 2.00001
  2. 2.000
  3. 2,000
  4. 2,0001
  5. 2

What do I have to do?

UPDATE
I have updated the question: it should also find an integer without any comma or point (e.g. 2).

  • Try (?<!\d)(?<!\d[.,])\d{1,3}(?:[.,]\d{3})*(?![,.]?\d), see demo. Commented Nov 10, 2022 at 14:25
  • @WiktorStribiżew it does not match all integers in test 2 2.00 Commented Nov 10, 2022 at 14:28
  • If you do NOT want to match 2.00001, why do you want to match 2.00? How can you formulate the pattern requirements regarding differentiation between valid and non-valid floats? Commented Nov 10, 2022 at 14:30
  • What about (?<!\d)(?<!\d[.,])(?:\d{1,3}(?:([.,])\d{3})*|\d{4,})(?:(?!\1)[.,]0+)?(?![,.]?\d)? See regex101.com/r/qrG8hg/2 Commented Nov 10, 2022 at 14:40
  • If you do not need to support thousand separators: (?<!\d)(?<!\d[.,])\d+(?:[.,]0+)?(?![,.]?\d) - see this demo. Commented Nov 10, 2022 at 14:45
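To compare the patterns suggested in the comments, here is a small sketch that runs each of them over the question's sample and over the "test 2 2.00" string mentioned above. The labels in PATTERNS and the helper integer_like are hypothetical names used only for illustration; re.finditer with group() is used because one of the patterns contains a capturing group, which would make re.findall return the group instead of the whole match.

```python
import re

# Patterns copied from the comments on the question (labels are illustrative)
PATTERNS = {
    'thousands only (first comment)': r'(?<!\d)(?<!\d[.,])\d{1,3}(?:[.,]\d{3})*(?![,.]?\d)',
    'thousands + zero decimals': r'(?<!\d)(?<!\d[.,])(?:\d{1,3}(?:([.,])\d{3})*|\d{4,})(?:(?!\1)[.,]0+)?(?![,.]?\d)',
    'no thousand separators': r'(?<!\d)(?<!\d[.,])\d+(?:[.,]0+)?(?![,.]?\d)',
}

SAMPLES = [
    'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 test 2',
    'test 2 2.00',
]


def integer_like(pattern, text):
    # group() is the whole match, even when the pattern has a capturing group
    return [m.group() for m in re.finditer(pattern, text)]


for name, pattern in PATTERNS.items():
    for text in SAMPLES:
        print(f'{name}: {text!r} -> {integer_like(pattern, text)}')
```

All three patterns find 2.000, 2,000 and 2 in the question's sample. On "test 2 2.00" the first one returns only ['2'], which is the problem reported in the comments, while the other two also find 2.00.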

4 Answers


I would use:

import re

text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000'

re.findall(r'(\d+[.,]0+)(?!\d)', text)

Output:

['2.000', '2,000']

Regex:

(        # start capturing
\d+      # match digit(s)
[.,]     # match . or ,
0+       # match one or more zeros
)        # stop capturing
(?!\d)   # ensure the last zero is not followed by a digit

regex demo
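Since the annotated breakdown above is written in regex comment style, it can be passed to Python almost verbatim with the re.VERBOSE flag, which ignores whitespace and # comments inside the pattern. A minimal sketch:

```python
import re

# The annotated regex from this answer; re.VERBOSE ignores the layout and the comments
pattern = re.compile(r"""
    (        # start capturing
    \d+      # match digit(s)
    [.,]     # match . or ,
    0+       # match one or more zeros
    )        # stop capturing
    (?!\d)   # ensure the last zero is not followed by a digit
""", re.VERBOSE)

text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000'
print(pattern.findall(text))  # ['2.000', '2,000']
```

Note that this pattern requires a separator followed by zeros, so on its own it does not match a bare integer such as 2.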

If you also want to match integers alone, surrounded by spaces or parentheses/brackets:

import re

text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 2'

re.findall(r'(?:^|[(\s[])(\d+(?:[.,]0+(?!\d))?)(?=[]\s)]|$)', text)

Regex:

(?:^|[(\s[])      # match the start of string or [ or ( or space
(                 # start capturing
\d+               # match digit(s)
(?:[.,]0+(?!\d))? # optionally match . or , with only zeros
)                 # stop capturing
(?=[]\s)]|$)      # assert ] or ) or whitespace, or the end of string, follows (not consumed)

regex demo
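A quick check of this second pattern against the cases raised in the comments (two numbers separated by a space, and a digit glued to letters) plus a bracketed input. The test strings are made up for illustration:

```python
import re

pattern = r'(?:^|[(\s[])(\d+(?:[.,]0+(?!\d))?)(?=[]\s)]|$)'

for text in ['2 2.00', '[2.0] (3,00)', 'GB2L 114']:
    # findall returns only the capturing group, so the leading "(", "[" or space is dropped
    print(text, '->', re.findall(pattern, text))
```

Because the lookahead at the end does not consume the trailing space, that space is still available as the leading delimiter of the next number, which is why "2 2.00" yields both values.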


11 Comments

OK, I have to update my question. It should of course also match 2 on its own, without , or ..
@CodePope OK, but how do you define a number? Can you have cases like abc123 or 127.0.0.0?
No, such cases are not possible. Numbers appear either in brackets or have at least whitespace prior to them.
But it does not match all integers when the string is 2 2.00 .
You said there is "at least whitespace prior to them", I didn't include a start of string boundary, let me update ;)

You can use

re.findall(r'\b(?<!\d[.,])\d+(?:[.,]0+)?\b(?![,.]\d)', text)

See the regex demo. Details:

  • \b - a word boundary
  • (?<!\d[.,]) - no digit followed by . or , immediately on the left
  • \d+ - one or more digits
  • (?:[.,]0+)? - an optional sequence of . or , and then one or more zeros
  • \b - a word boundary
  • (?![,.]\d) - no , or . and a digit allowed immediately to the right.
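A small sketch that checks this pattern against the question's sample and two inputs that come up in the comments (2 2.00 and GB2L 114). The cases dictionary is a hypothetical test set, not part of the original answer:

```python
import re

pattern = re.compile(r'\b(?<!\d[.,])\d+(?:[.,]0+)?\b(?![,.]\d)')

# Hypothetical test inputs and the matches we expect for each
cases = {
    'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 test 2': ['2.000', '2,000', '2'],
    '2 2.00': ['2', '2.00'],
    'GB2L 114': ['114'],
}

for text, expected in cases.items():
    found = pattern.findall(text)  # no capturing groups, so findall returns whole matches
    print(f'{text!r} -> {found}')
    assert found == expected, (text, found)
```

The leading \b is what rejects the 2 in GB2L: there is no word boundary between B and 2.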

If you need to support thousand separators:

pattern = r'\b(?<!\d[.,])(?:\d{1,3}(?:(?=([.,]))(?:\1\d{3})+)?|\d{4,})(?:(?!\1)[.,]0+)?\b(?![,.]\d)'
matches = [x.group() for x in re.finditer(pattern, text)]

See this regex demo.
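For the thousand-separator version, here is a sketch with a couple of extra, made-up inputs. Because the pattern captures the separator in a group, re.findall would return only that group, which is why the answer uses re.finditer with group(); integer_like is a hypothetical helper name.

```python
import re

pattern = r'\b(?<!\d[.,])(?:\d{1,3}(?:(?=([.,]))(?:\1\d{3})+)?|\d{4,})(?:(?!\1)[.,]0+)?\b(?![,.]\d)'


def integer_like(text):
    # group() returns the whole match rather than the captured separator
    return [m.group() for m in re.finditer(pattern, text)]


print(integer_like('Test (2.000) Test 2,000 Test 2,1000 test 2'))
print(integer_like('1,000,000 and 1,000.00'))
print(integer_like('2,102.120'))  # the case from the comments: []
```

The (?!\1) check makes the decimal separator differ from the thousands separator, so 1,000.00 is accepted while 2,102.120 is still rejected by the final (?![,.]\d).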

5 Comments

There is one problem with that, and I know I haven't given this sample, but if I have a string like GB2L 114, it will also return the 2 inside the string.
@CodePope Then you need a different type of boundaries: (?<!\d[.,])\b\d+(?:[.,]0+)?\b(?![,.]\d), see this regex demo.
This regex works. Could you also extend it to include the logic for thousand separators?
It seems to have a problem: it matches patterns like 2,102.120, i.e. a decimal with a single zero at the end.
@CodePope I fixed that.

Without the need for regex, you can also consider using float.is_integer() after trying to convert the values into their respective numeric formats. While a little bit harder to read, it removes the need for regex and should be robust for further use cases given the string structure you provide:

[x for x in string.split() if float((pd.to_numeric(x.replace(r'(','').replace(r')','').replace(r',','.'),errors='coerce'))).is_integer()]

Returning the former values in the list:

['(2.000)', '2,000', '2']

Or if you'd like them cleaned:

[x for x in string.replace(r'(','').replace(r')','').replace(r',','.').split() if float((pd.to_numeric(x,errors='coerce'))).is_integer()]

Returning:

['2.000', '2.000', '2']
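The same is_integer idea can also be written with pandas' vectorized string methods instead of a per-token list comprehension. This is a sketch that assumes pandas is installed and that tokens are separated by whitespace, as in the sample; the names tokens, cleaned, values and mask are illustrative:

```python
import pandas as pd

string = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 test 2'

tokens = pd.Series(string.split())
# Strip parentheses and treat "," as the decimal separator, as in the answer above
cleaned = (tokens.str.replace('(', '', regex=False)
                 .str.replace(')', '', regex=False)
                 .str.replace(',', '.', regex=False))
# Non-numeric tokens such as "Test" become NaN
values = pd.to_numeric(cleaned, errors='coerce')
# Keep numeric values with no fractional part; NaN % 1 == 0 is False anyway
mask = values.notna() & (values % 1 == 0)
print(tokens[mask].tolist())  # ['(2.000)', '2,000', '2']
```

Indexing the original tokens with the mask returns the values as written; use cleaned[mask] instead to get the cleaned form.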

Comments


This should be easy: just extract each number and check whether it is an integer value. Maybe something like this:

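import re

text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 test 2'
out_ints = []
# Require a leading digit so stray "," or "." characters are not picked up
for x in re.findall(r'[0-9][0-9.,]*', text):
    possible_int = x.replace(',', '.')  # treat "," as a decimal separator
    try:
        value = float(possible_int)
    except ValueError:
        continue  # e.g. "1.000.000" is not a valid float, so skip it
    if value.is_integer():
        out_ints.append(int(value))

print(out_ints)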

Output:

[2, 2, 2]

Or am I missing something?

Comments
