Python Regex: Find integer with possible zeros after comma

Question

I have the following case:

Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 test 2

I am trying to use a regex to find only the integers:

2.000
2,000
2

but not the other float numbers.
I tried different things:

re.search('(?<![0-9.])2(?![.,]?[1-9])(?=[.,]*[0]*)(?![1-9])', ...)

but this returns true for:

2.00001
2.000
2,000
2,0001
2

What do I have to do?

UPDATE
I have updated the question: the regex should also find an integer that has no comma or point at all (2).

Try (?<!\d)(?<!\d[.,])\d{1,3}(?:[.,]\d{3})*(?![,.]?\d), see demo.
– Wiktor Stribiżew, Commented Nov 10, 2022 at 14:25
@WiktorStribiżew it does not match all integers in test 2 2.00
– Code Pope, Commented Nov 10, 2022 at 14:28
If you do NOT want to match 2.00001, why do you want to match 2.00? How can you formulate the pattern requirements regarding differentiation between valid and non-valid floats?
– Wiktor Stribiżew, Commented Nov 10, 2022 at 14:30
What about (?<!\d)(?<!\d[.,])(?:\d{1,3}(?:([.,])\d{3})*|\d{4,})(?:(?!\1)[.,]0+)?(?![,.]?\d)? See regex101.com/r/qrG8hg/2
– Wiktor Stribiżew, Commented Nov 10, 2022 at 14:40
If you do not need to support thousand separators: (?<!\d)(?<!\d[.,])\d+(?:[.,]0+)?(?![,.]?\d) - see this demo.
– Wiktor Stribiżew, Commented Nov 10, 2022 at 14:45

mozway · Accepted Answer · 2022-11-10 14:42:27Z

1

I would use:

import re

text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000'

re.findall(r'(\d+[.,]0+)(?!\d)', text)

Output:

['2.000', '2,000']

Regex:

(        # start capturing
\d+      # match digit(s)
[.,]     # match . or ,
0+       # match one or more zeros
)        # stop capturing
(?!\d)   # ensure the last zero is not followed by a digit

regex demo
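
The commented breakdown above can be compiled as-is with re.VERBOSE, which makes the regex engine ignore whitespace and # comments inside the pattern. A minimal sketch; the name ZERO_DECIMAL is only illustrative and not part of the answer:

```python
import re

# Same pattern as above, written with re.VERBOSE so the comments live in the regex.
ZERO_DECIMAL = re.compile(r"""
    (        # start capturing
    \d+      # match digit(s)
    [.,]     # match . or ,
    0+       # match one or more zeros
    )        # stop capturing
    (?!\d)   # ensure the last zero is not followed by a digit
""", re.VERBOSE)

text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000'
matches = ZERO_DECIMAL.findall(text)
print(matches)
```

Note that this first pattern requires a separator followed by zeros, so a bare integer such as 2 is not matched by it.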

If you also want to match integers alone, surrounded by spaces or parentheses/brackets:

import re

text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 2'

re.findall(r'(?:^|[(\s[])(\d+(?:[.,]0+(?!\d))?)(?=[]\s)]|$)', text)

Regex:

(?:^|[(\s[])      # match the start of string or [ or ( or space
(                 # start capturing
\d+               # match digit(s)
(?:[.,]0+(?!\d))? # optionally match . or , with only zeros
)                 # stop capturing
(?=[]\s)]|$)      # match the end of string or ] or ) or space

regex demo
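
To check the boundary handling discussed in the comments, the second pattern can be wrapped in a small helper and run on the edge cases. This is only a sketch: find_integers is a hypothetical name, and the inputs are the samples mentioned in this thread.

```python
import re

# Pattern copied from the answer above; find_integers is a hypothetical helper name.
PATTERN = re.compile(r'(?:^|[(\s[])(\d+(?:[.,]0+(?!\d))?)(?=[]\s)]|$)')

def find_integers(text):
    """Return integer-like tokens delimited by whitespace, brackets, or the string ends."""
    # findall returns the single capture group, so the leading delimiter is dropped.
    return PATTERN.findall(text)

print(find_integers('2 2.00'))     # number at the very start of the string
print(find_integers('GB2L 114'))   # digit embedded in a word
```

Because the trailing delimiter is only checked with a lookahead and not consumed, two numbers separated by a single space can both be found.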

edited Nov 10, 2022 at 14:42

answered Nov 10, 2022 at 13:13

mozway


11 Comments

Code Pope Over a year ago

Ok, I have to update my question. It should of course also match 2 alone without , and ..

mozway Over a year ago

@CodePope OK, but how do you define a number? can you have cases like abc123 or 127.0.0.0?

Code Pope Over a year ago

No, such cases are not possible. Numbers appear either in brackets or have at least whitespace prior to them.

Code Pope Over a year ago

But it does not match all integers when the string is 2 2.00 .

mozway Over a year ago

You said there is "at least whitespace prior to them", I didn't include a start of string boundary, let me update ;)


Wiktor Stribiżew · Accepted Answer · 2022-11-11 12:30:25Z

1

You can use

re.findall(r'\b(?<!\d[.,])\d+(?:[.,]0+)?\b(?![,.]\d)', text)

See the regex demo. Details:

\b - a word boundary
(?<!\d[.,]) - no digit followed with . or , immediately on the left
\d+ - one or more digits
(?:[.,]0+)? - an optional sequence of . or , and then one or more zeros
\b - a word boundary
(?![,.]\d) - no , or . and a digit allowed immediately to the right.
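
A quick way to sanity-check this pattern is to run it against the strings raised in the question and comments. The sketch below assumes nothing beyond the pattern shown above; the variable names are illustrative.

```python
import re

# Pattern copied from the answer above (no thousand-separator support).
PATTERN = re.compile(r'\b(?<!\d[.,])\d+(?:[.,]0+)?\b(?![,.]\d)')

question_text = ('Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) '
                 'Test 2,000 Test 2,1000 test 2')

# The pattern has no capturing groups, so findall returns whole matches.
print(PATTERN.findall(question_text))
print(PATTERN.findall('2 2.00'))
print(PATTERN.findall('GB2L 114'))  # the 2 inside GB2L has no word boundary before it
```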

If you need to support thousand separators:

pattern = r'\b(?<!\d[.,])(?:\d{1,3}(?:(?=([.,]))(?:\1\d{3})+)?|\d{4,})(?:(?!\1)[.,]0+)?\b(?![,.]\d)'
matches = [x.group() for x in re.finditer(pattern, text)]

See this regex demo.
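
The thousand-separator pattern contains a capturing group (the separator), so re.findall would return that group's content instead of the whole number; that is presumably why the snippet uses re.finditer with group(). A small sketch with made-up sample input (whole_matches is a hypothetical helper name):

```python
import re

# Pattern copied from the answer above.
pattern = r'\b(?<!\d[.,])(?:\d{1,3}(?:(?=([.,]))(?:\1\d{3})+)?|\d{4,})(?:(?!\1)[.,]0+)?\b(?![,.]\d)'

def whole_matches(text):
    # group() with no argument returns the entire match, not capture group 1.
    return [m.group() for m in re.finditer(pattern, text)]

# 1,000.00 is a grouped integer with a zero fraction, 2,102.120 is a real decimal,
# and 1234 is a plain integer without separators.
print(whole_matches('1,000.00 and 2,102.120 and 1234'))
```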

edited Nov 11, 2022 at 12:30

answered Nov 10, 2022 at 14:59

Wiktor Stribiżew


5 Comments

Code Pope Over a year ago

There is one problem with that, and I know I haven't given this sample: if I have a string like GB2L 114, it will also return the 2 inside the string.

Wiktor Stribiżew Over a year ago

@CodePope Then you need a different type of boundaries: (?<!\d[.,])\b\d+(?:[.,]0+)?\b(?![,.]\d), see this regex demo.

Code Pope Over a year ago

This regex works. Could you also extend it to include the logic for thousand separators?

Code Pope Over a year ago

It seems to have a problem: It detects patterns like 2,102.120 i.e. a decimal with one zero at the end.

Wiktor Stribiżew Over a year ago

@CodePope I fixed that.

Celius Stingher · Accepted Answer · 2022-11-10 13:25:48Z

0

Without regex, you can also use is_integer() after converting each value to its numeric form. While a little harder to read, this avoids regex and should be robust for further use cases, given the string structure you provide:

import pandas as pd

string = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 test 2'

[x for x in string.split() if float(pd.to_numeric(x.replace('(', '').replace(')', '').replace(',', '.'), errors='coerce')).is_integer()]

This returns the original tokens in a list:

['(2.000)', '2,000', '2']

Or if you'd like them cleaned:

[x for x in string.replace(r'(','').replace(r')','').replace(r',','.').split() if float((pd.to_numeric(x,errors='coerce'))).is_integer()]

Returning:

['2.000', '2.000', '2']
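
The same idea can be sketched with the standard library only, which may be easier to read than the one-liner. This is not the answer's code: integer_tokens is a hypothetical name, it strips parentheses only at the ends of each token, and it treats a comma as a decimal separator, as the answer does.

```python
def integer_tokens(text):
    """Return whitespace-separated tokens whose numeric value is a whole number."""
    result = []
    for token in text.split():
        # Drop surrounding parentheses and treat a comma as the decimal separator.
        candidate = token.strip('()').replace(',', '.')
        try:
            value = float(candidate)
        except ValueError:
            continue  # not a number, e.g. 'Test'
        if value.is_integer():
            result.append(token)
    return result

string = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 test 2'
print(integer_tokens(string))
```

Keep in mind that float() accepts more than plain decimals (for example exponent notation such as 1e3), so this approach is looser than the regex answers.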

answered Nov 10, 2022 at 13:25

Celius Stingher


RobertG · Accepted Answer · 2022-11-10 14:04:08Z

0

This should be easy: just extract each number and check whether it is an integer value. Maybe something like this...

import re

text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 test 2'
out_ints = []
for x in re.findall(r'([0-9.,]+)', text):
    possible_int = x.replace(',', '.')
    is_int = int(float(possible_int)) == float(possible_int)
    if is_int:
        out_ints.append(int(float(possible_int)))

print(out_ints)

Output:

[2, 2, 2]

Or am I missing something?
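
One thing the loop above does not guard against is ordinary punctuation: the character class [0-9.,]+ also matches a lone comma or period, and float() then raises ValueError. A hedged variant that skips such candidates (extract_ints is a hypothetical name, and the second sample sentence is made up):

```python
import re

def extract_ints(text):
    """Collect whole-number values, ignoring candidates that are not valid floats."""
    out_ints = []
    for x in re.findall(r'[0-9.,]+', text):
        try:
            value = float(x.replace(',', '.'))
        except ValueError:
            continue  # lone punctuation such as ',' or '.' is not a number
        if value.is_integer():
            out_ints.append(int(value))
    return out_ints

text = 'Test (2.00001) Test (2.000) Test 2.1 Test (2,0001) Test 2,000 Test 2,1000 test 2'
print(extract_ints(text))
print(extract_ints('Hello, world. Value 2,000 and 2.5'))
```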

answered Nov 10, 2022 at 14:04

RobertG
