capture valid comma separated number python regex

Question

I'm working with a multi-line string, trying to capture valid comma separated numbers in the string.

For example:

my_string = """42     <---capture 42 in this line
1,234    <---capture 1,234 in this line
3,456,780    <---capture 3,456,780 in this line
34,56,780    <---don't capture anything in this line but 34 and 56,780 captured
1234    <---don't capture anything in this line but 123 and 4 captured
"""

Ideally, I want re.findall to return:

['42', '1,234', '3,456,780']

Here is my code:

import re

a = """
42
1,234
3,456,780
34,56,780
1234
"""
regex = re.compile(r'\d{1,3}(?:,\d{3})*')
print(regex.findall(a))

The result with my code above is:

['42', '1,234', '3,456,780', '34', '56,780', '123', '4']

But my desired output should be:

['42', '1,234', '3,456,780']

Unrelated to the problem: you don't need the capturing group around the whole regexp.
– Barmar, Commented Mar 2, 2020 at 5:16
Is the result with your code correct? If so, what is your question?
– Cary Swoveland, Commented Mar 2, 2020 at 5:24
Given your desired result (['42', '1,234', '3,456,780']), what do you mean by "...but 34 and 56,780 captured" and "...but 123 and 4 captured"?
– Cary Swoveland, Commented Mar 2, 2020 at 6:09
@CarySwoveland, 34,56,780 (only two digits, 56, between the commas) and 1234 (no comma) are not in valid comma-separated format, so I don't want such invalid numbers to be captured.
– Sang-il Ahn, Commented Mar 2, 2020 at 7:59

Barmar · Accepted Answer · 2020-03-02 17:02:00Z

3

If you only want to capture whole lines that match the pattern, you need to anchor the regexp with ^ and $, and use the re.MULTILINE flag so that they match line beginnings/endings rather than only string beginning/ending.

regex = re.compile(r'^\d{1,3}(?:,\d{3})*$', re.MULTILINE)
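As a quick check of the anchored pattern, here is a minimal sketch that runs it against the sample data from the question. The names text and matches are only illustrative.

```python
import re

# Sample data from the question: one candidate number per line.
text = """
42
1,234
3,456,780
34,56,780
1234
"""

# With re.MULTILINE, ^ and $ match at the start and end of every line,
# so only lines that consist entirely of a valid number are returned.
regex = re.compile(r'^\d{1,3}(?:,\d{3})*$', re.MULTILINE)
matches = regex.findall(text)
print(matches)  # ['42', '1,234', '3,456,780']
```

Without the re.MULTILINE flag, ^ and $ would only match at the very start and end of the whole string, and nothing in this sample would match.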

edited Mar 2, 2020 at 17:02

answered Mar 2, 2020 at 5:20

Barmar


4 Comments

aneroid Over a year ago

Either I'm missing something obvious or you've forgotten to put the ^ and $ at the ends of the regex. Without them, the regex has the same result as OP's question.

Sang-il Ahn Over a year ago

@Barmar, Given a multi-line string which only has digits (no white spaces, alphas, etc in each line), your suggestion works. What if each line has alphas and special characters in it as follows?

a = """
42 asdfad <-- 42 should be captured in this line
1,234 as d <-- 1,234 should be captured in this line
3,456,780 <-- 3,456,780 should be captured in this line
34,56,780 <-- nothing should be captured in this line
1234 <-- nothing should be captured in this line
"""

Barmar Over a year ago

You could put \D* after ^ and before $.

Barmar Over a year ago

@aneroid Oops, I had made the change while testing at regex101 but forgot to copy it to the answer.
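
The \D* suggestion in the comments above could be sketched roughly as follows, assuming each line holds at most one number and the surrounding text contains no other digits. Two details are assumptions added here, not part of the comment: a capturing group is used so that findall returns only the number rather than the whole line, and [^\d\n]* stands in for \D* because \D also matches newlines and could otherwise run across line boundaries.

```python
import re

# Lines with extra text around the number, modeled on the follow-up comment
# (the "<--" annotations are left out because they contain digits themselves).
text = """
42 asdfad
1,234 as d
3,456,780
34,56,780
1234
"""

# [^\d\n]* skips non-digit characters on the same line before and after
# the number; the parentheses capture just the number.
regex = re.compile(r'^[^\d\n]*(\d{1,3}(?:,\d{3})*)[^\d\n]*$', re.MULTILINE)
numbers = regex.findall(text)
print(numbers)  # ['42', '1,234', '3,456,780']
```

A line such as 34,56,780 is still rejected because, after matching 34 and the following comma, the pattern reaches another digit instead of the end of the line.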

Toto · Accepted Answer · 2020-03-02 11:42:53Z

1

Use lookarounds to make sure there is no digit or comma immediately before or after the number:

import re

a = """
42
1,234
3,456,780
34,56,780
1234
"""
regex = re.compile(r'(?<![\d,])\d{1,3}(?:,\d{3})*(?![\d,])')
print(regex.findall(a))

Output:

['42', '1,234', '3,456,780']
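
Because the lookarounds only inspect the characters directly next to the number, the same pattern should also cope with lines that contain other text, with no anchors or flags needed. A small sketch, using an illustrative sample modeled on the follow-up comment to the other answer:

```python
import re

# Numbers mixed with other text on the same line.
text = """
42 asdfad
1,234 as d
3,456,780
34,56,780
1234
"""

# (?<![\d,]) rejects a start that is preceded by a digit or comma;
# (?![\d,]) rejects an end that is followed by a digit or comma.
regex = re.compile(r'(?<![\d,])\d{1,3}(?:,\d{3})*(?![\d,])')
found = regex.findall(text)
print(found)  # ['42', '1,234', '3,456,780']
```

One trade-off to be aware of: a valid number that is directly followed by a comma in ordinary prose, such as "1,234, then", is rejected as well, because the lookahead sees the trailing comma.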

answered Mar 2, 2020 at 11:42

Toto
