I would like to extract numbers from a string such as

There are 1,000 people in those 3 towns.

and get an array like ["1,000", "3"].

I got the following number matching Regex from Justin in this question

^[+-]?(\d*|\d{1,3}(,\d{3})*)(\.\d+)?\b$

This works great for checking if it is a number but to make it work on a sentence you need to remove the "^" and "$".

(regex101 demos: with start/end defined; without start/end defined.)

Without the start and end anchors you get a bunch of zero-length matches. These can easily be discarded, but the regex now also splits any number that has a comma in it.
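The behaviour described above can be reproduced with a short sketch. The variable names are illustrative, and the g flag is an assumption added so that every match in the sentence is reported. The split seems to happen because the \d* alternative is tried first and succeeds on "1" (there is a word boundary between "1" and ","), so the comma-aware alternative is never attempted.

```javascript
// The pattern from the question with "^" and "$" removed (g flag added here).
const unanchored = /[+-]?(\d*|\d{1,3}(,\d{3})*)(\.\d+)?\b/g;
const sentence = "There are 1,000 people in those 3 towns.";

// matchAll steps past zero-length matches instead of looping on them.
const allMatches = [...sentence.matchAll(unanchored)].map((m) => m[0]);
const nonEmpty = allMatches.filter((text) => text.length > 0);

console.log(allMatches.length - nonEmpty.length); // number of zero-length matches
console.log(nonEmpty); // [ '1', '000', '3' ]: "1,000" has been split in two
```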

How do I make that regex (or a new regex) work on sentences and still find numbers with commas in them?

A bonus would be not having all the 0 length matches as well.

3 Answers

4

The expression /-?\d(?:[,\d]*\.\d+|[,\d]*)/g should do it, if you're okay with allowing different groups such as 1,00,000 (which isn't unknown in some locales). I feel like I should be able to simplify that further, but when I try the example "333.33" gets broken up into "333" and "33" as separate numbers. With the above it's kept together.

Live Example:

const str = "There are 10,000 people in those 3 towns. That's 3,333.33 people per town, roughly. Which is about -67.33 from last year.";
const rex = /-?\d(?:[,\d]*\.\d+|[,\d]*)/g;
let match;
while ((match = rex.exec(str)) !== null) {
    console.log(match[0]);
}

Breaking /-?\d(?:[,\d]*\.\d+|[,\d]*)/g down:

  • -? - an optional minus sign (thank you to x15 for flagging that up in his/her answer!)
  • \d - a digit
  • (?:...|...) - a non-capturing group containing an alternation between
    • [,\d]*\.\d+ - zero or more commas and digits followed by a . and one or more digits, e.g. 3,333.33; or
    • [,\d]* - zero or more commas and digits

The first alternative will match greedily, falling back to the second alternative if there's no decimal point.
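As a rough sketch of how the breakdown above plays out, String.prototype.match with the g flag returns the array the question asks for. The variable names and the second sample sentence are illustrative assumptions. The second call shows a caveat worth checking against your own data: because [,\d]* accepts any run of commas and digits, a comma that directly follows a number is included in the match.

```javascript
const numberPattern = /-?\d(?:[,\d]*\.\d+|[,\d]*)/g;

// With the g flag, match() returns every full match, or null when there are none.
const found = "There are 1,000 people in those 3 towns.".match(numberPattern) || [];
console.log(found); // [ '1,000', '3' ]

// Caveat: a comma directly after a number is swallowed by [,\d]*.
const trailing = "Roughly 1,000, give or take.".match(numberPattern) || [];
console.log(trailing); // [ '1,000,' ]
```

If trailing commas matter, one option is to strip them from each match afterwards, for example with a replace on /,+$/.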


4 Comments

Thanks very much! I added (\.\d+)? to the end of it to allow for decimals.
@SamDean - Ah, good point! But I don't think that's going to be sufficient, one sec.
@SamDean - I've updated it to handle fractional numbers, sorry I missed that. It's more complicated than just adding (\.\d+)? after it. :-)
@SamDean - Um...we didn't handle minus signs. :-) So probably want a -? on the front of that. (A [+-]? if you want to allow unary + as well.)
1

One alternative approach is to split on whitespace and keep the tokens that can be parsed as a number:

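// Split on whitespace and keep tokens that parse as a number once "." and "," are stripped.
// Testing for NaN explicitly (rather than relying on truthiness) keeps a literal 0.
const numberExtractor = str => str.split(/\s+/)
                                  .filter(v => !Number.isNaN(parseFloat(v.replace(/[.,]/g, ''))));

console.log(numberExtractor('There are 1,000 people in those 3 towns. some more numbers -23.012 1,00,000,00'));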
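To see what the split-and-parse idea returns on punctuated text, here is a minimal sketch. The helper name extractNumericTokens and the sample sentence are illustrative assumptions, and the helper tests for NaN explicitly so that a literal 0 is kept. Since tokens are returned exactly as they appear between spaces, punctuation attached to a number stays attached.

```javascript
// Hypothetical helper: same split-and-parse idea, with an explicit NaN test.
const extractNumericTokens = (str) =>
    str.split(/\s+/)
       .filter((token) => !Number.isNaN(parseFloat(token.replace(/[.,]/g, ""))));

const tokens = extractNumericTokens(
    "There are 1,000 people in those 3 towns. Last year it was 0, or 3."
);
console.log(tokens); // [ '1,000', '3', '0,', '3.' ]
```

Note that parseFloat is lenient: it reads a leading number and ignores the rest, so a token such as "3rd" would also be kept.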


0

To match integer and decimal numbers where the whole part can contain optional
commas between digits, but the decimal part cannot, use:

/[+-]?(?:(?:\d(?:,(?=\d))?)+(?:\.\d*)?|\.\d+)/

https://regex101.com/r/yOuBPx/1

The input sample does not reflect all the boundary conditions this regex handles.
It is best to experiment with it to see its full effect.
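A few of those boundary conditions can be sketched as follows. The sample string and variable names are illustrative assumptions, and the g flag is added here (the pattern above is shown without it) so that all matches are returned.

```javascript
// Same pattern as above, with the g flag added to collect every match.
const commaAwarePattern = /[+-]?(?:(?:\d(?:,(?=\d))?)+(?:\.\d*)?|\.\d+)/g;
const boundarySample = "Counts: 1,000 and 3,333.33 and .5 and +7, then 12.";

const boundaryMatches = boundarySample.match(commaAwarePattern) || [];
console.log(boundaryMatches);
// [ '1,000', '3,333.33', '.5', '+7', '12.' ]
```

The lookahead (?=\d) means a comma is only consumed when a digit follows it, so the comma after "+7" is left out. A trailing dot is kept, though, because \.\d* allows zero digits after the dot, which is why "12." comes back with the sentence's full stop.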

3 Comments

When posting an answer well after an accepted one, it's useful to point out what it handles that the previous one doesn't, so people can see that more easily and recognize the benefit of the additional answer. (If there isn't anything or it's not significant, then not posting at all is probably best, small things can be comments on answers.)
@TJ.Crowder - Just stated what it does and that the regex should be explored for all the boundary effects. Significantly different than the accepted answer.
@Thefourthbird - It could be that the trailing dot would distinguish it as a float. But, given the commas can be in non-thousands places, it's a jumbled spec.
