Fetch comma separated numbers by regex

Question

I need to fetch comma-separated integers from a string with a specific format using Ruby's String#match method:

'text PaymentID: 12345'.match(PATTERN)[1..-1]          # expected result: ['12345']
'text Payment ID: 12345'.match(PATTERN)[1..-1]         # expected result: ['12345']
'text Payment id 12345'.match(PATTERN)[1..-1]          # expected result: ['12345']
'text paymentid:12345'.match(PATTERN)[1..-1]           # expected result: ['12345']
'text payment id: 12345'.match(PATTERN)[1..-1]         # expected result: ['12345']
'text payment ID: 111,999'.match(PATTERN)[1..-1]       # expected result: ['111', '999']
'text payment ID: 111, 222, 333'.match(PATTERN)[1..-1] # expected result: ['111', '222', '333']

So all spaces and the ':' symbol are optional, the pattern should be case-insensitive, and the text before "payment" can contain any characters. My latest attempt was not good enough:

PATTERN = /payment[\s]?id[:]?[\s]?(\d+)(?:[,]?[\s]?(\d+))+/i

> 'text Payment id: 12345'.match(PATTERN)[1..-1]
=> ["1234", "5"]
> 'text Payment id: 12345, 333, 91872389'.match(PATTERN)[1..-1]
=> ["12345", "91872389"]

Any ideas on how to achieve this? Thanks in advance.

Why not text.scan(/\d+/)? Or maybe text.scan(/(?:\G(?!\A)\s*,|payment\s?id:?)\s*\K\d+/i)?
– Wiktor Stribiżew, Commented Dec 1, 2021 at 15:59
@WiktorStribiżew The text before the word "payment" can contain any characters, including digits. Question updated, sorry. I'll test the second regex; it looks suitable for my needs.
– taras, Commented Dec 1, 2021 at 16:07

Wiktor Stribiżew · Accepted Answer · 2021-12-01 16:17:48Z

2

You can use

text.scan(/(?:\G(?!\A)\s*,|payment\s?id:?)\s*\K\d+/i)

The regex matches

(?:\G(?!\A)\s*,|payment\s?id:?) - either the end of the previous successful match (\G, with (?!\A) ruling out the start of the string) followed by zero or more whitespaces and a comma, or payment, an optional whitespace, id and an optional colon
\s* - zero or more whitespaces
\K - removes what has been consumed so far from the match, so only the digits are returned
\d+ - one or more digits.
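
As a quick check, here is a minimal sketch that runs the scan call above against the sample strings from the question. The constant name PAYMENT_IDS and the helper payment_ids are names made up for this example, and the last sample (with digits before "payment") is an added illustration rather than one of the original inputs.

```ruby
# Pattern copied from the answer above; the constant name is only for this sketch.
PAYMENT_IDS = /(?:\G(?!\A)\s*,|payment\s?id:?)\s*\K\d+/i

# Hypothetical helper: returns every integer in the comma-separated
# list that directly follows "payment id".
def payment_ids(text)
  text.scan(PAYMENT_IDS)
end

samples = [
  'text PaymentID: 12345',
  'text Payment id 12345',
  'text paymentid:12345',
  'text payment ID: 111,999',
  'text payment ID: 111, 222, 333',
  'order 42, 43 payment id: 7, 8' # assumed extra case: digits before "payment"
]

samples.each do |sample|
  puts "#{sample.inspect} => #{payment_ids(sample).inspect}"
end

# For comparison, a plain /\d+/ scan also picks up the leading digits:
p 'order 42, 43 payment id: 7, 8'.scan(/\d+/) # => ["42", "43", "7", "8"]
```

Since \K drops the prefix from each match, scan returns the digit strings directly and no capture group or flatten call is needed.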



Casimir et Hippolyte · Accepted Answer · 2021-12-01 16:13:30Z

0

You can't collect several values by repeating a capture group, since each repetition overwrites the previous capture and only the last one is kept. What you can do instead is use a \G-based pattern with the scan method; \G ensures that successive matches are contiguous:

PATTERN = /(?:(?!\A)\G\s*,|payment\s*id\s*:?)\s*(\d+)/i

'text Payment id: 12345, 333, 91872389'.scan(PATTERN).flatten # result: ['12345', '333', '91872389']

In short, the second branch payment\s*id\s*:? has to succeed first; only then can the first branch (?!\A)\G\s*, succeed for the following matches.
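
Both points can be checked with a small sketch. The first pattern and the helper name extract_ids are made up for this illustration; PATTERN is the one from this answer, and the last two input strings are assumed test cases, not taken from the question.

```ruby
# A repeated capture group keeps only its final iteration.
# (Illustrative pattern, not the one from the question.)
repeated = '111,222,333'.match(/(?:(\d+),?)+/)
p repeated.captures # => ["333"]

# Pattern from this answer: the first branch continues from the end of the
# previous match (\G), the second branch finds the "payment id" prefix.
PATTERN = /(?:(?!\A)\G\s*,|payment\s*id\s*:?)\s*(\d+)/i

# Hypothetical helper: scan returns one array per match because of the
# capture group, so flatten turns it into a flat list of strings.
def extract_ids(text)
  text.scan(PATTERN).flatten
end

p extract_ids('text Payment id: 12345, 333, 91872389') # => ["12345", "333", "91872389"]

# Contiguity: the list stops at the first thing that is not ", <digits>".
p extract_ids('text payment id: 1, 2 and 3') # => ["1", "2"]

# Without the "payment id" prefix the \G branch never gets a chance to start.
p extract_ids('7, 8, 9') # => []
```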

