
I need to extract comma-separated integers from a string in a specific format using Ruby's String#match method:

'text PaymentID: 12345'.match(PATTERN)[1..-1]          # expected result: ['12345']
'text Payment ID: 12345'.match(PATTERN)[1..-1]         # expected result: ['12345']
'text Payment id 12345'.match(PATTERN)[1..-1]          # expected result: ['12345']
'text paymentid:12345'.match(PATTERN)[1..-1]           # expected result: ['12345']
'text payment id: 12345'.match(PATTERN)[1..-1]         # expected result: ['12345']
'text payment ID: 111,999'.match(PATTERN)[1..-1]       # expected result: ['111', '999']
'text payment ID: 111, 222, 333'.match(PATTERN)[1..-1] # expected result: ['111', '222', '333']

So all spaces and the ':' symbol are optional, the pattern should be case-insensitive, and the text before "payment" can contain any characters. My latest attempt was not good enough:

PATTERN = /payment[\s]?id[:]?[\s]?(\d+)(?:[,]?[\s]?(\d+))+/i

> 'text Payment id: 12345'.match(PATTERN)[1..-1]
=> ["1234", "5"]
> 'text Payment id: 12345, 333, 91872389'.match(PATTERN)[1..-1]
=> ["12345", "91872389"]
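Both wrong results can be reproduced with tiny made-up strings (the inputs below are illustrative only, not the question's data). In the first case the trailing (...)+ group needs at least one more \d+, so the greedy first (\d+) backtracks and gives up its last digit. In the second case, a capture group inside a repeated group only reports its last repetition:

```ruby
# Illustrative inputs only; not the question's real data.

# 1. Backtracking: the second (\d+) must match at least one digit,
#    so the greedy first group gives one digit back.
backtracked = '12345'.match(/(\d+)(\d+)/).captures
p backtracked # => ["1234", "5"]

# 2. A group inside a repeated (...)+ keeps only its last repetition,
#    which is why the middle value 333 is missing from the second result.
overwritten = 'a1b2c3'.match(/(?:[a-z](\d))+/).captures
p overwritten # => ["3"]
```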

Any ideas on how to achieve this? Thanks in advance.

  • Why not text.scan(/\d+/)? Or maybe text.scan(/(?:\G(?!\A)\s*,|payment\s?id:?)\s*\K\d+/i)? Commented Dec 1, 2021 at 15:59
  • @WiktorStribiżew The text before the word "payment" can contain any characters, including digits. Question updated, sorry. I'll test the second regex; it looks suitable for my needs. Commented Dec 1, 2021 at 16:07
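Why the leading digits matter can be seen with a hypothetical input whose prefix contains a number (the string is made up for illustration): the plain text.scan(/\d+/) suggestion also returns that number, while the anchored regex from the same comment returns only the IDs after "payment id".

```ruby
# Hypothetical input: the text before "payment" contains digits.
text = 'order 42 payment ID: 111, 222'

# Naive approach: every run of digits, including the unrelated "42".
naive = text.scan(/\d+/)
p naive # => ["42", "111", "222"]

# Anchored approach from the comment: only the digits after "payment id".
anchored = text.scan(/(?:\G(?!\A)\s*,|payment\s?id:?)\s*\K\d+/i)
p anchored # => ["111", "222"]
```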

2 Answers


You can use

text.scan(/(?:\G(?!\A)\s*,|payment\s?id:?)\s*\K\d+/i)

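The regex matches:

  • (?:\G(?!\A)\s*,|payment\s?id:?) - either the end of the previous successful match followed by zero or more whitespace characters and a comma, or payment, an optional whitespace character, id and an optional colon (the (?!\A) lookahead stops \G from matching at the start of the string, before any match has been made)
  • \s* - zero or more whitespace characters
  • \K - discards everything consumed so far from the reported match, so only the digits are returned
  • \d+ - one or more digits.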
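A quick way to check this regex is to run it against every example from the question. A sketch (the pattern and expected values are copied from above; the table layout and variable names are just for illustration):

```ruby
# The \K-based pattern from this answer.
pattern = /(?:\G(?!\A)\s*,|payment\s?id:?)\s*\K\d+/i

# The question's inputs and the results it expects.
examples = {
  'text PaymentID: 12345'          => ['12345'],
  'text Payment ID: 12345'         => ['12345'],
  'text Payment id 12345'          => ['12345'],
  'text paymentid:12345'           => ['12345'],
  'text payment id: 12345'         => ['12345'],
  'text payment ID: 111,999'       => ['111', '999'],
  'text payment ID: 111, 222, 333' => ['111', '222', '333']
}

# With no capture groups, scan returns the whole matches; \K trims each one to the digits.
results = examples.map do |text, expected|
  actual = text.scan(pattern)
  [text, actual, actual == expected]
end

results.each { |text, actual, ok| puts "#{text.inspect} -> #{actual.inspect} #{ok ? 'ok' : 'MISMATCH'}" }
```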

You can't repeat a capture group to collect several values: each repetition overwrites the previous capture, so only the last one is kept. What you can do instead is use a \G-based pattern with the scan method, which ensures that successive matches are contiguous:

PATTERN = /(?:(?!\A)\G\s*,|payment\s*id\s*:?)\s*(\d+)/i

'text Payment id: 12345, 333, 91872389'.scan(PATTERN).flatten

In short, the second branch payment\s*id\s*:? has to succeed first; only then can the first branch (?!\A)\G\s*, succeed for the following matches, since \G anchors each of them to the end of the previous match.
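The same check works for this variant. Because the pattern has one capture group, scan returns one array per match (for example [["111"], ["222"], ["333"]]), which is why the answer calls flatten. A sketch, using the question's examples as an illustrative test table:

```ruby
# The capture-group pattern from this answer.
pattern = /(?:(?!\A)\G\s*,|payment\s*id\s*:?)\s*(\d+)/i

# The question's inputs and the results it expects.
examples = {
  'text PaymentID: 12345'          => ['12345'],
  'text Payment ID: 12345'         => ['12345'],
  'text Payment id 12345'          => ['12345'],
  'text paymentid:12345'           => ['12345'],
  'text payment id: 12345'         => ['12345'],
  'text payment ID: 111,999'       => ['111', '999'],
  'text payment ID: 111, 222, 333' => ['111', '222', '333']
}

# With one capture group, scan yields a single-element array per match.
nested = 'text payment ID: 111, 222, 333'.scan(pattern)
p nested # => [["111"], ["222"], ["333"]]

# flatten turns the nested result into a plain list of ID strings.
results = examples.map do |text, expected|
  actual = text.scan(pattern).flatten
  [text, actual, actual == expected]
end

results.each { |text, actual, ok| puts "#{text.inspect} -> #{actual.inspect} #{ok ? 'ok' : 'MISMATCH'}" }
```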
