Ruby regex to extract a number from string containing only one number and trim the part after comma

Question

I have a method that extracts a number from a string using a regex, like this:

def format 
  str = "R$ 10.000,00 + Benefits"
  str.split(/[^\d]/).join
end

It returns 1000000. I need to modify the regex so that it returns 10000, dropping the zeros after the comma.

str.gsub(/(?<=\d),\d+|\D/, '')? Or really only zeros? Then str.gsub(/(?<=\d),0++(?!\d)|\D/, '')
– Wiktor Stribiżew, Commented Oct 5, 2020 at 16:45
@WiktorStribiżew you're a regex wizard! Could you please explain the ?<=? Thanks a lot!
– Christian Baumann, Commented Oct 5, 2020 at 16:53
What if there are multiple numbers in the string? str = "Joe paid $ 10.000,00, but Jane got the better deal for $ 7.900,00" Removing all decimal digits and non-digits will leave you with 100007900. Or does that scenario never happen?
– 3limin4t0r, Commented Oct 5, 2020 at 17:34
If not evident from @3limin4t0r's comment, the problem occurs if there are any other digits in the string, not just representations of dollar amounts: "On October 7 Joe paid $ 10.000,00".split(/[^\d]/).join #=> "71000000".
– Cary Swoveland, Commented Oct 5, 2020 at 19:32

Wiktor Stribiżew · Accepted Answer · 2020-10-05 16:58:44Z

2

You can use

str.gsub(/(?<=\d),\d+|\D/, '')

See the regex demo.

Regex details

(?<=\d),\d+ - a comma that is immediately preceded by a digit ((?<=\d) is a positive lookbehind), followed by one or more digits
| - or
\D - any non-digit symbol

The order of the alternatives matters: \D must be the last alternative. Otherwise \D matches the comma first, the digits after it are kept, and the solution won't work.
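The effect of the alternative order can be checked with a minimal sketch like the one below. The two helper method names are hypothetical and exist only for this illustration; the first uses the pattern from this answer, the second swaps the alternatives. The lookbehind (?<=\d) only checks that a digit sits before the comma; it does not consume that digit, so the digit stays in the result.

```ruby
# Hypothetical helper names, used only for this illustration.

# Pattern from the answer: the ",digits" alternative is tried first.
def strip_with_comma_part_first(str)
  str.gsub(/(?<=\d),\d+|\D/, '')
end

# Same alternatives in swapped order: \D is tried first and matches the
# comma on its own, so the digits after the comma are kept.
def strip_with_non_digit_first(str)
  str.gsub(/\D|(?<=\d),\d+/, '')
end

sample = "R$ 10.000,00 + Benefits"
puts strip_with_comma_part_first(sample)  # prints 10000
puts strip_with_non_digit_first(sample)   # prints 1000000
```

With the swapped order the result is the same 1000000 that the original split-based method in the question produced.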

answered Oct 5, 2020 at 16:58

Wiktor Stribiżew


2 Comments

Cary Swoveland Over a year ago

"On October 7 Joe paid R$ 10.000,00.".gsub(/(?<=\d),\d+|\D/, '') #=> "710000".

Wiktor Stribiżew Over a year ago

@CarySwoveland The strings OP has only contain one numeric value to get.

Cary Swoveland · Accepted Answer · 2020-10-05 20:07:13Z

str = "R$ 10.000,00 R$1.200.000,03 R$ 0,09 R$ 4.00,10 R$ 3.30005,00 R$ 6.700 R$ 6, R$ 6,0 R$ 00,20 R$6,001 US$ 5.122,00 Benefits"

R = /(?:(?<=\bR\$)|(?<=\bR\$ ))(?:0|[1-9]\d{0,2}(?:\.\d{3})*),\d{2}(?!\d)/

str.scan(R).map { |s| s.delete('.') }
  #=> ["10000,00", "1200000,03", "0,09"]

None of the following substrings match: "4.00,10", "3.30005,00", "6.700", "6,", "6,0", "00,20" and "6,001" have invalid formats, and "5.122,00" is not preceded by "R$" or "R$ ".

The regular expression can be written in free-spacing mode (/x) to make it self-documenting.

R = /
    (?:            # begin non-capture group
      (?<=\bR\$)   # positive lookbehind asserts match is preceded by 'R$'
                   #   that is preceded by a word break
      |            # or
      (?<=\bR\$\ ) # positive lookbehind asserts match is preceded by 'R$ '
                   #   that is preceded by a word break
    )              # end non-capture group
    (?:            # begin non-capture group
      0            # match zero
      |            # or
      [1-9]        # match a digit other than zero
      \d{0,2}      # match 0-2 digits
      (?:\.\d{3})* # match '.' followed by three digits, 0+ times
    )              # end non-capture group
    ,\d{2}         # match ',' followed by two digits
    (?!\d)         # negative lookahead asserts match is not followed by a digit
    /x
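To get from these matches to the value the question asks for (10000 rather than "10000,00"), one possible follow-up step is to cut each match at the comma. The sketch below is an illustration, not part of the original answer: the constant name BRL_AMOUNT and the method whole_amounts are hypothetical, and the pattern is the single-line regex shown above.

```ruby
# Hypothetical constant name; the pattern is the single-line regex from this answer.
BRL_AMOUNT = /(?:(?<=\bR\$)|(?<=\bR\$ ))(?:0|[1-9]\d{0,2}(?:\.\d{3})*),\d{2}(?!\d)/

# Hypothetical helper: returns the whole-number part of every well-formed
# "R$" amount in the text as an Integer.
def whole_amounts(text)
  text.scan(BRL_AMOUNT).map do |match|
    # Remove the thousands separators, then keep what precedes the comma.
    match.delete('.').split(',').first.to_i
  end
end

p whole_amounts("On October 7 Joe paid R$ 10.000,00 and R$1.200.000,03")
# prints [10000, 1200000]
```

Because the pattern requires the "R$" prefix, the stray 7 in "October 7" is ignored here.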

Timur Shtatland · Accepted Answer · 2020-10-05 20:02:55Z

1

Here is a slightly longer, but perhaps simpler and easier to understand solution. You can use it as an alternative to the excellent and concise answer by Wiktor Stribiżew, and the very thorough and complete answer by Cary Swoveland. Note that my answer may not work for some (more complex) strings, as mentioned in the comment by Cary below.

str = "R$ 10.000,00 + Benefits"
puts str.gsub(/^.*?(\d+[\d.]*).*$/, '\1').gsub(/[.]/, '')
# => 10000

Here gsub is applied to the input string twice:

gsub(/^.*?(\d+[\d.]*).*$/, '\1') : grab 10.000 part.
^ is the beginning of the string.
.*? is any character repeated 0 or more times, non-greedy (that is, minimum number of times).
(\d+[\d.]*) is one or more digits followed by any number of digits or literal dots (.). The parentheses capture this and put it into the first capture group (to be used later as '\1' in the replacement string).
.* is any character repeated 0 or more times, greedy (that is, as many as possible).
$ is the end of the string.
Thus, we replace the entire string with the first captured group: '\1', which is 10.000 here. Remember to put \1 in single quotes; inside double quotes the backslash itself must be escaped, like so: "\\1".
gsub(/[.]/, '') : remove all literal dots (.) in the string.
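The two steps described above can be inspected separately with a small sketch. The helper names grab_first_number and drop_dots are hypothetical and only wrap the two gsub calls from this answer; the last two lines illustrate the quoting remark about '\1'.

```ruby
# Hypothetical helpers wrapping the two gsub calls so each step can be inspected.

# Step 1: replace the whole line with capture group 1 (the first run of
# digits and dots that starts with a digit).
def grab_first_number(str)
  str.gsub(/^.*?(\d+[\d.]*).*$/, '\1')
end

# Step 2: remove the literal dots used as thousands separators.
def drop_dots(str)
  str.gsub(/[.]/, '')
end

str = "R$ 10.000,00 + Benefits"
first_number = grab_first_number(str)
puts first_number             # prints 10.000
puts drop_dots(first_number)  # prints 10000

# '\1' and "\\1" are the same two-character string; "\1" is a different,
# single-character string, so it does not work as a backreference.
puts '\1' == "\\1"            # prints true
puts '\1' == "\1"             # prints false
```

If the input contains no digit at all, the first pattern does not match and gsub returns the string unchanged, so a caller may want to check for that case.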

Note that the two-step gsub does the expected replacements for a number of similar strings, but nothing fancier: for example, it leaves 001 as is.

['R$ 10.000,00 + Benefits',
 'R$      0,00 + Benefits',
 'R$   .001,00 + Benefits',
 '.  10.000,00 + Benefits',].each do |str|
  puts [str, str.gsub(/^.*?(\d+[\d.]*).*$/, '\1').gsub(/[.]/, '')].join(" => ")
end

Output:

R$ 10.000,00 + Benefits => 10000
R$      0,00 + Benefits => 0
R$   .001,00 + Benefits => 001
.  10.000,00 + Benefits => 10000

edited Oct 5, 2020 at 20:02

answered Oct 5, 2020 at 17:41

Timur Shtatland


2 Comments

Cary Swoveland Over a year ago

"On October 7 Joe paid R$ 10.000,00.".gsub(/^.*?(\d+[\d.]*).*$/, '\1').gsub(/[.]/, '') #=> "7"

Timur Shtatland Over a year ago

@CarySwoveland Thank you for pointing out the incorrect parsing of some strings in my answer.
