I have a method that extracts a number from a string using a regex, like this:

def format 
  str = "R$ 10.000,00 + Benefits"
  str.split(/[^\d]/).join
end

It returns 1000000. I need to modify the regex so that it returns 10000, dropping the zeros after the comma.

  • str.gsub(/(?<=\d),\d+|\D/, '')? Or really only zeros? Then str.gsub(/(?<=\d),0++(?!\d)|\D/, '') Commented Oct 5, 2020 at 16:45
  • @WiktorStribiżew you're a regex wizard! Could you please explain the ?<=? Thanks a lot! Commented Oct 5, 2020 at 16:53
  • str.gsub(/(?<=\d),\d+|\D/, '') Works like a charm. Commented Oct 5, 2020 at 16:56
  • What if there are multiple numbers in the string? str = "Joe paid $ 10.000,00, but Jane got the better deal for $ 7.900,00" Removing all decimal digits and non-digits will leave you with 100007900. Or does that scenario never happen? Commented Oct 5, 2020 at 17:34
  • If not evident from @3limin4t0r's comment, the problem occurs if there are any other digits in the string, not just representations of dollar amounts: "On October 7 Joe paid $ 10.000,00".split(/[^\d]/).join #=> "71000000". Commented Oct 5, 2020 at 19:32

3 Answers

You can use

str.gsub(/(?<=\d),\d+|\D/, '')

See the regex demo.

Regex details

  • (?<=\d),\d+ - a comma that is immediately preceded by a digit ((?<=\d) is a positive lookbehind), followed by one or more digits
  • | - or
  • \D - any non-digit symbol

The order of the alternatives matters: \D must be the last alternative. Otherwise, \D matches the comma on its own, the digits after it are kept, and the solution won't work.
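
The effect of the ordering can be checked with a minimal sketch like the one below. The constant names and the strip_with helper are hypothetical and only serve the illustration; the first pattern is the one from this answer, and the second is the same pattern with its alternatives swapped.

```ruby
# Hypothetical names; only the first regex comes from the answer.
COMMA_FIRST     = /(?<=\d),\d+|\D/   # order used in the answer
NON_DIGIT_FIRST = /\D|(?<=\d),\d+/   # same alternatives, swapped

# Remove everything the pattern matches.
def strip_with(str, pattern)
  str.gsub(pattern, '')
end

sample = "R$ 10.000,00 + Benefits"
puts strip_with(sample, COMMA_FIRST)      # prints 10000
puts strip_with(sample, NON_DIGIT_FIRST)  # prints 1000000
```

With the swapped order, \D consumes the comma by itself, so the lookbehind alternative never gets a chance to remove the ",00" part.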

2 Comments

"On October 7 Joe paid R$ 10.000,00.".gsub(/(?<=\d),\d+|\D/, '') #=> "710000".
@CarySwoveland The OP's strings contain only one numeric value to extract.

str = "R$ 10.000,00 R$1.200.000,03 R$ 0,09 R$ 4.00,10 R$ 3.30005,00 R$ 6.700 R$ 6, R$ 6,0 R$ 00,20 R$6,001 US$ 5.122,00 Benefits"
R = /(?:(?<=\bR\$)|(?<=\bR\$ ))(?:0|[1-9]\d{0,2}(?:\.\d{3})*),\d{2}(?!\d)/
str.scan(R).map { |s| s.delete('.') }
  #=> ["10000,00", "1200000,03", "0,09"]

None of the following substrings match because they have invalid formats: "4.00,10", "3.30005,00", "6.700", "6,", "6,0", "00,20", "6,001" and "5.122,00" (the last because it is not preceded by "R$" or "R$ ").

The regular expression can be written in free-spacing mode (/x) to make it self-documenting.

R = /
    (?:            # begin non-capture group
      (?<=\bR\$)   # positive lookbehind asserts match is preceded by 'R$'
                   #   that is preceded by a word break
      |            # or
      (?<=\bR\$\ ) # positive lookbehind asserts match is preceded by 'R$ '
                   #   that is preceded by a word break
    )              # end non-capture group
    (?:            # begin non-capture group
      0            # match zero
      |            # or
      [1-9]        # match a digit other than zero
      \d{0,2}      # match 0-2 digits
      (?:\.\d{3})* # match '.' followed by three digits, zero or more times
    )              # end non-capture group
    ,\d{2}         # match ',' followed by two digits
    (?!\d)         # negative lookahead asserts match is not followed by a digit
    /x
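
As a rough usage sketch, the compact form of the pattern can be wrapped in a helper that returns only the integer part of each well-formed amount, which is what the question asks for. The names BRL_AMOUNT and extract_amounts are hypothetical, and truncating the cents with to_i is an assumption about the desired result.

```ruby
# Compact pattern from the answer; the constant and method names are hypothetical.
BRL_AMOUNT = /(?:(?<=\bR\$)|(?<=\bR\$ ))(?:0|[1-9]\d{0,2}(?:\.\d{3})*),\d{2}(?!\d)/

# Return the integer part of every well-formed "R$" amount in the text.
def extract_amounts(text)
  text.scan(BRL_AMOUNT).map do |amount|
    # Drop the thousands separators, then keep what precedes the decimal comma.
    amount.delete('.').split(',').first.to_i
  end
end

p extract_amounts("On October 7 Joe paid R$ 10.000,00, but Jane paid R$ 7.900,00")
# prints [10000, 7900]
```

Because the pattern requires the "R$" prefix, the stray 7 in the date is ignored and each amount is returned separately.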


Here is a slightly longer, but perhaps simpler and easier to understand solution. You can use it as an alternative to the excellent and concise answer by Wiktor Stribiżew, and the very thorough and complete answer by Cary Swoveland. Note that my answer may not work for some (more complex) strings, as mentioned in the comment by Cary below.

str = "R$ 10.000,00 + Benefits"
puts str.gsub(/^.*?(\d+[\d.]*).*$/, '\1').gsub(/[.]/, '')
# => 10000

Here gsub is applied to the input string twice:

  1. gsub(/^.*?(\d+[\d.]*).*$/, '\1') : grab the 10.000 part.
    ^ is the beginning of a line (the string here has only one line).
    .*? is any character repeated 0 or more times, non-greedy (that is, as few times as possible).
    (\d+[\d.]*) is one or more digits followed by any number of digits or literal dots (.). The parentheses capture this text as the first capture group (used later as '\1' in the replacement string).
    .* is any character repeated 0 or more times, greedy (that is, as many as possible).
    $ is the end of the line.
    Thus, we replace the entire string with the first captured group, '\1', which is 10.000 here. Remember to use single quotes around \1; in a double-quoted string the backslash must be escaped, like so: "\\1".
  2. gsub(/[.]/, '') : remove all literal dots (.) in the string.
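
The two steps can be looked at separately with a small sketch. The helper name integer_part_steps is hypothetical and only exposes the intermediate result; the regexes are the ones from this answer.

```ruby
# Hypothetical helper that returns the result of each step.
def integer_part_steps(str)
  captured = str.gsub(/^.*?(\d+[\d.]*).*$/, '\1')  # step 1: keep the first run of digits and dots
  cleaned  = captured.gsub(/[.]/, '')              # step 2: remove the literal dots
  [captured, cleaned]
end

p integer_part_steps("R$ 10.000,00 + Benefits")  # prints ["10.000", "10000"]

# The same step 1 with a double-quoted replacement needs a doubled backslash.
p "R$ 10.000,00 + Benefits".gsub(/^.*?(\d+[\d.]*).*$/, "\\1")  # prints "10.000"
```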

Note that this code makes the expected replacements for a number of similar strings, but does nothing fancier; for example, it leaves 001 as is:

['R$ 10.000,00 + Benefits',
 'R$      0,00 + Benefits',
 'R$   .001,00 + Benefits',
 '.  10.000,00 + Benefits',].each do |str|
  puts [str, str.gsub(/^.*?(\d+[\d.]*).*$/, '\1').gsub(/[.]/, '')].join(" => ")
end

Output:

R$ 10.000,00 + Benefits => 10000
R$      0,00 + Benefits => 0
R$   .001,00 + Benefits => 001
.  10.000,00 + Benefits => 10000

2 Comments

"On October 7 Joe paid R$ 10.000,00.".gsub(/^.*?(\d+[\d.]*).*$/, '\1').gsub(/[.]/, '') #=> "7"
@CarySwoveland Thank you for pointing out the incorrect parsing of some strings in my answer.
