I have a method that extracts a number from a string using a regex, like this:
def format
str = "R$ 10.000,00 + Benefits"
str.split(/[^\d]/).join
end
It returns 1000000. I need to modify the regex so that it returns 10000, removing the zeros after the comma.
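The extra digits come from the cents: splitting on every non-digit keeps the digit runs on both sides of the decimal comma, and join glues them back together. A quick sketch that prints the intermediate array (the variable names are just for illustration):

```ruby
str = "R$ 10.000,00 + Benefits"

# Split on every non-digit character. Leading empty strings are kept,
# trailing empty strings are dropped by String#split.
parts = str.split(/[^\d]/)
p parts        # => ["", "", "", "10", "000", "00"]

# Joining glues the cents ("00") onto the integer part.
p parts.join   # => "1000000"
```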
You can use
str.gsub(/(?<=\d),\d+|\D/, '')
See the regex demo.
Regex details
(?<=\d),\d+ - a comma that is immediately preceded by a digit ((?<=\d) is a positive lookbehind), followed by one or more digits
| - or
\D - any non-digit character
One important aspect is the order of these alternatives: \D must be the last alternative. Otherwise, \D can match the comma by itself and the solution won't work.
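The ordering point can be checked directly by swapping the two alternatives. A minimal sketch using the sample string from the question:

```ruby
str = "R$ 10.000,00 + Benefits"

# Correct order: at the comma, the ",digits" branch is tried first
# and removes ",00" in one go.
good = str.gsub(/(?<=\d),\d+|\D/, '')

# Reversed order: \D is tried first, consumes the comma on its own,
# and the decimal digits "00" are left behind.
bad = str.gsub(/\D|(?<=\d),\d+/, '')

p good   # => "10000"
p bad    # => "1000000"
```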
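Removing every non-digit also pulls in any other number in the string:
"On October 7 Joe paid R$ 10.000,00.".gsub(/(?<=\d),\d+|\D/, '') #=> "710000"
A stricter approach is to match only well-formed amounts that directly follow "R$" or "R$ ":
str = "R$ 10.000,00 R$1.200.000,03 R$ 0,09 R$ 4.00,10 R$ 3.30005,00 R$ 6.700 R$ 6, R$ 6,0 R$ 00,20 R$6,001 US$ 5.122,00 Benefits"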
R = /(?:(?<=\bR\$)|(?<=\bR\$ ))(?:0|[1-9]\d{0,2}(?:\.\d{3})*),\d{2}(?!\d)/
str.scan(R).map { |s| s.delete('.') }
#=> ["10000,00", "1200000,03", "0,09"]
None of the following substrings match because they have invalid formats: "4.00,10", "3.30005,00", "6.700", "6,", "6,0", "00,20", "6,001" and "5.122,00" (the last because it is not preceded by "R$" or "R$ ").
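Each case can also be checked in isolation with String#[], which returns the first match or nil. This sketch repeats the regex and the example substrings from the text, each with its currency prefix so that the lookbehind can apply; the array names are illustrative:

```ruby
R = /(?:(?<=\bR\$)|(?<=\bR\$ ))(?:0|[1-9]\d{0,2}(?:\.\d{3})*),\d{2}(?!\d)/

valid   = ["R$ 10.000,00", "R$1.200.000,03", "R$ 0,09"]
invalid = ["R$ 4.00,10", "R$ 3.30005,00", "R$ 6.700", "R$ 6,", "R$ 6,0",
           "R$ 00,20", "R$6,001", "US$ 5.122,00"]

# String#[] with a regex returns the first match, or nil if there is none.
p valid.map { |s| s[R] }     # => ["10.000,00", "1.200.000,03", "0,09"]
p invalid.map { |s| s[R] }   # => [nil, nil, nil, nil, nil, nil, nil, nil]
```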
The regular expression can be written in free-spacing mode (/x) to make it self-documenting.
R = /
(?: # begin non-capture group
(?<=\bR\$) # positive lookbehind asserts match is preceded by 'R$'
# that is preceded by a word break
| # or
(?<=\bR\$\ ) # positive lookbehind asserts match is preceded by 'R$ '
# that is preceded by a word break
) # end non-capture group
(?: # begin non-capture group
0 # match zero
| # or
[1-9] # match a digit other than zero
\d{0,2} # match 0-2 digits
(?:\.\d{3}) # match '.' followed by three digits in a non-capture group
* # execute preceding non-capture group 0+ times
) # end non-capture group
,\d{2} # match ',' followed by two digits
(?!\d) # negative lookahead asserts match is not followed by a digit
/x
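As a sanity check, the sketch below writes a free-spacing version that mirrors the one-line regex term by term (two R$ lookbehinds, the integer part, then the two decimal digits, with no other lookbehind) and compares the matches of both on the sample string. The constant names R_COMPACT and R_SPACED are just for illustration. Note that comments inside a /x literal must not contain a /, since that would end the regex literal.

```ruby
R_COMPACT = /(?:(?<=\bR\$)|(?<=\bR\$ ))(?:0|[1-9]\d{0,2}(?:\.\d{3})*),\d{2}(?!\d)/

R_SPACED = /
  (?:              # begin non-capture group
    (?<=\bR\$)     # preceded by 'R$' at a word boundary
    |              # or
    (?<=\bR\$\ )   # preceded by 'R$ ' at a word boundary
  )                # end non-capture group
  (?:              # begin non-capture group
    0              # a lone zero
    |              # or
    [1-9]\d{0,2}   # 1 to 3 digits without a leading zero
    (?:\.\d{3})*   # zero or more '.ddd' thousands groups
  )                # end non-capture group
  ,\d{2}           # ',' followed by exactly two digits
  (?!\d)           # not followed by another digit
/x

str = "R$ 10.000,00 R$1.200.000,03 R$ 0,09 R$ 4.00,10 R$ 3.30005,00 R$ 6.700 " \
      "R$ 6, R$ 6,0 R$ 00,20 R$6,001 US$ 5.122,00 Benefits"

p str.scan(R_SPACED)                          # => ["10.000,00", "1.200.000,03", "0,09"]
p str.scan(R_SPACED) == str.scan(R_COMPACT)   # => true
```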
Here is a slightly longer, but perhaps simpler and easier to understand solution. You can use it as an alternative to the excellent and concise answer by Wiktor Stribiżew, and the very thorough and complete answer by Cary Swoveland. Note that my answer may not work for some (more complex) strings, as mentioned in the comment by Cary below.
str = "R$ 10.000,00 + Benefits"
puts str.gsub(/^.*?(\d+[\d.]*).*$/, '\1').gsub(/[.]/, '')
# => 10000
Here gsub is applied twice: first to the input string, then to the result of the first call:
gsub(/^.*?(\d+[\d.]*).*$/, '\1'): grab the 10.000 part.
^ is the beginning of the line (here, the beginning of the string).
.*? is any character (except a newline) repeated 0 or more times, non-greedy (that is, as few times as possible).
(\d+[\d.]*) is one or more digits followed by any number of digits or literal dots (.). The parentheses capture this into the first capture group (used later as '\1' in the replacement string).
.* is any character (except a newline) repeated 0 or more times, greedy (that is, as many times as possible).
$ is the end of the line (here, the end of the string).
The whole match is replaced with '\1', which is 10.000 here. Remember to use single quotes around \1; otherwise escape the backslash, like so: "\\1".
gsub(/[.]/, ''): remove all literal dots (.) from the string.
Note that this code does the expected replacements for a number of similar strings (but nothing fancier; for example, it leaves 001 as is):
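The two steps, and the quoting pitfall, can be seen separately in a short sketch (step1, step2 and re are illustrative names):

```ruby
str = "R$ 10.000,00 + Benefits"
re  = /^.*?(\d+[\d.]*).*$/

# Step 1: the whole line matches and is replaced by capture group 1,
# the first run of digits and dots.
step1 = str.gsub(re, '\1')
p step1                      # => "10.000"

# Step 2: drop the dots used as thousands separators.
step2 = step1.gsub(/[.]/, '')
p step2                      # => "10000"

# '\1' and "\\1" are the same two characters: a backslash and "1".
p '\1' == "\\1"              # => true

# "\1" in double quotes is an octal escape for the byte 1, so gsub
# inserts that control character instead of the captured text.
p str.gsub(re, "\1").bytes   # => [1]
```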
['R$ 10.000,00 + Benefits',
'R$ 0,00 + Benefits',
'R$ .001,00 + Benefits',
'. 10.000,00 + Benefits',].each do |str|
puts [str, str.gsub(/^.*?(\d+[\d.]*).*$/, '\1').gsub(/[.]/, '')].join(" => ")
end
Output:
R$ 10.000,00 + Benefits => 10000
R$ 0,00 + Benefits => 0
R$ .001,00 + Benefits => 001
. 10.000,00 + Benefits => 10000
It picks up the first number in the string, which is not necessarily the amount:
"On October 7 Joe paid R$ 10.000,00.".gsub(/^.*?(\d+[\d.]*).*$/, '\1').gsub(/[.]/, '') #=> "7"
str.gsub(/(?<=\d),\d+|\D/, '')? Or really only zeros? Then str.gsub(/(?<=\d),0++(?!\d)|\D/, '')?
?<=?
Thanks a lot! str.gsub(/(?<=\d),\d+|\D/, '') works like a charm.
str = "Joe paid $ 10.000,00, but Jane got the better deal for $ 7.900,00" Removing all decimal digits and non-digits will leave you with 100007900. Or does that scenario never happen?
"On October 7 Joe paid $ 10.000,00".split(/[^\d]/).join #=> "71000000"
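The multi-amount scenario from the comments can be reproduced in a few lines. This sketch assumes the amounts are written with "R$" instead of "$" so that the R$-anchored regex from Cary Swoveland's answer applies; the sentence is otherwise the one from the comment, and the variable names are illustrative:

```ruby
R = /(?:(?<=\bR\$)|(?<=\bR\$ ))(?:0|[1-9]\d{0,2}(?:\.\d{3})*),\d{2}(?!\d)/

# Assumption: "R$" instead of the comment's "$".
text = "Joe paid R$ 10.000,00, but Jane got the better deal for R$ 7.900,00"

# Deleting decimals and non-digits concatenates both amounts.
merged = text.gsub(/(?<=\d),\d+|\D/, '')

# Scanning for complete amounts keeps them apart.
amounts = text.scan(R).map { |s| s.delete('.') }

p merged    # => "100007900"
p amounts   # => ["10000,00", "7900,00"]
```

Replacing every non-digit merges both amounts into one number, while scanning for complete amounts returns one string per amount.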