Ruby regex return array of numbers only

Question

I have the following strings:

1: "AMETHYST 9.5x10.5 OVAL CHECKERBOARD AAA"

2: "AMETHYST 9x10 OVAL CHECKERBOARD AAA"

3: "AMETHYST 9-10 OVAL CHECKERBOARD AAA"

4: "AMETHYST 9.5-10.5 OVAL CHECKERBOARD AAA"

5: "AMETHYST 9.5 OVAL CHECKERBOARD AAA"

6: "AMETHYST 9 OVAL CHECKERBOARD AAA"

For each case I would like my regex to return an array of the integers or floats. For example, for the first case:

["9.5", "10.5"]

After much trying on Rubular I came up with:

/\d+[.]\d+?/

This gives me most of the match results I need when checking on Rubular.com. However, in cases 2, 3 and 6 it does not pick up the integers: neither the ones on either side of the - or x character, nor a lone integer as in case 6.
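A quick way to see exactly what the pattern above matches, outside Rubular, is to run it through String#scan in irb. This sketch assumes the "1:" to "6:" prefixes are only labels and not part of the data:

```ruby
# The pattern from the question: digits, a *required* period, then a lazy \d+?
OP_PATTERN = /\d+[.]\d+?/

SAMPLES = [
  "AMETHYST 9.5x10.5 OVAL CHECKERBOARD AAA",
  "AMETHYST 9x10 OVAL CHECKERBOARD AAA",
  "AMETHYST 9-10 OVAL CHECKERBOARD AAA",
  "AMETHYST 9.5-10.5 OVAL CHECKERBOARD AAA",
  "AMETHYST 9.5 OVAL CHECKERBOARD AAA",
  "AMETHYST 9 OVAL CHECKERBOARD AAA"
]

# Print every match per string; strings with no period produce no match at all.
SAMPLES.each { |s| p s.scan(OP_PATTERN) }
# ["9.5", "10.5"]
# []
# []
# ["9.5", "10.5"]
# ["9.5"]
# []
```

Because the period is mandatory, any number without one (both sides of 9x10 and 9-10, and the lone 9) is skipped entirely.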

What am I missing?

THANKS!

Are the prefixes "1:, 2:, 3:" in the data? Or just for labeling the lines? – Nick Veys, commented Jul 17, 2014 at 16:23
Note to everyone. The question is: "What am I missing?" So far, only Nishu's and my answers answer this question. – sawa, commented Jul 17, 2014 at 16:28
This isn't Jeopardy, you can phrase the answer any way you like. Working examples that solve his problem with regex or other alternatives are still useful answers. Besides "What am I missing?" is more of an expression, not a specific request. – Søren Ullidtz, commented Jul 17, 2014 at 16:46

Cary Swoveland · Accepted Answer · 2014-07-17 16:25:17Z

5

This should do it:

def doit(str)
  str.scan(/\d+\.?\d*/)
end

doit "AMETHYST 9.5x10.5 OVAL CHECKERBOARD AAA" #=> ["9.5", "10.5"]
doit "AMETHYST 9x10 OVAL CHECKERBOARD AAA"     #=> ["9", "10"]
doit "AMETHYST 9-10 OVAL CHECKERBOARD AAA"     #=> ["9", "10"]
doit "AMETHYST 9.5-10.5 OVAL CHECKERBOARD AAA" #=> ["9.5", "10.5"]
doit "AMETHYST 9.5 OVAL CHECKERBOARD AAA"      #=> ["9.5"]
doit "AMETHYST 9 OVAL CHECKERBOARD AAA"        #=> ["9"]

answered Jul 17, 2014 at 16:25

Cary Swoveland


2 Comments

Arup Rakshit Over a year ago

#scan -- That's it. No argument.

Cary Swoveland Over a year ago

@Arup, actually, it does take an argument. :-)

sawa · Accepted Answer · 2014-07-17 16:26:09Z

4

What you are missing is making the period optional, which you can do with the quantifier ?.

By the way, it is not clear why you made the fractional digits non-greedy. You did not say that you only want a single digit after the decimal point. Furthermore, combining the lazy ? with + as in \d+? makes no sense here: because it sits at the very end of the pattern, it matches exactly one digit, which has the same effect as \d.
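The claim about \d+? at the end of a pattern can be checked directly. None of the sample strings has two decimal digits, so this sketch uses a made-up input, "10.55":

```ruby
# A lazy \d+? at the very end of a pattern gives up after one digit,
# so there it behaves exactly like a plain \d.
LAZY_TAIL   = "10.55".scan(/\d+[.]\d+?/)  # => ["10.5"]
SINGLE_TAIL = "10.55".scan(/\d+[.]\d/)    # => ["10.5"]
# A greedy \d+ keeps every digit after the period.
GREEDY_TAIL = "10.55".scan(/\d+[.]\d+/)   # => ["10.55"]

p LAZY_TAIL, SINGLE_TAIL, GREEDY_TAIL
```

The leftover "5" in the first two cases is not matched on its own, since it has no period after it.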

To make it work, you can have a regex like this:

/\d+\.?\d*/

or

/\d+(?:\.\d+)?/
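The two patterns agree on all six strings in the question; they only diverge when a period has no digits after it. This sketch uses the "AMETHYST 9.5x10. OVAL" string from the comments below to show the difference:

```ruby
LOOSE  = /\d+\.?\d*/       # period optional; digits after it optional too
STRICT = /\d+(?:\.\d+)?/   # period and following digits appear together or not at all

# Identical on a string from the question...
p "AMETHYST 9.5-10.5 OVAL CHECKERBOARD AAA".scan(LOOSE)   # => ["9.5", "10.5"]
p "AMETHYST 9.5-10.5 OVAL CHECKERBOARD AAA".scan(STRICT)  # => ["9.5", "10.5"]

# ...but they differ when a period is not followed by a digit.
p "AMETHYST 9.5x10. OVAL".scan(LOOSE)   # => ["9.5", "10."]
p "AMETHYST 9.5x10. OVAL".scan(STRICT)  # => ["9.5", "10"]
```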

edited Jul 17, 2014 at 16:26

answered Jul 17, 2014 at 16:18

sawa

4 Comments

Cary Swoveland Over a year ago

Suppose the OP wanted to extract all strings that begin with one or more digits, possibly followed by at most one period, and if a period is present, it must be followed by one or more additional digits. What regex would you use for that?

sawa Over a year ago

@CarySwoveland My second regex will do that.

Cary Swoveland Over a year ago

Your second regex returns ["9.5", "10"] for "AMETHYST 9.5x10. OVAL". What I meant is: what regex would return ["9.5"] for that string, i.e., disregard 10. because no digits follow its period? Just curious.

sawa Over a year ago

@CarySwoveland I see. What about /\d+\.\d+|(?<!\.)\d+(?!\.)/?
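Running that lookaround pattern on the example string shows a subtle catch: when (?!\.) rejects "10", the engine backtracks and accepts just "1". The second pattern below is a hypothetical tweak, not something proposed in the thread, that also refuses a digit on either side so no fragment of "10." can match:

```ruby
# The pattern from the comment above.
LOOKAROUND = /\d+\.\d+|(?<!\.)\d+(?!\.)/
# Hypothetical tweak: lookarounds reject a neighbouring digit as well as a period.
LOOKAROUND_STRICT = /\d+\.\d+|(?<![\d.])\d+(?![\d.])/

text = "AMETHYST 9.5x10. OVAL"
p text.scan(LOOKAROUND)         # => ["9.5", "1"]
p text.scan(LOOKAROUND_STRICT)  # => ["9.5"]

# The tweak still handles the original cases.
p "AMETHYST 9x10 OVAL CHECKERBOARD AAA".scan(LOOKAROUND_STRICT)      # => ["9", "10"]
p "AMETHYST 9.5-10.5 OVAL CHECKERBOARD AAA".scan(LOOKAROUND_STRICT)  # => ["9.5", "10.5"]
```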

nishu · Accepted Answer · 2014-07-17 16:25:30Z

2

There are 2 things missing in the regex.

First, make the dot optional by putting a ? after it. Second, make the digits after the dot optional and of any length by using * instead of +?:

\d+[.]?\d*

answered Jul 17, 2014 at 16:25

nishu

2 Comments

the Tin Man Over a year ago

Using [.] is the long way around to escape a period; use \. instead.

nishu Over a year ago

Agreed. Continued the convention in the question itself.
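Whichever spelling is used, the important thing is that the period is escaped at all. A small check on one of the question's strings shows that [.] and \. both match only a literal period, while a bare . matches any character, here the "x":

```ruby
TEXT = "AMETHYST 9x10 OVAL CHECKERBOARD AAA"

p TEXT.scan(/\d+[.]?\d*/)  # => ["9", "10"]  character class: literal period only
p TEXT.scan(/\d+\.?\d*/)   # => ["9", "10"]  escaped: literal period only
p TEXT.scan(/\d+.?\d*/)    # => ["9x10"]     unescaped . matches "x" too
```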

the Tin Man · Accepted Answer · 2014-07-17 17:39:16Z

Assuming your example input is accurate, I'd use scan, since that's what it's made for, and massage the results a tiny bit to only return the values you want:

strings = [
  '1: "AMETHYST 9.5x10.5 OVAL CHECKERBOARD AAA"',
  '2: "AMETHYST 9x10 OVAL CHECKERBOARD AAA"',
  '3: "AMETHYST 9-10 OVAL CHECKERBOARD AAA"',
  '4: "AMETHYST 9.5-10.5 OVAL CHECKERBOARD AAA"',
  '5: "AMETHYST 9.5 OVAL CHECKERBOARD AAA"',
  '6: "AMETHYST 9 OVAL CHECKERBOARD AAA"',
]

strings.map{ |s| s.scan(/\d+[.\d]*/)[1..-1] }
# => [["9.5", "10.5"],
#     ["9", "10"],
#     ["9", "10"],
#     ["9.5", "10.5"],
#     ["9.5"],
#     ["9"]]

/\d+[.\d]*/ means "find one or more digits, followed by any number of periods and digits." That also matches the leading line number (the 1 in 1:), but slicing the array with [1..-1] strips it. If numbers like 1.0.0.0 existed, the pattern would return 1.0.0.0, but that's a pretty nonsensical value for this sort of output, so I think the pattern is reasonably safe.
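The caveats of this approach are easy to check. The first input below is made up; the second is the trailing-period string from the comments on sawa's answer; the third shows what [1..-1] does to a line that lacks a numeric prefix:

```ruby
VERSION_LIKE = "release 1.0.0.0"                   # made-up input
TRAILING     = "AMETHYST 9.5x10. OVAL"
UNNUMBERED   = "AMETHYST 9 OVAL CHECKERBOARD AAA"

p VERSION_LIKE.scan(/\d+[.\d]*/)       # => ["1.0.0.0"]  the class swallows every period and digit
p VERSION_LIKE.scan(/\d+(?:\.\d+)?/)   # => ["1.0", "0.0"]
p TRAILING.scan(/\d+[.\d]*/)           # => ["9.5", "10."]  a trailing period is kept
p UNNUMBERED.scan(/\d+[.\d]*/)[1..-1]  # => []  slicing drops a real value when there is no prefix
```

So the slicing step is only safe if every line really starts with a number followed by a colon.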

If the example input isn't accurate and the line numbers don't really exist, it becomes simpler, since there is nothing to slice off:

strings = [
  '"AMETHYST 9.5x10.5 OVAL CHECKERBOARD AAA"',
  '"AMETHYST 9x10 OVAL CHECKERBOARD AAA"',
  '"AMETHYST 9-10 OVAL CHECKERBOARD AAA"',
  '"AMETHYST 9.5-10.5 OVAL CHECKERBOARD AAA"',
  '"AMETHYST 9.5 OVAL CHECKERBOARD AAA"',
  '"AMETHYST 9 OVAL CHECKERBOARD AAA"',
]

strings.map{ |s| s.scan(/\d+[.\d]*/) }
# => [["9.5", "10.5"],
#     ["9", "10"],
#     ["9", "10"],
#     ["9.5", "10.5"],
#     ["9.5"],
#     ["9"]]
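All of the answers return strings. If actual Integer and Float values are wanted, as the question's title suggests, one option is to convert each match after scanning. The helper name to_numbers is hypothetical, and the sketch uses the /\d+(?:\.\d+)?/ pattern from sawa's answer so that a stray trailing period never reaches the conversion:

```ruby
# Hypothetical helper: scan for numbers, then turn each match into an
# Integer (no period) or a Float (has a period).
def to_numbers(str)
  str.scan(/\d+(?:\.\d+)?/).map { |m| m.include?(".") ? m.to_f : m.to_i }
end

p to_numbers("AMETHYST 9.5x10.5 OVAL CHECKERBOARD AAA")  # => [9.5, 10.5]
p to_numbers("AMETHYST 9x10 OVAL CHECKERBOARD AAA")      # => [9, 10]
p to_numbers("AMETHYST 9.5 OVAL CHECKERBOARD AAA")       # => [9.5]
```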

Søren Ullidtz · Accepted Answer · 2014-07-17 16:32:55Z

0

This works on Rubular for the examples you provided:

\d+(?:[.]\d+)?

I simply put a non-capturing group around the last part and moved your last ? outside it, so it now makes the whole group optional (zero or one occurrence) instead of making \d+ lazy.
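The group being non-capturing matters when the pattern is used with String#scan: if a pattern contains capturing groups, scan returns the groups' contents for each match instead of the whole match. A short comparison on a string adapted from the question:

```ruby
S = "AMETHYST 9.5x10 OVAL"

p S.scan(/\d+(?:[.]\d+)?/)  # => ["9.5", "10"]    non-capturing group: whole matches
p S.scan(/\d+([.]\d+)?/)    # => [[".5"], [nil]]  capturing group: scan returns the group
```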

edited Jul 17, 2014 at 16:32

answered Jul 17, 2014 at 16:20

Søren Ullidtz
