2

I have the following strings:

1: "AMETHYST 9.5x10.5 OVAL CHECKERBOARD AAA"

2: "AMETHYST 9x10 OVAL CHECKERBOARD AAA"

3: "AMETHYST 9-10 OVAL CHECKERBOARD AAA"

4: "AMETHYST 9.5-10.5 OVAL CHECKERBOARD AAA"

5: "AMETHYST 9.5 OVAL CHECKERBOARD AAA"

6: "AMETHYST 9 OVAL CHECKERBOARD AAA"

For each case, I would like my regex to return an array of the integers or floats. For example, for the first case:

[
  [0] "9.5"
  [1] "10.5"
]

After much trying on Rubular I came up with:

/\d+[.]\d+?/

This gives me most of the matches I need when checking on Rubular.com. However, in cases 2, 3, and 6 it does not pick up the integer in front of the - or x character, or an integer that stands alone as in case 6.

What am I missing?

THANKS!

4
  • 1
    Are the prefixes "1:, 2:, 3:" in the data? Or just for labeling the lines? Commented Jul 17, 2014 at 16:23
  • Note to everyone. The question is: "What am I missing?" So far, only Nishu's and my answers answer this question. Commented Jul 17, 2014 at 16:28
  • 2
    This isn't Jeopardy, you can phrase the answer any way you like. Working examples that solve his problem with regex or other alternatives are still useful answers. Besides "What am I missing?" is more of an expression, not a specific request. Commented Jul 17, 2014 at 16:46
  • 1
    Only the OP can determine what answers the question. Commented Jul 17, 2014 at 16:50

5 Answers

5

This should do it:

def doit(str)
  str.scan(/\d+\.?\d*/)
end

doit "AMETHYST 9.5x10.5 OVAL CHECKERBOARD AAA" #=> ["9.5", "10.5"]
doit "AMETHYST 9x10 OVAL CHECKERBOARD AAA"     #=> ["9", "10"]
doit "AMETHYST 9-10 OVAL CHECKERBOARD AAA"     #=> ["9", "10"]
doit "AMETHYST 9.5-10.5 OVAL CHECKERBOARD AAA" #=> ["9.5", "10.5"]
doit "AMETHYST 9.5 OVAL CHECKERBOARD AAA"      #=> ["9.5"]
doit "AMETHYST 9 OVAL CHECKERBOARD AAA"        #=> ["9"]
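
The question mentions integers and floats, while scan returns strings. If numeric values are wanted, a minimal sketch along these lines could convert them. The method name numbers_in is made up for illustration, and treating any match that contains a period as a float is an assumption:

```ruby
# Hypothetical helper: extract the numbers and convert each match
# to an Integer or a Float depending on whether it contains a period.
def numbers_in(str)
  str.scan(/\d+\.?\d*/).map { |n| n.include?(".") ? n.to_f : n.to_i }
end

floats   = numbers_in("AMETHYST 9.5x10.5 OVAL CHECKERBOARD AAA")
integers = numbers_in("AMETHYST 9x10 OVAL CHECKERBOARD AAA")

p floats
p integers
```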

2 Comments

#scan -- That's it. No argument.
@Arup, actually, it does take an argument. :-)
4

You forgot to make the period character optional. That can be done with the quantifier ?.

By the way, it is not clear why you made the fractional digits non-greedy. You did not say that you only want a single digit after the decimal point. Furthermore, it does not make sense to combine the lazy modifier with the quantifier + as in \d+? here, because at the end of the pattern it has the same effect as a plain \d.
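
A small sketch of that point, using made-up sample strings: the pattern from the question and a variant with a plain \d in place of \d+? should give the same matches, and both skip numbers that have no period.

```ruby
# Pattern from the question: the period is required, and the lazy \d+?
# stops after a single fractional digit.
ORIGINAL = /\d+[.]\d+?/

# Same pattern with a plain \d in place of \d+?.
SINGLE_DIGIT = /\d+[.]\d/

# Illustrative inputs (not from the question, except the first two sizes).
samples = ["9.5x10.5", "9x10", "9.55"]

original_results = samples.map { |s| s.scan(ORIGINAL) }
single_results   = samples.map { |s| s.scan(SINGLE_DIGIT) }

p original_results  # "9x10" yields no matches; "9.55" is cut to "9.5"
p single_results
```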

To make it work, you can have a regex like this:

/\d+\.?\d*/

or

/\d+(?:\.\d+)?/
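
As a quick check, the sketch below runs both patterns over the six strings from the question and over one input with a trailing period. The constant names LOOSE and STRICT are labels chosen here for illustration. The two patterns agree on the question's data and differ only in whether a dangling period is included in the match.

```ruby
LOOSE  = /\d+\.?\d*/      # period optional, fractional digits optional
STRICT = /\d+(?:\.\d+)?/  # period is matched only when digits follow it

inputs = [
  "AMETHYST 9.5x10.5 OVAL CHECKERBOARD AAA",
  "AMETHYST 9x10 OVAL CHECKERBOARD AAA",
  "AMETHYST 9-10 OVAL CHECKERBOARD AAA",
  "AMETHYST 9.5-10.5 OVAL CHECKERBOARD AAA",
  "AMETHYST 9.5 OVAL CHECKERBOARD AAA",
  "AMETHYST 9 OVAL CHECKERBOARD AAA",
]

loose_results  = inputs.map { |s| s.scan(LOOSE) }
strict_results = inputs.map { |s| s.scan(STRICT) }

# An input whose second number ends in a period with no digits after it.
trailing        = "AMETHYST 9.5x10. OVAL"
loose_trailing  = trailing.scan(LOOSE)   # keeps the dangling period
strict_trailing = trailing.scan(STRICT)  # drops the dangling period

p loose_results == strict_results
p loose_trailing
p strict_trailing
```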

4 Comments

Suppose the OP wanted to extract all strings that begin with one or more digits, possibly followed by at most one period, and if a period is present, it must be followed by one or more additional digits. What regex would you use for that?
@CarySwoveland My second regex will do that.
Your second regex returns ["9.5", "10"] for "AMETHYST 9.5x10. OVAL". What I meant is: what regex would return ["9.5"] for that string, i.e., disregard "10." because no digits follow its period? Just curious.
@CarySwoveland I see. What about /\d+\.\d+|(?<!\.)\d+(?!\.)/?
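
For what it's worth, here is a sketch of one way to get that stricter behaviour. The pattern WHOLE_NUMBER is an assumption written for this illustration, not something proposed in the thread: it refuses any match that touches another digit or period on either side. As far as I can tell, the alternation suggested in the last comment still yields a stray "1" for this input, because \d+ can backtrack to a single digit that is not followed by a period.

```ruby
# Hypothetical stricter pattern: a number must not be preceded or
# followed by a digit or a period.
WHOLE_NUMBER = /(?<![\d.])\d+(?:\.\d+)?(?![\d.])/

sample = "AMETHYST 9.5x10. OVAL"

strict_result  = sample.scan(WHOLE_NUMBER)
comment_result = sample.scan(/\d+\.\d+|(?<!\.)\d+(?!\.)/)

p strict_result   # "10." is skipped entirely
p comment_result  # the "1" from "10." slips through

# The stricter pattern still handles the sizes from the question.
question_result = "AMETHYST 9.5-10.5 OVAL CHECKERBOARD AAA".scan(WHOLE_NUMBER)
p question_result
```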
2

There are 2 things missing in the regex.

First, make the dot optional by putting a ? after it. Second, make the digits after the dot optional and of any length by adding *.

\d+[.]?\d*

2 Comments

Using [.] is the long way around to escape a period. Use \. instead.
Agreed. Continued the convention in the question itself.
2

Assuming your example input is accurate, I'd use scan, since that's what it's made for, and massage the results a tiny bit to only return the values you want:

strings = [
  '1: "AMETHYST 9.5x10.5 OVAL CHECKERBOARD AAA"',
  '2: "AMETHYST 9x10 OVAL CHECKERBOARD AAA"',
  '3: "AMETHYST 9-10 OVAL CHECKERBOARD AAA"',
  '4: "AMETHYST 9.5-10.5 OVAL CHECKERBOARD AAA"',
  '5: "AMETHYST 9.5 OVAL CHECKERBOARD AAA"',
  '6: "AMETHYST 9 OVAL CHECKERBOARD AAA"',
]

strings.map{ |s| s.scan(/\d+[.\d]*/)[1..-1] }
# => [["9.5", "10.5"],
#     ["9", "10"],
#     ["9", "10"],
#     ["9.5", "10.5"],
#     ["9.5"],
#     ["9"]]

/\d+[.\d]*/ means "find one or more digits, optionally followed by any number of '.' characters and digits." That will also match the digit in the leading 1: label, but slicing the array with [1..-1] drops it. If a number such as 1.0.0.0 appeared, the pattern would return 1.0.0.0, but that is a fairly nonsensical value for this sort of data, so I think the pattern is reasonably safe.

If the example input isn't accurate and the line numbers aren't really in the data, it becomes simpler:
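
A short sketch of those edge cases. The "build 1.0.0.0 ready" string is invented for illustration, and the trailing-period string is borrowed from the comments on another answer:

```ruby
PATTERN = /\d+[.\d]*/

# Dotted, version-like value: returned as one match.
version_like = "build 1.0.0.0 ready".scan(PATTERN)

# A dangling period is kept as part of the match.
trailing_dot = "AMETHYST 9.5x10. OVAL".scan(PATTERN)

# With the "1:" label present, the label's digit is the first match,
# so [1..-1] is needed to drop it.
labelled = '1: "AMETHYST 9.5x10.5 OVAL CHECKERBOARD AAA"'.scan(PATTERN)

p version_like
p trailing_dot
p labelled
p labelled[1..-1]
```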

strings = [
  '"AMETHYST 9.5x10.5 OVAL CHECKERBOARD AAA"',
  '"AMETHYST 9x10 OVAL CHECKERBOARD AAA"',
  '"AMETHYST 9-10 OVAL CHECKERBOARD AAA"',
  '"AMETHYST 9.5-10.5 OVAL CHECKERBOARD AAA"',
  '"AMETHYST 9.5 OVAL CHECKERBOARD AAA"',
  '"AMETHYST 9 OVAL CHECKERBOARD AAA"',
]

strings.map{ |s| s.scan(/\d+[.\d]*/) }
# => [["9.5", "10.5"],
#     ["9", "10"],
#     ["9", "10"],
#     ["9.5", "10.5"],
#     ["9.5"],
#     ["9"]]

0

This works on Rubular for the examples you provided:

\d+(?:[.]\d+)?

I simply put a non-capturing group around the last part and moved your final ? outside it, which makes it a "zero or one" quantifier instead of a lazy modifier.
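
The non-capturing part matters if the pattern is used with String#scan in Ruby, since scan returns the captured groups rather than the whole match when the pattern contains a capturing group. A minimal sketch of the difference:

```ruby
str = "AMETHYST 9.5x10 OVAL CHECKERBOARD AAA"

# Non-capturing group: scan returns the full matches.
non_capturing = str.scan(/\d+(?:[.]\d+)?/)

# Capturing group: scan returns one array per match holding only the
# group contents (nil when the optional group did not participate).
capturing = str.scan(/\d+([.]\d+)?/)

p non_capturing
p capturing
```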
