I have a string like this:

"Temporada 2015"

and I also get strings like this:

"Temporada 8"

I need to match and extract only the numbers from these strings, i.e. 2015 and 8. How do I do that with a regex? I tried the following:

doc.text_at('header.headerInfo > h4 > b').match(/(Tempo).*(\d+)/)[2]

But it returned only 5 for the first string instead of 2015. How do I match both strings and return only the numbers?

5 Answers

2

The .* is "greedy". It matches as many characters as it can. So it leaves just one digit for the \d+.

If your strings are known to contain no other numbers, you can just do

.scan(/\d+/).first

Otherwise, you can match only non-digit characters between the prefix and the number:

.match(/(Tempo)[^\d]*(\d+)/)[2]
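
To make the difference concrete, here is a small sketch that runs both patterns against the two sample strings from the question. The constant names GREEDY and NEGATED are only illustrative and do not appear in the original code.

```ruby
# Compare what each pattern leaves for the final (\d+) group.
GREEDY  = /(Tempo).*(\d+)/      # .* gives back only enough to free one digit
NEGATED = /(Tempo)[^\d]*(\d+)/  # [^\d]* cannot consume digits at all

samples = ["Temporada 2015", "Temporada 8"]

greedy_results  = samples.map { |s| s.match(GREEDY)[2] }
negated_results = samples.map { |s| s.match(NEGATED)[2] }

p greedy_results   # => ["5", "8"]
p negated_results  # => ["2015", "8"]
```

With a single-digit number both patterns agree, which is why the problem only shows up for "Temporada 2015".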

1

You should add a ? after .* to make that quantifier non-greedy:

doc.text_at('header.headerInfo > h4 > b').match(/(Tempo).*?(\d+)/)[2]

Here is a sample program for verification.

4 Comments

Wholly my! 4 answers displayed simultaneously! :)
Yeah, I also had to click that box :)
@shivam, maybe it had something to do with your avatar.
@CarySwoveland ROFL.. I think Im too lean to be mistaken as terminator. :D
1

Because .* is greedy, it matches as many characters as possible, so it swallows every digit except the last one, which is all that remains for \d+. Changing the greedy .* to the non-greedy .*? makes it match as few characters as possible, which in turn lets \d+ capture the whole number.

doc.text_at('header.headerInfo > h4 > b').match(/(Tempo).*?(\d+)/)[2]
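
One thing to watch for: match returns nil when the text does not match, so indexing the result with [2] raises NoMethodError. As a sketch, the same non-greedy pattern can be used with String#[] and a capture index, which returns nil instead. The helper name season_number is hypothetical, and the 'Tempo' group is dropped here, so the number is capture 1 rather than 2.

```ruby
# Hypothetical helper (not from the original post): returns the first number
# after "Tempo" as a String, or nil when the text does not match.
def season_number(text)
  # String#[] with a regex and a capture index returns nil on no match,
  # whereas text.match(...)[1] would raise NoMethodError on a nil result.
  text[/Tempo.*?(\d+)/, 1]
end

p season_number("Temporada 2015")  # => "2015"
p season_number("Temporada 8")     # => "8"
p season_number("Sin numero")      # => nil
```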

1

You can scan directly for digits:

"Temporada 2015".scan(/\d+/)
# => ["2015"]
"Temporada 8".scan(/\d+/)
# => ["8"]

If you want to include Temp in the regex:

"Temporada 2015".scan(/Temp.*?(\d+)/)
# => [["2015"]]

A non-regex way:

"Temporada 2015".split.detect{|e| e.to_i.to_s == e }
# => "2015"
"Temporada 8".split.detect{|e| e.to_i.to_s == e }
# => "8"
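
A caveat with the to_i.to_s round trip: it rejects tokens with leading zeros, because "08".to_i.to_s is "8". If such input is possible (an assumption, since the question only shows 2015 and 8), a digit-only check along these lines may be safer. The helper name first_digit_token is hypothetical.

```ruby
# Hypothetical helper: first whitespace-separated token made only of digits.
def first_digit_token(text)
  # Deleting every character in the range 0-9 leaves an empty string
  # exactly when the token consists solely of digits.
  text.split.detect { |token| token.delete("0-9").empty? }
end

round_trip = "Temporada 08".split.detect { |e| e.to_i.to_s == e }
p round_trip                            # => nil, since "08".to_i.to_s is "8"
p first_digit_token("Temporada 08")     # => "08"
p first_digit_token("Temporada 2015")   # => "2015"
```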

0

I'd write it thus:

r = /
    \b    # match a word boundary (possibly the beginning of the string)
    Tempo # match these characters
    \D+   # match one or more characters other than digits
    \K    # forget everything matched so far
    \d+   # match one or more digits
    /x

"Temporada 2015"[r] #=> "2015"
"Temporada 8"[r]    #=> "8"
"Temporary followed by something else 21 then more"[r]
  #=> "21"

If 'Tempo' must be at the beginning of the string, anchor the pattern with \A: write r = /\ATempo.../, or r = /\A\s*Tempo.../ if it can be preceded by whitespace. I've written \D+ rather than \D* on the assumption that there should be at least one space.
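
As a rough sketch of those two start-of-string variants, assuming the intent is to anchor with \A (the constant names are illustrative only):

```ruby
# Assumed anchored variants; \A matches only at the very start of the string.
AT_START         = /\ATempo\D+\K\d+/
AFTER_WHITESPACE = /\A\s*Tempo\D+\K\d+/

p "Temporada 2015"[AT_START]           # => "2015"
p "  Temporada 8"[AT_START]            # => nil (leading spaces block the match)
p "  Temporada 8"[AFTER_WHITESPACE]    # => "8"
p "La Temporada 3"[AFTER_WHITESPACE]   # => nil ('Tempo' is not at the start)
```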

I don't understand why 'Tempo' is in a capture group. Have I missed something?
