extract numbers within a string using regex

Question

I have a string like this:

"Temporada 2015"

and I also get strings like

"Temporada 8"

I need to match and extract only the numbers (2015 and 8) from these strings. How do I do it using a regex? I tried this:

doc.text_at('header.headerInfo > h4 > b').match(/(Tempo).*(\d+)/)[2]

But it returned only 5 for the first string instead of 2015. How do I match both cases and return only the number?

michael_wu · Accepted Answer · 2015-04-22 13:37:57Z

2

The .* is "greedy": it matches as many characters as it can and then backtracks only as far as it must, so it leaves just one digit for the \d+.
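To see the greedy behaviour concretely, here is a small sketch that wraps .* in an extra capture group (added only for illustration, it is not part of the question's pattern) and runs it on a plain string literal instead of doc.text_at(...):

```ruby
# Extra group around .* shows how much the greedy quantifier consumed.
m = "Temporada 2015".match(/(Tempo)(.*)(\d+)/)
p m[2] # => "rada 201"  (.* took everything, then gave back a single character)
p m[3] # => "5"         (one digit is all \d+ needs to succeed)
```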

If your strings are known to contain no other numbers, you can just do

.scan(/\d+/).first

Otherwise, match non-digits instead of .*:

.match(/(Tempo)[^\d]*(\d+)/)[2]
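A sketch comparing the two approaches, using the question's string and a made-up string (hypothetical, for illustration only) that contains another number before "Tempo". The helper names first_number and number_after_tempo are also hypothetical:

```ruby
def first_number(s)
  s.scan(/\d+/).first # first run of digits anywhere in the string
end

def number_after_tempo(s)
  m = s.match(/(Tempo)[^\d]*(\d+)/) # digits following "Tempo" and any non-digits
  m && m[2]                         # nil (instead of NoMethodError) when nothing matches
end

["Temporada 2015", "Capitulo 3 Temporada 2015"].each do |s|
  puts "#{s.inspect}: scan=#{first_number(s)} match=#{number_after_tempo(s)}"
end
# "Temporada 2015": scan=2015 match=2015
# "Capitulo 3 Temporada 2015": scan=3 match=2015
```

The scan version is simpler, but it only works when the wanted number is the first one in the string.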

answered Apr 22, 2015 at 13:37

michael_wu

16k · 5 gold badges · 66 silver badges · 77 bronze badges

Wiktor Stribiżew · Accepted Answer · 2015-04-22 13:41:55Z

1

You should add a ? after .* to make that quantifier lazy (non-greedy):

doc.text_at('header.headerInfo > h4 > b').match(/(Tempo).*?(\d+)/)[2]

Here is a sample program for verification.
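A minimal stand-alone check of the lazy pattern, using the question's two strings as plain literals in place of doc.text_at(...):

```ruby
# Lazy .*? applied to both sample strings from the question.
results = ["Temporada 2015", "Temporada 8"].map do |s|
  s.match(/(Tempo).*?(\d+)/)[2]
end
p results # => ["2015", "8"]
```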

edited Apr 22, 2015 at 13:41

answered Apr 22, 2015 at 13:37

Wiktor Stribiżew

631k · 41 gold badges · 502 silver badges · 632 bronze badges

4 Comments

Wiktor Stribiżew Over a year ago

Wholly my! 4 answers displayed simultaneously! :)

Wiktor Stribiżew Over a year ago

Yeah, I also had to click that box :)

Cary Swoveland Over a year ago

@shivam, maybe it had something to do with your avatar.

shivam Over a year ago

@CarySwoveland ROFL.. I think I'm too lean to be mistaken for the Terminator. :D

Avinash Raj · Accepted Answer · 2015-04-22 13:37:45Z

1

Because .* is greedy, it matches as many characters as possible, so once all the previous characters have been consumed only the last digit is left for (\d+). Turning the greedy .* into the non-greedy .*? makes it match as few characters as possible, which in turn gives you the whole number.

doc.text_at('header.headerInfo > h4 > b').match(/(Tempo).*?(\d+)/)[2]
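Wrapping .*? in an extra, illustration-only capture group shows that it stops just before the first digit, so the greedy \d+ still gets the complete number (a plain literal stands in for doc.text_at(...)):

```ruby
# Extra group around .*? shows how little the lazy quantifier consumes.
lazy = "Temporada 2015".match(/(Tempo)(.*?)(\d+)/)
p lazy[2] # => "rada "
p lazy[3] # => "2015"
```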

answered Apr 22, 2015 at 13:37

Avinash Raj

175k · 32 gold badges · 247 silver badges · 289 bronze badges

shivam · Accepted Answer · 2015-04-22 13:53:08Z

1

You can scan directly for digits:

"Temporada 2015".scan(/\d+/)
# => ["2015"]
"Temporada 8".scan(/\d+/)
# => ["8"]

If you want to include Temp in the regex:

"Temporada 2015".scan(/Temp.*?(\d+)/)
# => [["2015"]]
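With a capture group, scan returns one array of captures per match, hence the nested result. A small sketch of two standard ways to get a plain string instead, flattening the result or using String#[] with a capture index:

```ruby
s = "Temporada 2015"

nested   = s.scan(/Temp.*?(\d+)/)  # => [["2015"]]  one captures-array per match
flat     = nested.flatten.first     # => "2015"
captured = s[/Temp.*?(\d+)/, 1]     # => "2015"     String#[] returns capture 1 (or nil)
```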

A non-regex way:

"Temporada 2015".split.detect{|e| e.to_i.to_s == e }
# => "2015"
"Temporada 8".split.detect{|e| e.to_i.to_s == e }
# => "8"
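One caveat with the to_i.to_s round-trip: a token with a leading zero such as "08" is rejected, because "08".to_i.to_s is "8". A sketch of a still regex-free alternative (the all_digits? helper is hypothetical) that checks whether every character is a digit using String#delete with a character range:

```ruby
p "Temporada 08".split.detect { |e| e.to_i.to_s == e } # => nil

# Hypothetical helper: non-empty and nothing left after deleting the digits 0-9.
def all_digits?(token)
  !token.empty? && token.delete("0-9").empty?
end

p "Temporada 08".split.detect { |e| all_digits?(e) }   # => "08"
p "Temporada 2015".split.detect { |e| all_digits?(e) } # => "2015"
```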

edited Apr 22, 2015 at 13:53

answered Apr 22, 2015 at 13:37

shivam

16.6k · 3 gold badges · 61 silver badges · 72 bronze badges

Cary Swoveland · Accepted Answer · 2015-04-22 17:47:03Z

0

I'd write it thus:

r = /
    \b    # match a word-break (possibly beginning of string)
    Tempo # match these characters
    \D+   # match one or more characters other than digits
    \K    # forget everything matched so far
    \d+   # match one or more digits
   /x

"Temporada 2015"[r] #=> "2015"
"Temporada 8"[r]    #=> "8"
"Temporary followed by something else 21 then more"[r]
  #=> "21"

If 'Tempo' must be at the beginning of the string, write r = /\ATempo..., or r = /\A\s*Tempo... if it can be preceded by whitespace. I've written \D+ rather than \D* on the assumption that there should be at least one space.
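A sketch of the start-anchored variants, written compactly without the /x comments; \A matches only at the very start of the string, and, like the pattern above, these rely on Ruby's regex engine supporting \K. The sample strings other than the question's are made up for illustration:

```ruby
at_start    = /\ATempo\D+\K\d+/    # "Tempo" must be the first thing in the string
ws_at_start = /\A\s*Tempo\D+\K\d+/ # leading whitespace is allowed

p "Temporada 2015"[at_start]        # => "2015"
p "Serie Temporada 2015"[at_start]  # => nil (Tempo is not at the start)
p "  Temporada 8"[at_start]         # => nil (leading whitespace)
p "  Temporada 8"[ws_at_start]      # => "8"
```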

I don't understand why 'Tempo' is in a capture group. Have I missed something?

edited Apr 22, 2015 at 17:47

answered Apr 22, 2015 at 17:42

Cary Swoveland

111k · 6 gold badges · 69 silver badges · 105 bronze badges