get multiple substrings from a string in Ruby

Question

I have

tmp_body_symbols="things <st>hello</st> and <st>blue</st> by <st>orange</st>"
str1_markerstring = "<st>"
str2_markerstring = "</st>"
frags << tmp_body_symbols[/#{str1_markerstring}(.*?)#{str2_markerstring}/m, 1]

frags is "hello" but I want ["hello","blue","orange"]

How would I do that?

Yeah, I was thinking about using Nokogiri, but this is all that we're really capturing, so it seems like overkill.
– timpone, commented Jan 28, 2015 at 5:32

tckmn · Accepted Answer · 2015-01-28 03:25:35Z

3

Use scan:

tmp_body_symbols.scan(/#{str1_markerstring}(.*?)#{str2_markerstring}/m).flatten
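As a rough illustration of why `flatten` is there: when the pattern has a capture group, `scan` returns one small array per match. The helper name `between_markers` below is hypothetical, and the `Regexp.escape` calls are an addition that is not in the answer above; they assume the markers are meant to be matched literally.

```ruby
# Hypothetical helper (not from the original answer): collects every substring
# found between two literal marker strings.
def between_markers(text, open_marker, close_marker)
  # Regexp.escape is an added precaution so that markers containing regex
  # metacharacters (for example "[b]") are matched literally.
  pattern = /#{Regexp.escape(open_marker)}(.*?)#{Regexp.escape(close_marker)}/m
  # With one capture group, scan yields [["hello"], ["blue"], ["orange"]];
  # flatten turns that into a flat array of strings.
  text.scan(pattern).flatten
end

body = "things <st>hello</st> and <st>blue</st> by <st>orange</st>"
p between_markers(body, "<st>", "</st>")  # ["hello", "blue", "orange"]
```

If the markers never contain special characters, as with `<st>` and `</st>`, the escaping changes nothing and the one-liner above behaves the same way.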

3 Comments

timpone Over a year ago

thx Doorknob, I updated the question from using different variable names but you got it right anyway

Cary Swoveland Over a year ago

Do you need the multiline modifier (/m)?

tckmn Over a year ago

@CarySwoveland If the tags can potentially contain multiline data (ex. <st>foo\nbar\nbaz</st>), then yes. I've just kept the regex the same as in the original question, to avoid confusion.
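A small sketch of the difference the `/m` modifier makes, assuming a sample string with a newline inside the first tag. The helper name `st_contents` and the sample data are made up for illustration.

```ruby
# Hypothetical helper: the same pattern, with the /m flag switched on or off.
def st_contents(text, multiline:)
  # In Ruby, /m lets "." match newline characters as well.
  pattern = multiline ? /<st>(.*?)<\/st>/m : /<st>(.*?)<\/st>/
  text.scan(pattern).flatten
end

sample = "<st>foo\nbar</st> and <st>baz</st>"
p st_contents(sample, multiline: true)   # ["foo\nbar", "baz"]
p st_contents(sample, multiline: false)  # ["baz"]
```

Without `/m`, the first tag is skipped entirely because `.` stops at the newline before reaching `</st>`.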

Gagan Gami · Accepted Answer · 2015-01-28 05:33:55Z

2

You can use Nokogiri to parse HTML/XML

require 'nokogiri' # open-uri is not needed when parsing a string

doc = Nokogiri::HTML::Document.parse("things <st>hello</st> and <st>blue</st> by <st>orange</st>")
doc.css('st').map(&:text)
#=> ["hello", "blue", "orange"]

More info: http://www.nokogiri.org/tutorials/parsing_an_html_xml_document.html


3 Comments

timpone Over a year ago

thx, makes sense, might do this down the road but more quick and dirty for this one off

Gagan Gami Over a year ago

It was just one line but when you need to get data from whole page/file I think this is better to use.

timpone Over a year ago

i agree - when it gets there; nokogiri would probably be a great soln

Cary Swoveland · Accepted Answer · 2015-01-28 04:15:34Z

0

You can do this with a capture group, as @Doorknob has done, or without a capture group, by using a ("zero-width") positive look-behind and positive look-ahead:

tmp = "things <st>hello</st> and <st>blue</st> by <st>orange</st>"
s1 = "<st>"
s2 = "</st>"

tmp.scan(/(?<=#{ s1 }).*?(?=#{ s2 })/).flatten
  #=> ["hello", "blue", "orange"]

(?<=#{ s1 }), which evaluates to (?<=<st>), is the positive look-behind.
(?=#{ s2 }), which evaluates to (?=</st>), is the positive look-ahead.
? following .* makes it "non-greedy". Without it:

tmp.scan(/(?<=#{ s1 }).*(?=#{ s2 })/).flatten
  #=> ["hello</st> and <st>blue</st> by <st>orange"]
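One detail worth spelling out, as a sketch rather than a correction: when the pattern has no capture groups, `scan` already returns a flat array of matched strings, so the `flatten` call is harmless but has no effect. The helper name `lookaround_scan` is hypothetical, and `Regexp.escape` is an added assumption that the markers are literal text.

```ruby
# Hypothetical helper: look-around version without capture groups.
def lookaround_scan(text, s1, s2)
  # The look-behind and look-ahead match positions only, so each match is
  # just the text between the markers and scan returns plain strings.
  text.scan(/(?<=#{Regexp.escape(s1)}).*?(?=#{Regexp.escape(s2)})/)
end

tmp = "things <st>hello</st> and <st>blue</st> by <st>orange</st>"
result = lookaround_scan(tmp, "<st>", "</st>")
p result                    # ["hello", "blue", "orange"]
p result == result.flatten  # true
```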

edited Jan 28, 2015 at 4:15

answered Jan 28, 2015 at 4:09

