
I am trying to get the text between two tags, for example:

<b> foo</b>bar<br/> => bar

I tried using '<b>asdasd</b>qwe<br/>'.scan(/<b>[a-zA-Z0-9]*<\/b>(.*)<br\/>/) and it gives me the correct result.

But when I try this:

'<b>exclude</b>op1<br/>exclude 2<b>exclude</b>op2<br/>exclude 2<b>exclude</b>op3<br/>exclude 2'.scan(/<b>[a-zA-Z0-9]*<\/b>(.*)<br\/>/) { |ele|
  puts ele
}

It matches from the first <b> tag all the way to the last <br/> tag, so almost the whole string comes back as a single match. I was expecting an array of matches, one per <b>...<br/> pair.
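To see exactly what the greedy pattern captures, here is a small sketch that reuses the string from the question (the variable names str and greedy are just for illustration). Because (.*) is greedy, it first runs to the end of the string and then backtracks only until <br\/> can match, which is at the last <br/>:

```ruby
str = '<b>exclude</b>op1<br/>exclude 2<b>exclude</b>op2<br/>exclude 2<b>exclude</b>op3<br/>exclude 2'

# (.*) is greedy: it consumes the rest of the string, then backtracks
# only as far as the LAST "<br/>", so the whole string yields one match.
greedy = str.scan(/<b>[a-zA-Z0-9]*<\/b>(.*)<br\/>/)

p greedy.length  # => 1
p greedy.first.first
# => "op1<br/>exclude 2<b>exclude</b>op2<br/>exclude 2<b>exclude</b>op3"
```

So scan does return an array, but it holds a single match whose capture swallows the later <b> and <br/> tags.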


2 Answers


Instead of using a regex on HTML, use Nokogiri:

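require 'nokogiri'

# Print the node that immediately follows each <b> element (skip if none)
Nokogiri::HTML.fragment(str).css('b').each do |b|
  puts b.next.text if b.next
end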


Change (.*) to (.*?) to make the quantifier lazy (non-greedy), so each capture stops at the first <br/> that follows it:

/<b>[a-zA-Z0-9]*<\/b>(.*?)<br\/>/

Test

[2] pry(main)> '<b>exclude</b>op1<br/>exclude 2<b>exclude</b>op2<br/>exclude 2<b>exclude</b>op3<br/>exclude 2'.scan(/<b>[a-zA-Z0-9]*<\/b>(.*?)<br\/>/) { |ele|
[2] pry(main)*   puts ele
[2] pry(main)* }  
op1
op2
op3
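Since the question asked for an array of matches, note that without a block scan returns one array of captures per match, so flatten gives a plain list of strings. The second part is a variant that is not from this answer: it swaps the lazy quantifier for negated character classes ([^<]*), which cannot run past the next tag and also accept bold text such as the " foo" (with a leading space) from the question's first example, which [a-zA-Z0-9]* rejects:

```ruby
str = '<b>exclude</b>op1<br/>exclude 2<b>exclude</b>op2<br/>exclude 2<b>exclude</b>op3<br/>exclude 2'

# Lazy quantifier: scan yields [["op1"], ["op2"], ["op3"]]; flatten it.
lazy = str.scan(/<b>[a-zA-Z0-9]*<\/b>(.*?)<br\/>/).flatten
p lazy  # => ["op1", "op2", "op3"]

# Variant: [^<]* stops at the next "<", so no lazy quantifier is needed.
negated = '<b> foo</b>bar<br/>'.scan(/<b>[^<]*<\/b>([^<]*)<br\/>/).flatten
p negated  # => ["bar"]
```

Both are still regex-based, so they only hold up for simple, flat markup like this.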

1 Comment

You cannot parse HTML with regex.
