1

I know a little bit of regex, but not much. What is the best way to get just the number out of the following HTML? (I want 32 returned.) The values of width, rowspan, and size all vary across this horrible HTML page. Any help?

<td width=14 rowspan=2 align=right><font size=2 face="helvetica">32</font></td>
4
  • The best way is to use a parser not regular expressions. :-) Commented Mar 14, 2010 at 1:52
  • @Erik: In principle yes, but for quick and dirty screenscraping regex are usually a viable tool. Commented Mar 14, 2010 at 1:55
  • I would use a parser, but the HTML is too badly formatted. Commented Mar 14, 2010 at 2:02
  • Well, your example certainly is valid, though :-). And HTML parsers usually are designed to deal with erroneous markup. Commented Mar 14, 2010 at 2:06

3 Answers

2

How about

>(\d+)<

Or, if you desperately want to avoid using capturing groups at all:

(?<=>)\d+(?=<)
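
As a rough sketch of how both patterns might be used from Ruby (the variable names here are illustrative and not from the question), `String#[]` accepts a regex plus an optional capture index, so the digits can be pulled out directly:

```ruby
# Sample cell from the question; the attribute values are expected to vary.
html = '<td width=14 rowspan=2 align=right><font size=2 face="helvetica">32</font></td>'

# Capturing group: the whole match is ">32<", but group 1 holds only the digits.
with_group = html[/>(\d+)</, 1]

# Lookarounds: the surrounding ">" and "<" are asserted but not consumed,
# so the whole match is just the digits.
with_lookaround = html[/(?<=>)\d+(?=<)/]

puts with_group.to_i       # 32
puts with_lookaround.to_i  # 32
```

Both forms return `nil` when nothing matches, so it is worth checking for that before calling `to_i` on real pages.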

2 Comments

This returns >32< but I guess I could just do string.match(/>(\d+)</).match(/\d+/)
@bun: Well, you'll find the 32 in the first capturing group ... I edited the answer to include an example which doesn't need the group, though.
2

Please, do yourself a favor:

#!/usr/bin/env ruby
require 'nokogiri'

require 'test/unit'
class TestExtraction < Test::Unit::TestCase
  def test_that_it_extracts_the_number_correctly
    doc = Nokogiri::HTML('<td width=14 rowspan=2 align=right><font size=2 face="helvetica">32</font></td>')
    assert_equal [32], (doc / '//td/font').map {|el| el.text.to_i }
  end
end

1 Comment

I agree. Going after HTML content with regex is a lot more error prone over the long term compared to using a parser.
0

Maybe

<td[^>]*><font[^>]*>\d+</font></td>

2 Comments

This will certainly match the string above, but won't do anything to extract the 32.
Well, if Ruby's regexp syntax is borrowed from Perl, then you need to put \d+ in parentheses and then use match()[1].
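
A minimal sketch of what that suggestion could look like in Ruby, assuming the cell always has the `td` > `font` shape shown in the question (the names `pattern`, `number`, and `all_numbers` are made up for illustration):

```ruby
html = '<td width=14 rowspan=2 align=right><font size=2 face="helvetica">32</font></td>'

# Same pattern as in the answer, with \d+ wrapped in parentheses so it is captured.
# The slashes in the closing tags are escaped because / delimits the regex literal.
pattern = /<td[^>]*><font[^>]*>(\d+)<\/font><\/td>/

# match returns nil when the pattern is not found, so guard before indexing.
match = html.match(pattern)
number = match ? match[1].to_i : nil

# scan collects the captured group from every matching cell on the page.
all_numbers = html.scan(pattern).flatten.map(&:to_i)

puts number              # 32
puts all_numbers.inspect # [32]
```

This is stricter than `>(\d+)<` because it only picks up numbers inside a `font` directly inside a `td`, but it will miss cells whose markup deviates from that shape.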
