Help with regex / ruby

Question

Hey guys, I'm writing a script to fetch words/results from this site (http://grecni.com/texttwist.php). I already have the HTTP POST request working, etc.

The only thing I need now is to extract the words. I'm working with HTML source that looks like this:

<html>
<head>
<title>Text Twist Unscrambler</title>
<META NAME="keywords" CONTENT="Text,Twist,Text Twist,Unscramble,Free,Source,php">
</head>
<body>

<font face="arial,helvetica" size="3">
<p>
<b>3 letter words</b><br>sae &nbsp; sac &nbsp; ess &nbsp; aas &nbsp; ass &nbsp; sea &nbsp; ace &nbsp; sec &nbsp; <p>

<b>4 letter words</b><br>cess &nbsp; secs &nbsp; seas &nbsp; ceca &nbsp; sacs &nbsp; case &nbsp; asea &nbsp; casa &nbsp; aces &nbsp; caca &nbsp; <p>

<b>5 letter words</b><br>cacas &nbsp; casas &nbsp; caeca &nbsp; cases &nbsp; <p>
<b>6 letter words</b><br>access &nbsp; <br><br>
Found 23 words in 0.22962 seconds


<form action="texttwist.php" method="post">

enter scrambled letters and I'll return all word combinations<br>
<input type="text" name="l" value="asceacas" size="20" maxlength="20">

<input type="submit" name="button" value="unscramble">
<input type="button" name="clear" value="clear" onClick="this.form.l.value='';">
</form><p>

<a href=texttwist.phps>php source</a>
- it's kinda ugly, but it's fast<p>

<a href=/>back to my page</a>

</body>

</html>

I'm trying to fetch the words like "sae", "sac", "secs", "seas", "casas", etc.

Any help?

This is the farthest I've gotten, and I don't know what to do from here: link text

Any suggestions? Help?

You need to take a look at this question: stackoverflow.com/questions/1732348/…
– Paul Rubel, Commented Jul 31, 2010 at 23:42

Adrian · Accepted Answer · 2010-07-31 23:06:14Z

1

Use an HTML parser like Nokogiri.


Paul Rubel · Accepted Answer · 2010-07-31 23:54:05Z

0

If you want any kind of robustness you really want a parser. As Adrian mentioned, Nokogiri is the most popular solution.

If you insist, and are aware of the madness you may be in for as the page becomes more complex, the following may help:

Search for a line that matches

/^<b>\d+ letter words/

and then you can dig out the bits like so:

a = line.split(/<br>/)[1]   # everything after the first <br>
a.gsub!('<p>', '')          # take out the trailing <p>
res = a.split('&nbsp;').map(&:strip).reject(&:empty?)  # this is your data
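
Put together, the line matching and splitting above might look like the following sketch, which uses only the Ruby standard library. The `PAGE` fixture is a hypothetical excerpt of the page from the question and `extract_words` is a made-up name. It assumes each heading and its word list sit on a single line, as they do in the source shown.

```ruby
# Hypothetical fixture: a few lines from the page in the question.
PAGE = <<~HTML
  <p>
  <b>3 letter words</b><br>sae &nbsp; sac &nbsp; ess &nbsp; <p>

  <b>6 letter words</b><br>access &nbsp; <br><br>
  Found 4 words in 0.22962 seconds
HTML

# Collects every word from lines that start with an "N letter words" heading.
def extract_words(page)
  page.each_line.flat_map do |line|
    next [] unless line.match?(/^<b>\d+ letter words/)

    list = line.split(/<br>/)[1].to_s   # text after the first <br>
    list = list.gsub('<p>', '')          # drop the trailing <p>
    # Split on the entity, then trim whitespace and discard empty leftovers.
    list.split('&nbsp;').map(&:strip).reject(&:empty?)
  end
end

RESULT = extract_words(PAGE)
```

Stripping each piece and rejecting empty strings keeps the trailing newline and the final `&nbsp;` from turning into bogus entries in the result.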

That being said, this isn't anything you want in production code. You'll be surprised how learning a parser will change how you see this problem.
