0

Hey guys, so I'm making a script to featch words/results off of this site (http://grecni.com/texttwist.php), So I already have the http request post ready, ect.

Only thing I need now is to fetch out the words, So I'm working with an html source that looks like so:

<html>
<head>
<title>Text Twist Unscrambler</title>
<META NAME="keywords" CONTENT="Text,Twist,Text Twist,Unscramble,Free,Source,php">
</head>
<body>

<font face="arial,helvetica" size="3">
<p>
<b>3 letter words</b><br>sae &nbsp; sac &nbsp; ess &nbsp; aas &nbsp; ass &nbsp; sea &nbsp; ace &nbsp; sec &nbsp; <p>

<b>4 letter words</b><br>cess &nbsp; secs &nbsp; seas &nbsp; ceca &nbsp; sacs &nbsp; case &nbsp; asea &nbsp; casa &nbsp; aces &nbsp; caca &nbsp; <p>

<b>5 letter words</b><br>cacas &nbsp; casas &nbsp; caeca &nbsp; cases &nbsp; <p>
<b>6 letter words</b><br>access &nbsp; <br><br>
Found 23 words in 0.22962 seconds


<form action="texttwist.php" method="post">

enter scrambled letters and I'll return all word combinations<br>
<input type="text" name="l" value="asceacas" size="20" maxlength="20">

<input type="submit" name="button" value="unscramble">
<input type="button" name="clear" value="clear" onClick="this.form.l.value='';">
</form><p>

<a href=texttwist.phps>php source</a>
- it's kinda ugly, but it's fast<p>

<a href=/>back to my page</a>

</body>

</html>

I'm trying to fetch the words like "sae", "sav", "secs", "seas", "casas", ect.

Any help?

This is the farthest i've gotten, don't know what to do from here.: link text

Any suggestions? Help?

1

2 Answers 2

1

Use a HTML parser like Nokogiri.

Sign up to request clarification or add additional context in comments.

Comments

0

If you want any kind of robustness you really want a parser, as mentioned by Adrian, Nokogiri is most popular solution.

If you insist, aware of the madness that you may be in for as the page becomes more complex the following may help:

Search for a line that matches

/^<b>\d+ letter words/

and then you can dig out the bits like so:

a = line.split(/<br>/)[1] # the second half
a.gsub!('<p>', '') # take out the trailing <p>
res = a.split(' &nbsp; ')# this is your data

That being said, this isn't anything you want in production code. You'll be surprised how learning a parser will change how you see this problem.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.