I have used file_get_contents() to basically get the source code of a site into a single string variable.

The source contains many rows that look like this: <td align="center"><a href="somewebsite.com/something">12345</a></td>

(and many rows that don't). I want to extract all the ID numbers (12345 above) and put them in an array. How can I do that? I assume I need some kind of regular expression together with the preg_match_all() function, but I'm not sure how...

  • We would have to see the data. Commented Apr 20, 2011 at 19:44
  • Oh good Google, not another one. stackoverflow.com/questions/1732348/… Commented Apr 20, 2011 at 19:45

2 Answers

Don't mess with regular expressions. Get the variable and let a DOM library do the mundane tasks for you. Take a look at: http://sourceforge.net/projects/simplehtmldom/

Then you can traverse your HTML like a tree and extract stuff. If you really want to get funky, read up on XPath.
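To make the tree-traversal idea concrete, here is a minimal sketch. Note that it swaps in PHP's built-in DOMDocument and DOMXPath classes rather than the simplehtmldom library linked above, and that the sample $html string and the XPath expression are assumptions based on the single row shown in the question; a real page may need a different query.

```php
<?php
// Hypothetical sample standing in for the string returned by file_get_contents().
$html = '<table><tr>'
      . '<td align="center"><a href="somewebsite.com/something">12345</a></td>'
      . '<td align="center"><a href="somewebsite.com/other">67890</a></td>'
      . '<td>not an id row</td>'
      . '</tr></table>';

$doc = new DOMDocument();
// Real-world pages are rarely valid HTML; collect parse warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$ids = [];
// Select every <a> that is a direct child of a <td align="center">.
foreach ($xpath->query('//td[@align="center"]/a') as $link) {
    $text = trim($link->textContent);
    // Keep only link texts that consist entirely of digits.
    if (ctype_digit($text)) {
        $ids[] = $text;
    }
}

print_r($ids);
```

Because the query works on the parsed tree, it should keep working if attribute order or whitespace in the markup changes, which is where a regular expression tends to break.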

Try this:

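// preg_match() stops at the first hit; preg_match_all() collects every one.
preg_match_all('/>([0-9]+)<\/a><\/td>/', $str, $matches);
// $matches[1] holds the text of capture group 1, i.e. only the digits.
$values = $matches[1];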

1 Comment

Thanks! This gave me a basic idea, I went with preg_match_all('/[0-9]+<\/a><\/td>/', $html, $matches); return $matches[0]; Works perfectly!
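One thing worth knowing about the approach in this comment: $matches[0] contains the whole matched text, closing tags included. A small sketch of how a capture group separates the digits from the markup, assuming a hypothetical two-row sample in place of the real page (the variable names $fullMatches and $idNumbers are made up for illustration):

```php
<?php
// Hypothetical sample input; the real $html would come from file_get_contents().
$html = '<td align="center"><a href="somewebsite.com/something">12345</a></td>'
      . '<td align="center"><a href="somewebsite.com/other">67890</a></td>';

// The parentheses form capture group 1 around the digits only.
preg_match_all('/>([0-9]+)<\/a><\/td>/', $html, $matches);

$fullMatches = $matches[0]; // whole matches, surrounding markup included
$idNumbers   = $matches[1]; // captured digits only

print_r($idNumbers);
```

With this pattern, $matches[0] would still carry the ">" and "</a></td>" around each number, while $matches[1] should hold just the ID strings.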
