I am trying to get the text from a URL. Can anyone help me?

$news1 = "http://www.espncricinfo.com/icc-womens-world-cup-2013/content/story/604808.html";     
$a=preg_match_all("/\<p class\=['\"]news-body['\"]\>(.*?)\<\/p\>/",$news1,$b);
echo $a;
print_r($b[1]);

It returns 0 Array(). If anyone can help, that would be appreciated.

Some HTML below:

<p class="news-body">
New Zealand captain, Suzie Bates, also spoke of how the sides had played a competitive game but said intensity levels weren't the same after the dispiriting news came in. Bates felt it would have been better to have not known the result of the other match.
</p>
<p class="news-body">
It was a particularly shattering end for the holders England, who went out of the tournament without having had a single really poor game. Their defeats to Sri Lanka and Australia were by one wicket - off the last ball - and two runs. Edwards, however, refused to offer any excuses and said England had paid for their "slow start" to the tournament, beginning with the shock loss to Sri Lanka.
</p>
<p class="news-body">
"We had come here to win this tournament and we haven't. We haven't even got to the final," Edwards said. "That is disappointing for us as a group of players. We were very inconsistent in the first phase of the tournament and are probably now playing our best cricket, which is too late. We prepared well. We have no excuses. We didn't play well. We didn't hold our catches against Sri Lanka."
</p>
  • What is the 0 Array()? Commented Feb 14, 2013 at 3:26
  • You're also missing $news1 and $b in your code Commented Feb 14, 2013 at 3:27
  • @Explosion Pills 0 Array() is what I am getting; I should be getting all the text. @NickFury $news1 was a typo, and I am assuming $b is the output. Do I need to define $b? Commented Feb 14, 2013 at 3:32

1 Answer

// Fetch the content
$html = file_get_contents('http://www.espncricinfo.com/icc-womens-world-cup-2013/content/story/604808.html');

// Load the HTML into DOM
$libxml_use_internal_errors = libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_use_internal_errors($libxml_use_internal_errors); // note this may ruin any custom error handlers

// Load the DOM into SimpleXML
$simple = simplexml_import_dom($dom);

// Xpath the document
$news = $simple->xpath('//p[@class="news-body"]');

// Echo the results
foreach($news as $p)
{
  echo "<p>$p</p>";
}
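
For what it's worth, the snippet in the question most likely returned 0 for two reasons: the pattern was run against the URL string in $news1 rather than against the fetched HTML, and "." does not match newlines unless the "s" modifier is set, while each paragraph in the sample spans several lines. Below is a minimal sketch of the regex route, assuming the page has already been fetched into a string. The inline $html sample and the variable names are made up for illustration, not taken from the real page.

```php
<?php
// Hypothetical stand-in for the fetched page. In real use this string
// would come from file_get_contents($url), as in the answer above.
$html = <<<EOT
<p class="news-body">
First paragraph.
</p>
<p class='news-body'>
Second paragraph.
</p>
EOT;

// Same idea as the pattern in the question, with "~" as the delimiter
// so the slash in </p> needs no escaping.
$pattern = '~<p class=[\'"]news-body[\'"]>(.*?)</p>~';

// Without the "s" modifier, "." stops at every newline, so a paragraph
// whose text sits on its own lines is never matched.
$withoutS = preg_match_all($pattern, $html, $m1);

// With "s", "." also matches newlines and each paragraph is captured.
$withS = preg_match_all($pattern . 's', $html, $m2);

// Strip the newlines surrounding each captured paragraph.
$paragraphs = array_map('trim', $m2[1]);

echo $withoutS, "\n";
echo $withS, "\n";
print_r($paragraphs);
```

A DOM parser, as in the code above, is generally the more robust choice for real pages; the regex version is only reasonable when the markup is as regular as the sample in the question.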

1 Comment

Actually, simply prefixing the call with the @ error-suppression operator, as in @$dom->loadHTML($html), might be enough to suppress HTML parse errors, rather than saving the state of libxml_use_internal_errors() and restoring it afterwards.
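
A rough sketch of that variant, using DOMXPath directly instead of SimpleXML. The inline $html string is a made-up sample (the <article> wrapper is there only because some libxml versions warn about HTML5 tags), and $texts is an illustrative name. Keep in mind that @ hides every warning raised by that call, not only the parse warnings.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<article>'
      . '<p class="news-body">First paragraph.</p>'
      . '<p class="other">Skip me.</p>'
      . '<p class="news-body">Second paragraph.</p>'
      . '</article>';

$dom = new DOMDocument;

// "@" silences any warnings emitted while parsing imperfect HTML.
@$dom->loadHTML($html);

// Query the DOM directly for paragraphs with the exact class attribute.
$xpath = new DOMXPath($dom);
$texts = [];
foreach ($xpath->query('//p[@class="news-body"]') as $p) {
    $texts[] = trim($p->textContent);
}

print_r($texts);
```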
