PHP regex for DOM Manipulation

Question

I am trying to extract text from the page at a URL. Can anyone help me?

$news1 = "http://www.espncricinfo.com/icc-womens-world-cup-2013/content/story/604808.html";     
$a=preg_match_all("/\<p class\=['\"]news-body['\"]\>(.*?)\<\/p\>/",$news1,$b);
echo $a;
print_r($b[1]);

It returns 0 and an empty Array(). If anyone can help, that would be appreciated.

Some HTML below:

<p class="news-body">
New Zealand captain, Suzie Bates, also spoke of how the sides had played a competitive game but said intensity levels weren't the same after the dispiriting news came in. Bates felt it would have been better to have not known the result of the other match.
</p>
<p class="news-body">
It was a particularly shattering end for the holders England, who went out of the tournament without having had a single really poor game. Their defeats to Sri Lanka and Australia were by one wicket - off the last ball - and two runs. Edwards, however, refused to offer any excuses and said England had paid for their "slow start" to the tournament, beginning with the shock loss to Sri Lanka.
</p>
<p class="news-body">
"We had come here to win this tournament and we haven't. We haven't even got to the final," Edwards said. "That is disappointing for us as a group of players. We were very inconsistent in the first phase of the tournament and are probably now playing our best cricket, which is too late. We prepared well. We have no excuses. We didn't play well. We didn't hold our catches against Sri Lanka."
</p>

@Explosion Pills: 0 Array() is what I am getting; I should be getting all the text. @NickFury: news1 was a typo, and I am assuming b is the output. Do I need to define b?
– mysteriousboy, Commented Feb 14, 2013 at 3:32

Scuzzy · Accepted Answer · 2013-02-14 03:42:11Z

// Fetch the content
$html = file_get_contents('http://www.espncricinfo.com/icc-womens-world-cup-2013/content/story/604808.html');

// Load the HTML into DOM
$libxml_use_internal_errors = libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_use_internal_errors($libxml_use_internal_errors); // note this may ruin any custom error handlers

// Load the DOM into SimpleXML
$simple = simplexml_import_dom($dom);

// Xpath the document
$news = $simple->xpath('//p[@class="news-body"]');

// Echo the results
foreach($news as $p)
{
  echo "<p>$p</p>";
}
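
As a self-contained illustration of the approach above, here is a minimal sketch that runs against a shortened inline copy of the HTML from the question rather than the live page. The $html sample and all variable names are assumptions for illustration. It also shows two reasons the original call returned 0: the pattern was applied to the URL string rather than to fetched HTML, and without the s modifier the dot does not match the newlines inside each paragraph. It queries with DOMXPath directly instead of going through SimpleXML, which is a variation on the answer, not the answer's own code.

```php
<?php
// Hypothetical inline sample standing in for the fetched page
// (shortened from the HTML quoted in the question).
$html = <<<HTML
<html><body>
<p class="news-body">
New Zealand captain, Suzie Bates, also spoke of how the sides had played a competitive game.
</p>
<p class="news-body">
It was a particularly shattering end for the holders England.
</p>
</body></html>
HTML;

// The question's pattern, with the unnecessary escapes removed.
$pattern = "/<p class=['\"]news-body['\"]>(.*?)<\/p>/";

// 1. Matching against the URL string itself: the URL contains no <p> tags.
$url = 'http://www.espncricinfo.com/icc-womens-world-cup-2013/content/story/604808.html';
$countUrl = preg_match_all($pattern, $url, $m);

// 2. Matching against the HTML without the "s" modifier: "." stops at newlines.
$countNoS = preg_match_all($pattern, $html, $m);

// 3. Same pattern with the "s" (dot-all) modifier appended.
$countS = preg_match_all($pattern . 's', $html, $m);

// 4. The DOM approach: parse once, then select by class with XPath.
$previous = libxml_use_internal_errors(true); // collect parse errors quietly
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();                        // discard collected errors
libxml_use_internal_errors($previous);        // restore the earlier setting

$xpath = new DOMXPath($dom);
$paragraphs = [];
foreach ($xpath->query('//p[@class="news-body"]') as $node) {
    $paragraphs[] = trim($node->textContent);
}

echo "URL: $countUrl, no s: $countNoS, with s: $countS, DOM: " . count($paragraphs) . "\n";
foreach ($paragraphs as $text) {
    echo $text, "\n";
}
```

The regex can be made to work on this sample, but it depends on the exact attribute order and quoting in the markup; the XPath query only depends on the parsed structure.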

edited Feb 14, 2013 at 3:42

answered Feb 14, 2013 at 3:30

1 Comment

Scuzzy Over a year ago

Actually, simply prefixing the call with the @ error-suppression operator, as in @$dom->loadHTML($html), might be enough to suppress the HTML parse warnings, rather than saving the state of libxml_use_internal_errors() and restoring it afterwards.
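
To make the two options in this comment concrete, here is a small sketch that loads the same fragment both ways. The $html string is a hypothetical, deliberately sloppy sample, and the variable names are assumptions for illustration. Whether libxml reports anything for this particular input depends on the installed libxml2 version, so the sketch only prints how many errors were collected.

```php
<?php
// Hypothetical fragment with an unclosed <p>, for illustration only.
$html = '<html><body><p class="news-body">Unclosed paragraph<div>More text</div></body></html>';

// Option A: the @ operator hides any warnings raised by this one call.
$domA = new DOMDocument;
@$domA->loadHTML($html);

// Option B: route libxml errors to an internal buffer, then restore the old setting.
$previous = libxml_use_internal_errors(true);
$domB = new DOMDocument;
$domB->loadHTML($html);
$errors = libxml_get_errors();         // array of LibXMLError objects, possibly empty
libxml_clear_errors();                 // empty the buffer so it does not grow
libxml_use_internal_errors($previous);

// Both options produce the same parsed document.
echo 'Collected libxml errors: ', count($errors), "\n";
echo $domA->textContent === $domB->textContent ? "same text\n" : "different text\n";
```

Option A is shorter; option B keeps the parse errors available for inspection or logging instead of discarding them.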
