Converting html to url scraper

Question

A very helpful guy on Stack Overflow helped me get this far, but I need to convert his code so that it scrapes a URL instead of an HTML string. I've tried over and over and I keep hitting errors. Any ideas?

function getElementByIdAsString($html, $id, $pretty = true) {
    $doc = new DOMDocument();

    // loadHTML() returns false on failure, so test its return value
    if (!@$doc->loadHTML($html)) {
        throw new Exception("Failed to load the HTML");
    }

    $element = $doc->getElementById($id);
    if (!$element) {
        throw new Exception("An element with id $id was not found");
    }

    // get all object tags
    $objects = $element->getElementsByTagName('object'); // returns a node list

    // take the value of the data attribute from the first object tag
    $data = $objects->item(0)->getAttributeNode('data')->value;

    // cut away everything up to and including the first '=' and return the rest
    return substr($data, strpos($data, '=') + 1);
}

// call it:
$finalcontent = getElementByIdAsString($html, 'mainclass');

print_r ($finalcontent);

It just blanks out. Is there a better way for me to get the errors? I'm new to all this.
– Jamie, Commented Nov 19, 2015 at 17:13
I'm simply trying to pass in a URL to scrape rather than the $html example the guy used on Stack Overflow.
– Jamie, Commented Nov 19, 2015 at 17:13
First, remove the @, as it silences errors (avoid using it, really). Then add error_reporting(E_ALL); to report all errors.
– camelCase, Commented Nov 19, 2015 at 17:15
The only error I'm getting is in the Chrome console: "Failed to load resource: the server responded with a status of 500 (Internal Server Error)". It's not loading my WordPress footer, so I assume it's just causing errors during the scrape.
– Jamie, Commented Nov 19, 2015 at 17:17
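
For reference, camelCase's suggestion might look roughly like this at the top of the script, before any scraping code runs. The display_errors line is an assumption added here (it is not in the comment) and is only sensible while debugging, since it prints errors into the page.

```php
<?php
// Report every error, warning and notice, as suggested in the comment above.
error_reporting(E_ALL);

// Assumption (not from the thread): also print errors into the page output,
// so a failure shows a message instead of a blank screen. Debugging only.
ini_set('display_errors', '1');
```

With these two lines in place and the @ removed, a failing call should produce a visible PHP message rather than only a 500 in the browser console.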

Elijah · Accepted Answer · 2015-11-19 17:26:38Z

1

Remember to wrap the call to your function in try/catch. The function is likely to throw exceptions, and an uncaught exception causes a 500 Internal Server Error.

$finalcontent = getElementByIdAsString($html, 'mainclass');

should become

try {
    $finalcontent = getElementByIdAsString($html, 'mainclass');
}catch(Exception $e){
    echo $e->getMessage();
}

answered Nov 19, 2015 at 17:26

Elijah


7 Comments

Jamie Over a year ago

Thank you so much, this has removed the error! Now for the main problem: I need this to scrape from a URL. How can I convert this chunk of code to read a URL rather than the $html it currently uses?

Elijah Over a year ago

Depending on your hosting, you should be able to call $html = file_get_contents($url); which takes the URL you provide and tries to fetch the HTML of that document. If that doesn't work, you will probably have to look into cURL, which can fetch the HTML of the page as well.

Jamie Over a year ago

Since it's now white-screened, I assume this won't work with WordPress on a custom Linode?

Jamie Over a year ago

Weird: I removed the whole script and simply put in $html = file_get_contents('url.com'); and echoed it out, and it worked fine. With the whole function, however, it causes an error.

Jamie Over a year ago

I HAVE GOT THE OUTPUT!
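
The thread does not show the code that finally worked, so the following is only a minimal sketch of Elijah's suggestion: try file_get_contents first and fall back to cURL. The helper name fetchHtml, the 10-second timeout and the follow-redirects option are assumptions made for illustration, not something from the original answer.

```php
<?php
// Hypothetical helper (name and options are assumptions, not from the thread):
// return the HTML behind $url as a string, or throw if it cannot be fetched.
function fetchHtml($url)
{
    // First choice: file_get_contents(), which can read URLs
    // when the allow_url_fopen setting is enabled on the host.
    if (ini_get('allow_url_fopen')) {
        $html = file_get_contents($url);
        if ($html !== false) {
            return $html;
        }
    }

    // Fallback: cURL, if the extension is installed.
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
        $html = curl_exec($ch);
        if ($html !== false) {
            return $html;
        }
    }

    throw new Exception("Failed to load $url");
}
```

To use it, call $html = fetchHtml($url); inside the existing try block, just before getElementByIdAsString($html, 'mainclass');, so that a failed download is caught and reported the same way as a missing element.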
