A very helpful person on Stack Overflow helped me get this far. However, I need to convert his code so that it scrapes a URL instead of an HTML string. I've tried over and over and I keep hitting errors. Any ideas?

function getElementByIdAsString($html, $id, $pretty = true) {
    $doc = new DOMDocument();

    // loadHTML() returns false on failure; the DOMDocument object itself is
    // always truthy, so test the return value rather than $doc.
    if (!@$doc->loadHTML($html)) {
        throw new Exception("Failed to parse the HTML");
    }

    $element = $doc->getElementById($id);
    if (!$element) {
        throw new Exception("An element with id $id was not found");
    }

    // get all object tags inside the element
    $objects = $element->getElementsByTagName('object'); // returns a node list
    if ($objects->length === 0) {
        throw new Exception("No object tag was found inside element $id");
    }

    // take the value of the data attribute from the first object tag
    $data = $objects->item(0)->getAttribute('data');

    // cut away everything up to and including the first '=' and return the rest
    return substr($data, strpos($data, '=') + 1);
}

// call it:
$finalcontent = getElementByIdAsString($html, 'mainclass');

print_r ($finalcontent);
  • You mention errors...what are they? Commented Nov 19, 2015 at 17:10
  • It just blanks out. Is there a better way for me to get the errors? I'm new to all this. Commented Nov 19, 2015 at 17:13
  • I'm simply trying to pass in a URL to scrape rather than the $html example the guy gave on Stack Overflow. Commented Nov 19, 2015 at 17:13
  • First, remove the @ as this will silence errors (avoid using it, really). Then add error_reporting(E_ALL); to report all errors. Commented Nov 19, 2015 at 17:15
  • The only error I'm getting is in the Chrome console: "Failed to load resource: the server responded with a status of 500 (Internal Server Error)". It's not loading my WordPress footer, so I assume it's just causing errors during the scrape. Commented Nov 19, 2015 at 17:17
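The advice in the comments above (turn on error reporting, avoid @) could look roughly like the sketch below. It is only an illustration under stated assumptions: the sample markup is invented, and displaying errors in the page is assumed to be acceptable because this is a development site. libxml_use_internal_errors() is used here as an alternative to @, so that parser warnings are collected instead of silenced.

```php
<?php
// Report every notice, warning and error while debugging.
error_reporting(E_ALL);
// Assumption: this is a development site, so errors may be shown in the page.
// On a live site, log them instead of displaying them.
ini_set('display_errors', '1');

// Collect libxml parse warnings instead of silencing loadHTML() with @.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Invented sample markup, standing in for the scraped page.
$doc->loadHTML('<div id="mainclass"><foo>text</foo></div>');

// Print whatever the parser complained about (may be nothing).
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), "\n";
}
libxml_clear_errors();

$element = $doc->getElementById('mainclass');
```

With reporting enabled, a blank page or a bare 500 response should be replaced by the actual PHP message, which is usually enough to locate the failing line.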

1 Answer

Remember to wrap the call to your function in try/catch. The function throws exceptions, and an uncaught exception causes a 500 Internal Server Error.

$finalcontent = getElementByIdAsString($html, 'mainclass');

should become

try {
    $finalcontent = getElementByIdAsString($html, 'mainclass');
}catch(Exception $e){
    echo $e->getMessage();
}
7 Comments

Thank you so much, this has removed the error! Now for the main problem: I need this to scrape from a URL. How can I convert this chunk of code to read from a URL rather than from $html as it currently does?
Depending on your hosting, you should be able to call $html = file_get_contents($url); which takes the URL you provide and tries to fetch that document's HTML. If that doesn't work, you will probably have to look into cURL and fetch the page's HTML that way.
I assume, from the fact that it's now white-screened, that this won't work with WordPress on a custom Linode?
Weird. I removed the whole script and simply put in $html = file_get_contents('url.com') and echoed it out, and it worked fine. However, with the whole function it causes an error.
I HAVE GOT THE OUTPUT!
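
The file_get_contents() suggestion from the comments could be wrapped roughly as follows. This is a minimal sketch, not the answerer's code: fetchHtml() is a hypothetical helper name, the URL in the usage comment is a placeholder, and it assumes allow_url_fopen is enabled on the host (if it is not, cURL would be the fallback mentioned above).

```php
<?php
// Hypothetical helper: fetch a page's HTML and throw instead of returning false.
function fetchHtml($url) {
    // file_get_contents() returns false on failure and emits a warning,
    // so the warning is suppressed here and the failure becomes an exception.
    $html = @file_get_contents($url);
    if ($html === false) {
        throw new Exception("Failed to load $url");
    }
    return $html;
}

// Usage with the function from the question (placeholder URL):
//
// try {
//     $html = fetchHtml('http://example.com/');
//     $finalcontent = getElementByIdAsString($html, 'mainclass');
//     print_r($finalcontent);
// } catch (Exception $e) {
//     echo $e->getMessage();
// }
```

Throwing from the fetch step means a bad URL and a missing element are both reported through the same try/catch, rather than a failed download reaching loadHTML() as false.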