I'm trying to get the "link" elements from certain webpages, but I can't figure out what I'm doing wrong. I'm getting the following error:

Severity: Warning

Message: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity, line: 536

Filename: controllers/test.php

Line Number: 34

Line 34 is the following in the code:

      $dom->loadHTML($html);

My code:

    $url = "http://www.amazon.com/";

    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);

    // curl_exec() returns the page body as a string, or false on failure
    $html = curl_exec($ch);

    // release the handle whether or not the request succeeded
    curl_close($ch);

    if ($html) {

        // parse the html into a DOMDocument
        $dom = new DOMDocument();

        $dom->recover = true;
        $dom->strictErrorChecking = false;

        $dom->loadHTML($html);

        $hrefs = $dom->getElementsByTagName('a');

        echo "<pre>";
        print_r($hrefs);
        echo "</pre>";

    } else {
        echo "The website could not be reached.";
    }

3 Answers

It means some of the HTML is invalid. This is just a warning, not an error, so your script will still process the page. To keep libxml from emitting the warnings, call this before loadHTML():

 libxml_use_internal_errors(true);

Or you could suppress the warning on that one call with the @ operator:

@$dom->loadHTML($html);
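
As a middle ground between displaying the warnings and discarding them, libxml's internal error buffer can be read back after parsing. The following is a minimal sketch, assuming a small hypothetical HTML string in place of the fetched page. Which messages libxml reports, if any, depends on the installed libxml2 version.

```php
<?php
// Hypothetical sample markup with a bare ampersand; stands in for the cURL result.
$html = '<html><body><p>Fish & chips</p><a href="/menu">Menu</a></body></html>';

// Route libxml warnings into an internal buffer instead of PHP warnings,
// remembering the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Copy the buffered errors, then empty the buffer and restore the old setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Each entry is a LibXMLError object; log or inspect it as needed.
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}
```

This keeps the parser diagnostics available for debugging without printing them into the page.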

6 Comments

Are you sure you set libxml_use_internal_errors(true); at the top of the php script? I also updated my answer to provide another alternative
that hides the warning, but it's returning an empty object
That is weird. I ran your exact code and it worked fine. It returned a bunch of objects. Your print_r statement outputted DOMNodeList Object ( [length] => 81 )
-1 For suggesting suppression of all errors on that line. This will lead to a debugging nightmare. I would have given you a +1 if it were not for that.
This is an awful solution; NEVER do this. If you want to suppress error output to the browser, you could do something like ob_start(); ...commands here... $buf = ob_get_clean(); and then check $buf for any error output, which lets you keep the errors but stops the browser output.
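
Regarding the "empty object" mentioned in the comments above: print_r() on a DOMNodeList shows little more than its length, so the list has to be iterated to see the links. A minimal sketch, assuming a hypothetical HTML string in place of the fetched page:

```php
<?php
// Hypothetical sample markup; stands in for the page returned by cURL.
$html = '<html><body><a href="/one">One</a><a href="/two">Two</a><a>No href</a></body></html>';

$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Walk the node list and collect the href attribute of each anchor that has one.
$links = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    if ($anchor->hasAttribute('href')) {
        $links[] = $anchor->getAttribute('href');
    }
}

print_r($links);
```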

This may be caused by a rogue & symbol that is immediately followed by a tag; otherwise you would receive a missing ; error instead. See: Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity.

The solution is to replace the & symbol with &amp;
or, if you must keep that & as it is, you may be able to enclose it in <![CDATA[ ... ]]>
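
When the markup comes from a third-party page and cannot be fixed at the source, one possible approach is to escape the bare ampersands before parsing. The sketch below is a heuristic, not a full HTML-aware fix: escapeBareAmpersands() is a hypothetical helper name, and the regular expression simply treats any & that does not start something shaped like an entity (name or numeric reference ending in ;) as a literal ampersand. It also rewrites ampersands inside script blocks, which may not be wanted.

```php
<?php
/**
 * Hypothetical helper: replace every "&" that does not begin something
 * shaped like an entity reference (e.g. &amp;, &#39;, &#x27;) with "&amp;".
 */
function escapeBareAmpersands(string $html): string
{
    return preg_replace('/&(?!#?[a-zA-Z0-9]+;)/', '&amp;', $html);
}

echo escapeBareAmpersands('<td>7 & 8</td>'), "\n";
echo escapeBareAmpersands('<a href="?a=1&b=2">Tom &amp; Jerry</a>'), "\n";
```

The first call yields `<td>7 &amp; 8</td>`; in the second, the existing `&amp;` is left alone and only the ampersand in the query string is escaped.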

1 Comment

In my case I output a variable containing an ampersand between <td> tags, i.e. $variable['ingredient'] = "7 & 8"; $tbody .= "<td>" . $variable['ingredient'] . "</td>"; which led to this error: Message: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity.
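
For a case like the one in this comment, where the HTML is built in your own code, the value can be escaped as it is concatenated. A minimal sketch reusing the variable names from the comment:

```php
<?php
$variable = ['ingredient' => '7 & 8'];

// htmlspecialchars() turns & into &amp; (and escapes <, >, and quotes),
// so the resulting cell is valid input for DOMDocument::loadHTML().
$tbody = '<td>' . htmlspecialchars($variable['ingredient'], ENT_QUOTES, 'UTF-8') . '</td>';

echo $tbody; // <td>7 &amp; 8</td>
```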

The HTML is poorly formed. If it is formed poorly enough, loading it into the DOMDocument might even fail, and if loadHTML() is not working then suppressing the errors is pointless. If you are unable to load the HTML into the DOM, I suggest using a tool like HTML Tidy to "clean up" the poorly formed markup first.

HTML Tidy can be found here http://www.htacg.org/tidy-html5/
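
In PHP, Tidy is also exposed through the optional tidy extension. The following is a rough sketch, assuming that extension may or may not be installed (the code falls back to the raw markup if it is missing) and using a hypothetical HTML string in place of the fetched page. The configuration options shown are illustrative choices, not requirements.

```php
<?php
// Hypothetical malformed markup: bare ampersand and unclosed tags.
$html = '<html><body><p>Fish & chips<a href="/menu">Menu</body></html>';

// Repair the markup with Tidy when the extension is available.
$clean = $html;
if (function_exists('tidy_repair_string')) {
    $repaired = tidy_repair_string($html, ['output-xhtml' => true, 'wrap' => 0], 'utf8');
    if ($repaired !== false) {
        $clean = $repaired;
    }
}

// Load the (possibly repaired) markup, keeping libxml warnings out of the output.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$loaded = $dom->loadHTML($clean);
libxml_clear_errors();
libxml_use_internal_errors($previous);

echo $dom->getElementsByTagName('a')->length, " link(s) found\n";
```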
