When I try to parse search results from Google with DOMDocument, I get the warnings shown below.

Code:

$html = file_get_contents('http://www.google.dk/search?q='.urlencode($query).'&start=0&num=100', false, $context);
                
$doc = new DOMDocument();
$doc->loadHTML($html);

Errors:

PHP Warning:  DOMDocument::loadHTML(): Input is not proper UTF-8, indicate encoding ! in Entity, line: 1 in /var/www/dynaccount.com/class/Cronjob_check_serp_position.php on line 132

Warning: DOMDocument::loadHTML(): Input is not proper UTF-8, indicate encoding ! in Entity, line: 1 in /var/www/dynaccount.com/class/Cronjob_check_serp_position.php on line 132
PHP Warning:  DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 1 in /var/www/dynaccount.com/class/Cronjob_check_serp_position.php on line 132

Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 1 in /var/www/dynaccount.com/class/Cronjob_check_serp_position.php on line 132
PHP Warning:  DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 1 in /var/www/dynaccount.com/class/Cronjob_check_serp_position.php on line 132
  • Can you print_r the $html, please? Commented Jul 26, 2015 at 9:19
  • Possible duplicate of "Dom LoadHTML Problem in PHP". Commented Jul 26, 2015 at 9:32

1 Answer

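libxml has built-in error handling that helps here. Calling libxml_use_internal_errors( true ) before loadHTML() makes libxml collect the parser messages internally instead of emitting them as PHP warnings; you can inspect them afterwards with libxml_get_last_error() or libxml_get_errors().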

            $query='php rocks';

            // Fetch the raw result page (no stream context is passed here)
            $data=file_get_contents('http://www.google.co.uk/search?q='.urlencode( $query ).'&start=0&num=100');

            // Collect libxml parse errors internally instead of emitting PHP warnings
            libxml_use_internal_errors( true );
            $html = new DOMDocument('1.0','utf-8');
            $html->validateOnParse=false;
            $html->standalone=true;
            $html->preserveWhiteSpace=true;
            $html->strictErrorChecking=false;
            $html->substituteEntities=false;
            $html->recover=true;
            $html->formatOutput=true;
            $html->loadHTML( $data );

            // Keep the last parse error for inspection, then empty the error buffer
            $parse_errs=serialize( libxml_get_last_error() );
            libxml_clear_errors();

            // Query the result links relative to the element with id "ires"
            $xpath=new DOMXPath( $html );
            $div=$html->getElementById('ires');
            $col=$xpath->query("ol/li/h3/a", $div );

            foreach( $col as $node ) echo $node->getAttribute('href').'<br />';

            // Release the document and the XPath object
            $html=null;
            $xpath=null;
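
Suppressing the warnings hides the "Input is not proper UTF-8, indicate encoding" message but does not tell libxml which encoding the page uses. A minimal sketch of one common workaround follows: convert the bytes to UTF-8 and prefix an XML encoding hint before parsing. The helper name `load_html_as_utf8` and the default source charset `ISO-8859-1` are assumptions for illustration, not something stated in the question; the real charset should be taken from the response headers or the page's meta tag. The sketch also assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper (not part of the original post): parse HTML whose
// source charset is known, without triggering the UTF-8 warning.
function load_html_as_utf8(string $raw, string $sourceCharset = 'ISO-8859-1'): DOMDocument
{
    // Assumption: $sourceCharset matches what the server actually sent.
    $utf8 = mb_convert_encoding($raw, 'UTF-8', $sourceCharset);

    // Collect parse errors internally and restore the previous setting afterwards.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // The leading XML declaration is a widely used hint that makes libxml
    // treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $utf8);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// Stand-in input: Danish text encoded as ISO-8859-1 bytes.
$sample = "<html><body><p>bl\xE5b\xE6r og gr\xF8d</p></body></html>";
$doc = load_html_as_utf8($sample);
echo $doc->getElementsByTagName('p')->item(0)->textContent, PHP_EOL;
```

If the page is already UTF-8, passing `'UTF-8'` as the second argument leaves the bytes unchanged and only the encoding hint is applied.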

1 Comment

Thanks. But how do I get the <div class="g"> elements inside $div?
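
One possible approach, sketched here against stand-in markup because Google's real result HTML changes over time: run a relative XPath query with `$div` as the context node and match `g` as a whole class token. The sample HTML, the `example.com` URLs, and the variable names below are assumptions for illustration only.

```php
<?php
// Hypothetical sample markup standing in for a fetched result page.
$sample = '<html><body><div id="ires">'
        . '<div class="g"><h3><a href="/url?q=http://example.com/a">A</a></h3></div>'
        . '<div class="g extra"><h3><a href="/url?q=http://example.com/b">B</a></h3></div>'
        . '<div class="gx"><h3><a href="/ignored">C</a></h3></div>'
        . '</div></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($sample);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Locate the container by its id attribute.
$div = $xpath->query('//*[@id="ires"]')->item(0);

// The leading "." makes the query relative to $div. The concat/normalize-space
// idiom matches "g" as a complete class token, so "g extra" matches but "gx" does not.
$results = $xpath->query(
    ".//div[contains(concat(' ', normalize-space(@class), ' '), ' g ')]",
    $div
);

$hrefs = [];
foreach ($results as $result) {
    // Take the first heading link inside each result block, if there is one.
    $link = $xpath->query('.//h3/a', $result)->item(0);
    if ($link !== null) {
        $hrefs[] = $link->getAttribute('href');
    }
}

print_r($hrefs);
```

A plain `@class="g"` test would be shorter, but it would miss elements that carry additional classes alongside `g`.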
