3

I am working on a web scraping application using simple_html_dom. I need to extract all the images in a web page. The images can come from the following places:

  1. <img> tags.
  2. CSS inside a <style> tag on the same page.
  3. An inline style on a <div> or some other tag.

I can scrape all the images by using the following code.

function download_images($html, $page_url , $local_url){

    foreach($html->find('img') as $element) {
        $img_url = $element->src;
        $img_url = rel2abs($img_url, $page_url);
        $parts   = parse_url($img_url);
        $img_path=  $parts['path'];
        $url_to_be_change = $GLOBALS['website_server_root'].$img_path;
        download_file($img_url, $GLOBALS['website_local_root'].$img_path);  
        $element->src=$url_to_be_change;            
    }

    $css_inline = $html->find("style");

    $matches = array();
    preg_match_all( "/url\((.*?)\)/", $css_inline, $matches, PREG_SET_ORDER );
    foreach ( $matches as $match )    {
        $img_url = trim( $match[1], "\"'" );
        $img_url = rel2abs($img_url, $page_url);
        $parts   = parse_url($img_url);
        $img_path=  $parts['path'];
        $url_to_be_change = $GLOBALS['website_server_root'].$img_path  ;
        download_file($img_url , $GLOBALS['website_local_root'].$img_path); 
        $html = str_replace($img_url , $url_to_be_change , $html );
    }

    return $html;
}

$html = download_images($html , $page_url , $dir); // working fine
$html = str_get_html ($html);
$html->save($dir. "/" . $ff);    

Please note that I am also modifying the HTML after downloading the images.

Downloading works fine, but when I try to save the HTML, I get the following error:

PHP Fatal error: Cannot use object of type simple_html_dom as array

Important: it works perfectly fine if I do not use str_replace and the second loop.

Fatal error: Cannot use object of type simple_html_dom as array in /var/www/html/app/framework/cache/includes/simple_html_dom.php on line 1167

2
  • The $html as the last argument in your str_replace call is an object, not an array. str_replace apparently doesn't like that. You need to figure out another way to represent that data as an array, or re-work it somehow. Commented Apr 30, 2015 at 12:39
  • obligatory stackoverflow.com/a/1732454/3044080 Commented Apr 30, 2015 at 13:56

3 Answers

2

Guess №1

I see a possible mistake here:

$html = str_get_html($html);

It looks like you are passing an object to str_get_html(), which accepts a string as its argument. Let's fix that by serializing the object back to markup first; save() with no file path returns the document as a string:

$html = str_get_html($html->save());

We can only guess what the $html variable contains when it reaches this piece of code.
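Since the variable may hold either a string or a parsed document at that point, one defensive option is to normalize it before parsing. This is only a sketch: to_html_string is a hypothetical helper, not part of simple_html_dom, and it assumes the document object can be cast to a string through __toString().

```php
<?php
// Hypothetical helper (not part of simple_html_dom): returns markup as a
// string whether it is given a string or an object that implements
// __toString(). Anything else is rejected instead of being parsed blindly.
function to_html_string($html)
{
    if (is_string($html)) {
        return $html;
    }
    if (is_object($html) && method_exists($html, '__toString')) {
        return (string) $html;
    }
    throw new InvalidArgumentException('Expected a string or a stringable object');
}
```

With this in place, the call would read $html = str_get_html(to_html_string($html)); so the parser receives a string on both paths.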

Guess №2

Or maybe we just need to use a separate variable in the download_images function, so that the code is correct in both cases:

function download_images($html, $page_url , $local_url){

    foreach($html->find('img') as $element) {
        $img_url = $element->src;
        $img_url = rel2abs($img_url, $page_url);
        $parts   = parse_url($img_url);
        $img_path=  $parts['path'];
        $url_to_be_change = $GLOBALS['website_server_root'].$img_path  ;
        download_file($img_url , $GLOBALS['website_local_root'].$img_path); 
        $element->src=$url_to_be_change;            
    }

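    // find() returns an array of <style> nodes, so join their CSS into one
    // string; preg_match_all() expects a string subject.
    $css_inline = "";
    foreach ($html->find("style") as $style) {
        $css_inline .= $style->innertext;
    }

    // Start from the serialized document so a string is returned
    // even when no url(...) reference is found.
    $result_html = $html->save();
    $matches = array();
    preg_match_all( "/url\((.*?)\)/", $css_inline, $matches, PREG_SET_ORDER );
    foreach ( $matches as $match )    {
        $img_url = trim( $match[1], "\"'" );
        $img_url = rel2abs($img_url, $page_url);
        $parts   = parse_url($img_url);
        $img_path=  $parts['path'];
        $url_to_be_change = $GLOBALS['website_server_root'].$img_path  ;
        download_file($img_url , $GLOBALS['website_local_root'].$img_path); 
        // Accumulate replacements in $result_html instead of restarting from $html.
        $result_html = str_replace($img_url , $url_to_be_change , $result_html );
    }

    return $result_html;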
}

$html = download_images($html , $page_url , $dir); // working fine
$html = str_get_html ($html);
$html->save($dir. "/" . $ff);

Explanation: if there are no matches (the $matches array is empty), we never enter the second loop, so $html still holds the same value it had at the beginning of the function: an object, not a string. This is a common mistake when the same variable is reused in a place where two different variables are needed.
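The idea of returning a string on every path can be sketched without the library. The helper below is hypothetical: rewrite_css_urls and $server_root are names made up for illustration, and it assumes each image path should be placed directly under the server root rather than resolved with rel2abs as in the original code.

```php
<?php
// Hypothetical helper (not from the original code): rewrites every CSS
// url(...) reference found in $html so that it points under $server_root.
// It always returns a string, even when nothing matches.
function rewrite_css_urls($html, $server_root)
{
    return preg_replace_callback(
        // Group 1: optional quote, group 2: the URL, \1: the same closing quote.
        '/url\(\s*(["\']?)(.*?)\1\s*\)/i',
        function ($m) use ($server_root) {
            // Leave embedded data: URIs untouched.
            if (stripos($m[2], 'data:') === 0) {
                return $m[0];
            }
            $path = parse_url($m[2], PHP_URL_PATH);
            if (!is_string($path) || $path === '') {
                return $m[0];
            }
            return 'url(' . $m[1] . $server_root . '/' . ltrim($path, '/') . $m[1] . ')';
        },
        $html
    );
}
```

Because the input and the output are both plain strings, the caller can pass the result straight to str_get_html() without checking whether any replacement happened.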

6 Comments

line 1167 : if ($this->size>0) $this->char = $this->doc[0];
I updated my answer and added one more solution (see Guess №2). Please tell me which of the two works in all cases.
Now it is showing this error, and I can't see your second solution. PHP Fatal error: Call to a member function save() on a non-object in
Ah, that is expected. Look at the last two lines: $html = str_get_html ($html); stores the result in the $html variable, and $html->save($dir. "/" . $ff); then uses it as an object, but at that point it is not an object. You will need to fix that to make your program work as intended; I can't do it for you, because I only know a small part of the code, not the whole program. I hope this explanation helps you fix it.
I have tried the second solution. The old error is gone, but I can't save the HTML. Here is the error: Fatal error: Call to a member function save() on a non-object in
0

As the error message states, you are dealing with an object where you should have an array. You could try typecasting your object:

$array =  (array) $yourObject;

That should solve it.

0

I had this error and solved it (in my case) by using return $html->save(); at the end of the function. I can't explain why two instances with different variable names, scoped in different functions, caused this error. I guess this is how the "simple html dom" class works.

So, just to be clear: try calling $html->save() before you do anything else with the result.

I hope this information helps somebody :)
