
I am trying to write a crawler using simple_html_dom.php version 1.5, but it seems to leak memory for reasons I can't pin down. I switched to 1.5 because its authors claim to have fixed the memory leaks. Any help would be appreciated. After 40 repetitions of the loop I get the following message:

   Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 8388608 bytes) in C:\work\simple_html_dom.php on line 1078
<?php
/**
 * ******************TESTING*************************
 */

include("simple_html_dom.php");

$beginning = 0;
$end = 35;
$FileName = "c:/results.txt";
$FileHandle = fopen($FileName, 'w') or die("can't open file");

for ($i = $beginning; $i < $end; $i++) {
    $url = sprintf('http://imgur.com/gallery/hot/day/page/%d?scrolled', $i);

    $html = file_get_html($url);

    echo "Day: -".$i."\n";

    foreach ($html->find('div[class=posts]') as $element) {
        foreach ($element->find('img') as $el) {
            $urls = $el->src;
            $urls1 = str_replace('b.jpg', '.jpg', $el->src);
            $urls2 = str_replace('.jpg', '', str_replace('.com/', '.com/gallery/', str_replace('http://i.', 'http://', str_replace('b.jpg', '.jpg', $el->src))));

            $title = str_replace('&quot;', '"', str_replace('&#039;', "'", stristr($el->title, '<p>', true)));
            $fil = $urls2.'             '.$urls.'             '.$urls1.'             '.$title."\n";
            fwrite($FileHandle, $fil);
        }
    }

    $html->clear;
    unset($html);
}

fclose($FileHandle);
?>
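The nested str_replace() line that builds $urls2 is hard to follow. Unrolled into a hypothetical helper (imgurLinks() is not part of simple_html_dom), it performs the same replacements in the same order. The sample src value is made up and assumes imgur thumbnails end in "b.jpg".

```php
<?php
// Hypothetical helper: same replacements as the $urls1/$urls2 lines, one step per line.
function imgurLinks(string $src): array {
    $full = str_replace('b.jpg', '.jpg', $src);           // thumbnail -> full-size image ($urls1)
    $page = str_replace('http://i.', 'http://', $full);   // image host -> site host
    $page = str_replace('.com/', '.com/gallery/', $page); // add the gallery path
    $page = str_replace('.jpg', '', $page);               // drop the extension ($urls2)
    return ['page' => $page, 'thumb' => $src, 'full' => $full];
}

// Made-up sample src value
print_r(imgurLinks('http://i.imgur.com/AbCdEb.jpg'));
```

Splitting the chain this way makes it easier to check each step against what imgur actually serves.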
  • If you indent your code, it would be easier to read. Commented Nov 26, 2011 at 16:46
  • Try adding unset($element); just after the inner foreach. Commented Nov 26, 2011 at 16:48
  • I would consider simple_html_dom an outdated library with a broken design. You should replace it with something better; there are other, better libraries available. Commented Nov 26, 2011 at 16:50
  • Suggested third-party alternatives to SimpleHtmlDom that actually use DOM instead of string parsing: phpQuery, Zend_Dom, QueryPath and FluentDom. Commented Nov 26, 2011 at 16:51
  • Thanks, I will try them in other projects, but since this is just one part of the whole, it would take some time to change the entire code to use the other libraries. Commented Nov 26, 2011 at 19:15
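For comparison with the DOM-based libraries suggested in the comments, here is a minimal sketch of the same selection using PHP's bundled DOM extension (DOMDocument and DOMXPath), which, as far as I know, those libraries build on. postImages() and the sample HTML are made up for illustration; in the crawler the HTML would be the fetched page.

```php
<?php
// Hypothetical helper using PHP's built-in DOM extension instead of simple_html_dom.
function postImages(string $html): array {
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $images = [];
    // Same selection as find('div[class=posts]') followed by find('img')
    foreach ($xpath->query('//div[@class="posts"]//img') as $img) {
        $images[] = ['src' => $img->getAttribute('src'), 'title' => $img->getAttribute('title')];
    }
    return $images;
}

// Made-up sample markup
$sample = '<html><body><div class="posts">'
        . '<img src="http://i.imgur.com/AbCdEb.jpg" title="First">'
        . '<img src="http://i.imgur.com/XyZb.jpg" title="Second">'
        . '</div></body></html>';
print_r(postImages($sample));
```

Because the DOMDocument lives only inside the function, its tree is released when the function returns, with no separate clear() step to forget.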

2 Answers

$html->clear;

If this is your actual code, note that it only reads a property named clear and never calls the method. Change it to a function call: $html->clear();
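The difference between the two lines is plain PHP semantics, not something specific to simple_html_dom. A minimal sketch with a hypothetical Doc class (its __get() method only keeps the undefined-property read silent):

```php
<?php
// Hypothetical class, not simple_html_dom: clear() records that it ran.
class Doc {
    public $cleared = false;

    public function clear(): void {
        $this->cleared = true;
    }

    // Handles reads of undefined properties such as ->clear by returning null
    public function __get($name) {
        return null;
    }
}

$doc = new Doc();
$doc->clear;             // property read: goes through __get(), clear() is not called
var_dump($doc->cleared); // bool(false)

$doc->clear();           // method call: clear() runs
var_dump($doc->cleared); // bool(true)
```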

If that's not the issue, try downgrading to 1.11; clear() worked well there.
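Why clear() matters: a DOM tree whose nodes point back at their parent (as I understand simple_html_dom's nodes do) forms reference cycles, and reference counting alone cannot free a cycle when the last outside variable is unset. The sketch below uses a hypothetical Node class, not simple_html_dom's code, to show that breaking the cycle first, as a clear()-style method does, releases the memory at unset() time.

```php
<?php
// Hypothetical stand-in for a DOM node with a back-reference to its parent,
// so parent -> child -> parent forms a reference cycle. Not simple_html_dom's code.
class Node {
    public $parent = null;
    public $children = [];
    public $payload;

    public function __construct() {
        // ~100 KB per node so the effect shows up in memory_get_usage()
        $this->payload = str_repeat('x', 100000);
    }

    // Break the cycle explicitly, similar in spirit to a clear() method
    public function clear(): void {
        foreach ($this->children as $child) {
            $child->parent = null;
        }
        $this->children = [];
    }
}

function buildTree(): Node {
    $root = new Node();
    $child = new Node();
    $child->parent = $root;
    $root->children[] = $child;
    return $root;
}

// Bytes still allocated after building a tree and unsetting the only variable holding it
function retainedBytes(bool $callClear): int {
    $before = memory_get_usage();
    $tree = buildTree();
    if ($callClear) {
        $tree->clear();
    }
    unset($tree);
    return memory_get_usage() - $before;
}

retainedBytes(true);                  // warm-up so one-off first-call allocations don't skew the numbers
$withClear = retainedBytes(true);     // cycle broken: both nodes are freed by unset()
$withoutClear = retainedBytes(false); // cycle intact: both nodes stay allocated
echo "with clear(): $withClear bytes, without: $withoutClear bytes\n";
```

On PHP 5.3 and later, the cycle collector (for example via gc_collect_cycles()) would eventually reclaim the leaked pair as well; breaking the cycle just frees it immediately, on every loop iteration.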


1 Comment

Thanks, using that function call plus running the script in pieces with the help of a .bat file seems to do the trick.
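Running the script in pieces works because each PHP process starts with a fresh memory allowance. A minimal sketch of reading the loop bounds from the command line, so a .bat file could call, for example, php crawler.php 0 10 and then php crawler.php 10 20 (the script name and the pageRange() helper are hypothetical). Since the question opens c:/results.txt with mode 'w', which truncates the file, each chunk would need mode 'a' to append instead.

```php
<?php
// Hypothetical helper: read $beginning and $end from the command line,
// falling back to the question's 0..35 when no arguments are given.
function pageRange(array $args, int $defaultStart = 0, int $defaultEnd = 35): array {
    $start = isset($args[1]) ? (int) $args[1] : $defaultStart;
    $end = isset($args[2]) ? (int) $args[2] : $defaultEnd;
    return [$start, $end];
}

// In the CLI, $argv holds the arguments; $argv[0] is the script name
[$beginning, $end] = pageRange($argv ?? []);
echo "Crawling pages $beginning to " . ($end - 1) . "\n";

// 'a' appends, so each chunk adds to the results instead of overwriting them:
// $FileHandle = fopen("c:/results.txt", 'a') or die("can't open file");
```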

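You could increase the memory limit with

ini_set("memory_limit","LIMIT"); 

for example, to raise it above the current 256 MB (268435456 bytes):

ini_set("memory_limit","512M");

Keep in mind that if memory grows on every iteration, a higher limit only postpones the fatal error.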

Also, check out: PHP Simple HTML Dom Memory Issue
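A minimal sketch of raising the limit (512M is only an example value) plus printing memory_get_usage() after each iteration, so you can tell a steady per-iteration climb (a leak) from a single large page. fetchPage() is a hypothetical placeholder for the file_get_html() call and does no network access.

```php
<?php
// Raise the limit above the question's 256M; ini_set() returns the old value, or false on failure
$previous = ini_set('memory_limit', '512M');

// Hypothetical placeholder for fetching and parsing one gallery page
function fetchPage(int $i): string {
    return str_repeat('<div class="posts"><img src="x.jpg"></div>', 1000);
}

for ($i = 0; $i < 5; $i++) {
    $html = fetchPage($i);
    // ... extract and write results here ...
    unset($html);
    // Bytes currently allocated by PHP; this should stay roughly flat if nothing leaks
    printf("page %d: %d bytes in use, peak %d bytes\n", $i, memory_get_usage(), memory_get_peak_usage());
}
```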
