The following code runs in a loop, and each iteration sets the URL to a new address. My problem is that each pass uses more and more memory.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://site.ru/');
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true); // boolean flag; a fixed Referer URL would go in CURLOPT_REFERER
curl_setopt($ch, CURLOPT_HEADER, false);

$html = new \DOMDocument();
$html->loadHTML(curl_exec($ch));

curl_close($ch);
$ch = null;

$xpath = new \DOMXPath($html);
$html = null;

foreach ($xpath->query('//*[@id="tree"]/li[position() > 5]') as $category) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $xpath->query('./a', $category)->item(0)->nodeValue);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true); // boolean flag; a fixed Referer URL would go in CURLOPT_REFERER
    curl_setopt($ch, CURLOPT_HEADER, false);

    $html = new \DOMDocument();
    $html->loadHTML(curl_exec($ch));

    curl_close($ch);
    $ch = null;

    // etc.
}

Memory usage reaches about 2000 MB, and the script runs for roughly 2 hours. The PHP version is 5.4.4. How can I avoid this memory leak? Thanks!

  • Why don't you use functions? They make the code easier to understand and are good practice. Commented Nov 1, 2014 at 14:05
  • I tried this with ini_set('memory_limit', '2GB'); and it succeeded on PHP 5.6.0. What is your PHP version? Commented Nov 1, 2014 at 15:33
  • PHP 5.3.* seems to use a lot of memory. Commented Nov 1, 2014 at 15:34

2 Answers

Reports online suggest that curl_setopt($ch, CURLOPT_RETURNTRANSFER, true) leaks memory in some PHP/cURL versions.

Similar reports exist for DOM.

Create a minimal test case that pinpoints the cause of the leak, i.e. remove the unrelated extension (DOM or cURL) from the code and check whether memory still grows.

Then reproduce it with the latest PHP version. If the leak is still there, file a bug report; otherwise, use that PHP version.
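
A minimal test case along these lines might look like the sketch below. It exercises DOM only, with no cURL involved, and reports how much PHP's memory usage grew over many iterations. The helper name measureMemoryGrowth, the sample markup, and the iteration count are all assumptions made for illustration. Note that memory_get_usage() may not include memory allocated inside native libraries, so it can be worth watching the process's resident memory as well.

```php
<?php
// Hypothetical helper: runs $step $iterations times and returns the change
// in memory_get_usage() (in bytes) across those iterations.
function measureMemoryGrowth(callable $step, $iterations)
{
    $step(0);             // warm-up, so one-time allocations are not counted
    gc_collect_cycles();
    $before = memory_get_usage();

    for ($i = 1; $i <= $iterations; $i++) {
        $step($i);
    }

    gc_collect_cycles();
    return memory_get_usage() - $before;
}

// DOM only: parse the same static markup again and again (assumed sample).
$sample = '<html><body><ul id="tree"><li><a href="/a">A</a></li></ul></body></html>';

$domGrowth = measureMemoryGrowth(function () use ($sample) {
    $doc = new DOMDocument();
    @$doc->loadHTML($sample); // @ hides warnings about imperfect HTML
    $xpath = new DOMXPath($doc);
    $xpath->query('//*[@id="tree"]/li');
}, 1000);

echo "DOM only: {$domGrowth} bytes\n";
```

If the DOM-only figure stays near zero, the same helper can be pointed at a cURL-only closure to test the other suspect.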

Reuse the same curl handle instead of creating and destroying it each time in your loop.

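// Create and configure the handle once, outside the loop.
$ch = curl_init();
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true); // boolean flag, not a URL
curl_setopt($ch, CURLOPT_HEADER, false);
// $pages is assumed to hold the list of URLs to fetch.
foreach ($pages as $url) {
    // Only the URL changes between requests.
    curl_setopt($ch, CURLOPT_URL, $url);
    $response = curl_exec($ch);
}
curl_close($ch);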
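
As a rough sketch, the reused handle could be combined with the question's DOM parsing by moving each step into a function, so that the DOMDocument and DOMXPath objects go out of scope after every page. The function names fetchPage, extractCategoryLinks and crawl are hypothetical, and the XPath is adapted from the question to select the first link of each matching list item. Whether this actually removes the leak depends on the PHP/cURL/libxml versions involved.

```php
<?php
// Fetch one URL with an already-configured, reusable cURL handle.
function fetchPage($ch, $url)
{
    curl_setopt($ch, CURLOPT_URL, $url);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Parse markup and return plain strings only, so the DOM objects are
// released when the function returns.
function extractCategoryLinks($markup)
{
    if ($markup === '') {
        return array();
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($markup); // @ hides warnings about imperfect HTML
    $xpath = new DOMXPath($doc);

    $links = array();
    foreach ($xpath->query('//*[@id="tree"]/li[position() > 5]/a[1]') as $a) {
        $links[] = trim($a->nodeValue); // the question reads the link text
    }
    return $links;
}

// Crawl the start page and every category it links to, using one handle.
function crawl($startUrl)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, false);

    $start = fetchPage($ch, $startUrl);
    $links = $start === null ? array() : extractCategoryLinks($start);

    foreach ($links as $url) {
        $page = fetchPage($ch, $url);
        // ... process $page here ...
        unset($page);
        echo memory_get_usage(), "\n"; // watch for steady growth
    }
    curl_close($ch);
}
```

Calling crawl('http://site.ru/') would then print one memory reading per category page, which makes steady growth easy to spot.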
