Using the PHP library simple_html_dom, I loop through a list of URLs, load each one as a DOM, and search it for a string. If I find the string I save the URL in an array; otherwise I move on to the next iteration, and I return the array of URLs at the end. The script takes on the order of a few seconds per URL. After some number of iterations it gets stuck on the $dom->load($url) line inside file_get_html and throws a segmentation fault; the number of iterations before the crash varies between URL lists. I tried isolating the load($url) call in a test script that works only on the URL where the looping script gets stuck, but the test script ends with no errors (although I can't check the print_r of the DOM, because Firefox crashes when I try to view the page source). I'm working on a LAMP server. Here is the code:

error_reporting(E_ALL);
ini_set("max_execution_time", "300");
ini_set("memory_limit", "512M");
ini_set('output_buffering', 0);
ini_set('implicit_flush', 1);
ob_end_flush();
ob_start();
set_time_limit(100);

$urlArray = array(); // list of URLs to scan
$link = array();     // matches collected so far

foreach ($urlArray as $url) {
    $found = false;
    $dom = file_get_html($url);
    foreach ($dom->find('target') as $caught) {
        array_push($link, $caught);
        $found = true;
    }
    if ($found) {
        return $link;
    } else {
        echo "not found";
    }
}

Thanks for any help.

  • So you are getting both a segmentation fault in PHP and a crash in Firefox? That's what I call a bad day... (BTW, that's not your real code, is it?) Commented Oct 4, 2012 at 14:29
  • The problem is the segmentation fault. I think Firefox crashes just because displaying the whole DOM document is too much data, and I don't really need to do that, so it's not very significant. Commented Oct 4, 2012 at 14:50

1 Answer

Well, it's a common problem; here is the bug report: http://sourceforge.net/p/simplehtmldom/bugs/103/. Add these lines before your if statement:

$dom->clear();
unset($dom);
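
Put together with the loop from the question, the cleanup could look roughly like the sketch below. It is only an illustration: findMatchingUrls() and its $loader parameter are hypothetical names that are not part of simple_html_dom, and the sketch stores the URL string rather than the matched nodes, on the assumption that nodes belonging to a cleared DOM should not be kept around.

```php
<?php
// Sketch only. findMatchingUrls() and $loader are hypothetical names,
// not part of simple_html_dom or of the original post.

/**
 * Returns the URLs whose page contains at least one element matching $selector.
 * $loader is any callable that turns a URL into an object exposing find() and
 * clear(); for real use, pass 'file_get_html' after including simple_html_dom.php.
 */
function findMatchingUrls(array $urls, $selector, $loader)
{
    $matches = array();
    foreach ($urls as $url) {
        $dom = call_user_func($loader, $url);
        if (!$dom) {
            continue; // loading failed, skip this URL
        }
        if (count($dom->find($selector)) > 0) {
            $matches[] = $url; // keep the URL string, not the DOM nodes
        }
        // Release the parser's internal references before dropping the
        // object, once per iteration, as the answer suggests.
        $dom->clear();
        unset($dom);
    }
    return $matches;
}
```

With the library included, a call would be something like findMatchingUrls($urlArray, 'target', 'file_get_html').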

In most cases you will not see any segfaults after that. But if you parse several thousand URLs (like me :)), then you might hit it again. My solution is to open the simple_html_dom.php file and comment out all the lines between 146 and 149 (the body of clear()):

 function clear()
 {
   /*
   $this->dom = null;
   $this->nodes = null;
   $this->parent = null;
   $this->children = null;
   */
 }

UPDATE: if you comment out these lines, your memory consumption will also increase with each parsing iteration.
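
To see how much memory grows per iteration on your own URL list, something like the following sketch could help. traceMemory() is a hypothetical helper, not part of simple_html_dom; it only relies on PHP's built-in memory_get_usage().

```php
<?php
// Hypothetical helper (not part of simple_html_dom): runs $step once per item
// and records how many bytes PHP has allocated after each iteration.
function traceMemory(array $items, $step)
{
    $trace = array();
    foreach ($items as $item) {
        call_user_func($step, $item);
        $trace[] = memory_get_usage(); // bytes currently allocated by PHP
    }
    return $trace;
}
```

For example, passing $urlArray together with a callback that calls file_get_html($url), then $dom->clear() and unset($dom), gives one reading per URL; values that keep climbing would point to the leak described above.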

2 Comments

Instead of commenting out those lines, the bug report says to unset those variables. More info: sourceforge.net/p/simplehtmldom/bugs/103
@Devaroop, yes! But it didn't help me; I still got a segfault after the unset. Commenting out the lines is the only thing that helped me with this problem.
