PHP Simple DOM Parser to Scrape From Multiple URLs

Question

Is it possible to use a foreach loop to scrape multiple URLs from an array? I've been trying, but for some reason it only pulls from the first URL in the array and then shows the results.

include_once('../../simple_html_dom.php');

$link = array (
'http://www.amazon.com/dp/B0038JDEOO/',
'http://www.amazon.com/dp/B0038JDEM6/',
'http://www.amazon.com/dp/B004CYX17O/'
);


foreach ($link as $links) {

function scraping_IMDB($links) {
// create HTML DOM
$html = file_get_html($links);

$values = array(); 
foreach($html->find('input') as $element) {     
$values[$element->id=='ASIN'] = $element->value; }  


// get title
$ret['ASIN'] =  end($values);

// get rating
$ret['Name'] = $html->find('h1[class="parseasinTitle"]', 0)->innertext;

$ret['Retail'] =$html->find('b[class="priceLarge"]', 0)->innertext;

// clean up memory
//$html->clear();
   // unset($html);

return $ret;
}



// -----------------------------------------------------------------------------
// test it!



$ret = scraping_IMDB($links);



foreach($ret as $k=>$v)

    echo '<strong>'.$k.'</strong>'.$v.'<br />';

}

Here is the code since the comment part didn't work. :) It's very dirty because I just edited one of the examples to play with it to see if I could get it to do what I wanted.

What have you tried? It would be much easier to help you if you showed us your code.
– Jordan Running, Commented Jun 16, 2011 at 4:09
This is what I am working with right now, but I am stuck trying to get it to repeat the loop.
– Reg, Commented Jun 16, 2011 at 4:18
There's something wrong with your code, or you copy-pasted something wrong. You defined a function inside a foreach loop? The next iteration will give an error, because a function with that name is already defined.
– Phliplip, Commented Jun 16, 2011 at 5:52
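
The error described in the last comment can be seen in isolation. The following is a minimal sketch, not the asker's code: demo_scrape is a hypothetical stand-in name, and the function_exists() check is there only so the script keeps running and can report what happens on each pass. Without that check, PHP stops on the second pass with a fatal "Cannot redeclare" error.

```php
<?php
// demo_scrape() is a hypothetical stand-in for scraping_IMDB().
$alreadyDeclared = [];

foreach (['first pass', 'second pass'] as $pass) {
    // A function declared inside a loop body is defined when execution
    // reaches the declaration, so the second pass would declare it again.
    $alreadyDeclared[$pass] = function_exists('demo_scrape');

    if (!function_exists('demo_scrape')) {
        function demo_scrape(string $url): string
        {
            return 'scraped ' . $url;
        }
    }
}

var_dump($alreadyDeclared);
```

The dump shows false for the first pass and true for the second, which is why declaring the function once, outside the loop, is the cleaner fix.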

Phliplip · Accepted Answer · 2011-06-16 06:17:42Z

include_once('../../simple_html_dom.php');

function scraping_IMDB($links) {
    // create HTML DOM
    $html = file_get_html($links);

// What is this spaghetti code good for?    
/*
    $values = array(); 
    foreach($html->find('input') as $element) {     
        $values[$element->id=='ASIN'] = $element->value;
    }  

    // get title
    $ret['ASIN'] = end($values);
*/
    foreach($html->find('input') as $element) {
        if($element->id == 'ASIN') {
             $ret['ASIN'] = $element->value;
        }
    }

// Or you could use the following instead of the whole foreach loop above:
//
// $ret['ASIN'] = $html->find('input[id="ASIN"]', 0)->value;
//
// The 0 asks find() for the first match only. I had a look at
// Amazon's source code, and it contains two HTML tags with id='ASIN'.
// Valid HTML allows only ONE element with a given id.

    // get product name
    $ret['Name'] = $html->find('h1[class="parseasinTitle"]', 0)->innertext;

    // get retail price
    $ret['Retail'] = $html->find('b[class="priceLarge"]', 0)->innertext;

    // clean up memory
    //$html->clear();
    // unset($html);

    return $ret;
}



// -----------------------------------------------------------------------------
// test it!

$links = array (
    'http://www.amazon.com/dp/B0038JDEOO/',
    'http://www.amazon.com/dp/B0038JDEM6/',
    'http://www.amazon.com/dp/B004CYX17O/'
);

foreach ($links as $link) {
    $ret = scraping_IMDB($link);
    foreach($ret as $k=>$v) {
        echo '<strong>'.$k.'</strong>'.$v.'<br />';
    }
}

This should do the trick.

I renamed the array from $link to $links. It is an array of links, so foreach ($link as $links) read backwards, and I changed it to foreach ($links as $link). The function is also now defined once, outside the loop, and only called inside it.
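
When the same function runs over several pages, one page may lack an element that the others have. find() with an index returns null when nothing matches, so chaining ->innertext onto it would stop the script partway through the loop. A minimal sketch of a guard follows; first_inner_text is a hypothetical helper, not part of simple_html_dom, and it assumes $html is the object returned by file_get_html().

```php
<?php
/**
 * Hypothetical helper (not part of simple_html_dom): return the trimmed
 * innertext of the first element matching $selector, or null if none matches.
 * Assumes $html exposes find($selector, $index) as simple_html_dom does.
 */
function first_inner_text($html, string $selector): ?string
{
    $node = $html->find($selector, 0);

    return $node === null ? null : trim($node->innertext);
}
```

Inside scraping_IMDB() this could replace the chained calls, for example $ret['Name'] = first_inner_text($html, 'h1[class="parseasinTitle"]'); and the caller can then skip any link whose values come back null. Some versions of the library return false from file_get_html() when a page cannot be loaded, so checking for that before calling find() is a reasonable extra guard.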

Craig Richards · Accepted Answer · 2013-02-05 07:00:17Z

I really need to ask this question, as the answer will help many more people who read this thread. What if you used $articles, as in the examples on the Simple HTML DOM site?

$ret['Name'] = $html->find('h1[class="parseasinTitle"]', 0)->innertext;

$ret['Retail'] = $html->find('b[class="priceLarge"]', 0)->innertext;

return $ret;

}


$links = array (
'http://www.amazon.com/dp/B0038JDEOO/',
'http://www.amazon.com/dp/B0038JDEM6/',
'http://www.amazon.com/dp/B004CYX17O/'
);

foreach ($links as $link) {
$ret = scraping_IMDB($link);
foreach($ret as $k=>$v) {
    echo '<strong>'.$k.'</strong>'.$v.'<br />';
}
}

What if it's $articles?

$articles[] = $item;    

}
//print_r($articles); 

$links = array (
'http://link1.com',
'http://link2.com',
'http://link3.com'
);

What would this area look like?

foreach ($links as $link) {
$ret = scraping_IMDB($link);
foreach($ret as $k=>$v) {
    echo '<strong>'.$k.'</strong>'.$v.'<br />';
}
}

I've seen this multiple-links question all over Stack Overflow for the past two years, and I still cannot figure it out. It would be great to get a basic handle on it, in the style of the Simple HTML DOM examples.

Thanks.

This is my first time posting, so I'm sure I broke a bunch of rules and didn't format the code section right. I just badly needed to ask this question.
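
One way to read the question above: each page yields several items, and all of them should end up in a single $articles array. A minimal sketch under that assumption follows. collect_articles and the $extract callback are hypothetical names; in real use the callback would wrap file_get_html() and find(), while the stub here returns canned data so the example runs without network access.

```php
<?php
/**
 * Collect items from several pages into one flat list.
 * $extract is a hypothetical callback: given a URL, it returns a list of
 * item arrays, or an empty array when the page could not be read.
 */
function collect_articles(array $links, callable $extract): array
{
    $articles = [];

    foreach ($links as $link) {
        foreach ($extract($link) as $item) {
            $item['source'] = $link;   // remember which page the item came from
            $articles[] = $item;
        }
    }

    return $articles;
}

// Stub extractor with canned data; link2 simulates a page that failed to load.
$stub = function (string $url): array {
    if ($url === 'http://link2.com') {
        return [];
    }
    return [['title' => 'Example from ' . $url]];
};

$links = [
    'http://link1.com',
    'http://link2.com',
    'http://link3.com',
];

$articles = collect_articles($links, $stub);
print_r($articles);
```

The outer loop is the same foreach ($links as $link) as in the accepted answer; the only change is that results are appended to $articles instead of being echoed straight away.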
