2

Is it possible to use a foreach loop to scrape multiple URL's from an array? I've been trying but for some reason it will only pull from the first URL in the array and the show the results.

include_once('../../simple_html_dom.php');

$link = array (
'http://www.amazon.com/dp/B0038JDEOO/',
'http://www.amazon.com/dp/B0038JDEM6/',
'http://www.amazon.com/dp/B004CYX17O/'
);


foreach ($link as $links) {

function scraping_IMDB($links) {
// create HTML DOM
$html = file_get_html($links);

$values = array(); 
foreach($html->find('input') as $element) {     
$values[$element->id=='ASIN'] = $element->value; }  


// get title
$ret['ASIN'] =  end($values);

// get rating
$ret['Name'] = $html->find('h1[class="parseasinTitle"]', 0)->innertext;

$ret['Retail'] =$html->find('b[class="priceLarge"]', 0)->innertext;

// clean up memory
//$html->clear();
   // unset($html);

return $ret;
}



// -----------------------------------------------------------------------------
// test it!



$ret = scraping_IMDB($links);



foreach($ret as $k=>$v)

    echo '<strong>'.$k.'</strong>'.$v.'<br />';

}

Here is the code since the comment part didn't work. :) It's very dirty because I just edited one of the examples to play with it to see if I could get it to do what I wanted.

5
  • What have you tried? It would be much easier to help you if you showed us your code. Commented Jun 16, 2011 at 4:09
  • 3
    PHP's like zombo.com... everything is possible. Commented Jun 16, 2011 at 4:11
  • This is what I am working with right now but am stuck trying to get it to repeat the loop. Commented Jun 16, 2011 at 4:18
  • That didn't work here. I added the code above. Commented Jun 16, 2011 at 4:19
  • There's something wrong with your code, or you copy-pasted something wrong.. You defined a function inside a foreach loop? The next loop will give an error, because a function with that name is allready defined. Commented Jun 16, 2011 at 5:52

2 Answers 2

3
include_once('../../simple_html_dom.php');

function scraping_IMDB($links) {
    // create HTML DOM
    $html = file_get_html($links);

// What is this spaghetti code good for?    
/*
    $values = array(); 
    foreach($html->find('input') as $element) {     
        $values[$element->id=='ASIN'] = $element->value;
    }  

    // get title
    $ret['ASIN'] = end($values);
*/
    foreach($html->find('input') as $element) {
        if($element->id == 'ASIN') {
             $ret['ASIN'] = $element->value;
        }
    }

// Our you could use the following instead of the whole foreach loop above
//
// $ret['ASIN'] = $html->find('input[id="ASIN"]', 0)->value;
//
// if the 0 means, return first found or something similar,
// I just had a look at Amazons source code, and it contains 
// 2 HTML tags with id='ASIN'. If they were following html-regulations
// then there should only be ONE element with a specific id.

    // get rating
    $ret['Name'] = $html->find('h1[class="parseasinTitle"]', 0)->innertext;

    $ret['Retail'] = $html->find('b[class="priceLarge"]', 0)->innertext;

    // clean up memory
    //$html->clear();
    // unset($html);

    return $ret;
}



// -----------------------------------------------------------------------------
// test it!

$links = array (
    'http://www.amazon.com/dp/B0038JDEOO/',
    'http://www.amazon.com/dp/B0038JDEM6/',
    'http://www.amazon.com/dp/B004CYX17O/'
);

foreach ($links as $link) {
    $ret = scraping_IMDB($link);
    foreach($ret as $k=>$v) {
        echo '<strong>'.$k.'</strong>'.$v.'<br />';
    }
}   

This should do the trick

I have renamed the array to 'links' instead of 'link'. It's an array of links, containing link(s), therefore, foreach($link as $links) seemed wrong, and I changed it to foreach($links as $link)

Sign up to request clarification or add additional context in comments.

Comments

0

I really need to ask this question as it will answer way more questions after the world reads this thread. What if ... you used articles like the simple html dom site.

$ret['Name'] = $html->find('h1[class="parseasinTitle"]', 0)->innertext;

$ret['Retail'] = $html->find('b[class="priceLarge"]', 0)->innertext;

return $ret;

}


$links = array (
'http://www.amazon.com/dp/B0038JDEOO/',
'http://www.amazon.com/dp/B0038JDEM6/',
'http://www.amazon.com/dp/B004CYX17O/'
);

foreach ($links as $link) {
$ret = scraping_IMDB($link);
foreach($ret as $k=>$v) {
    echo '<strong>'.$k.'</strong>'.$v.'<br />';
}
} 

what if its $articles?

$articles[] = $item;    

}
//print_r($articles); 

$links = array (
'http://link1.com',
'http://link2.com',
'http://link3.com'
);

what would this area look like?

foreach ($links as $link) {
$ret = scraping_IMDB($link);
foreach($ret as $k=>$v) {
    echo '<strong>'.$k.'</strong>'.$v.'<br />';
}
} 

Ive seen this multiple links all over stackoverflow for past 2 years, and I still cannot figure it out. Would be great to get the basic handle on it to how the simple html dom examples are.

thx.

First time postin im sure I broke a bunch of rules and didnt do the code section right. I just had to ask this question badly.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.