
I am trying to scrape a page on Trip Advisor. I get what I need from the listing page, then I run a second loop to fetch the contents of each hotel's own page. But when I try to add these extra details to the existing array, it doesn't work, and I can't see why.

error_reporting(E_ALL);
include_once('simple_html_dom.php');

$html = file_get_html('http://www.tripadvisor.co.uk/Hotels-g186534-c2-Glasgow_Scotland-Hotels.html');

$articles = '';

// Find all article blocks
foreach($html->find('.listing') as $hotel) {
    $item['name']     = $hotel->find('.property_title', 0)->plaintext;
    $item['link']     = $hotel->find('.property_title', 0)->href;

    $item['rating']    = $hotel->find('.sprite-ratings', 0)->alt;
    $item['rating']    = explode(' ', $item['rating']);
    $item['rating']    = $item['rating'][0];

    $articles[] = $item;
}

foreach($articles as $article) {

    echo '<pre>';
    print_r($article);
    echo '</pre>';

    $hotel_html = file_get_html('http://www.tripadvisor.co.uk'.$article['link'].'/');

    foreach($hotel_html->find('#MAIN') as $hotel_page) {
        $article['address']            = $hotel_page->find('.street-address', 0)->plaintext;
        $article['extendedaddress']    = $hotel_page->find('.extended-address', 0)->plaintext;
        $article['locality']           = $hotel_page->find('.locality', 0)->plaintext;
        $article['country']            = $hotel_page->find('.country-name', 0)->plaintext;

        echo '<pre>';
        print_r($article);
        echo '</pre>';

        $articles[] = $article;
    }
}

echo '<pre>';
print_r($articles);
echo '</pre>';

Here is all the debugging output that I get: http://pastebin.com/J0V9WbyE

URL: http://www.4playtheband.co.uk/scraper/

  • You'd be better off using SimpleXML or DOMDocument. Just saying. I know that's not what you asked, so I'll stay quiet now. Commented Aug 13, 2012 at 21:02
  • The problem with using an XML library for web scraping is that it will be intolerant of any markup that isn't valid XML, which is likely even if the site claims to be XHTML. simple_html_dom parses in a more browser-like "tag soup" fashion, so it makes for much more robust scrapers. Commented Aug 20, 2012 at 14:54

1 Answer


I would change

$articles = '';

to:

$articles = array();
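A minimal sketch of why the initialization matters, using hypothetical sample rows in place of the scraped listings. Appending with `[]` to a variable that starts as `''` only worked through a legacy string-to-array conversion; PHP 7.1 and later throw an error instead. Starting from `array()` makes every `$articles[] = ...` a plain append.

```php
<?php
// Start with a real array so that $articles[] = ... appends an element.
// (Starting from '' relied on a silent string-to-array conversion
// that PHP 7.1+ no longer performs.)
$articles = array();

// Hypothetical rows standing in for the hotels found on the listing page.
$rows = array(
    array('name' => 'Hotel A', 'link' => '/a.html', 'rating' => '4.5'),
    array('name' => 'Hotel B', 'link' => '/b.html', 'rating' => '4'),
);

foreach ($rows as $item) {
    $articles[] = $item; // appended at the next integer key (0, then 1)
}

echo count($articles), "\n";
```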

Before the second foreach(), create an empty array:

$articlesNew = array();

Inside that loop, append each updated article to the new array instead of to $articles:

$articlesNew[] = $article;

After the loop has finished, merge the two arrays:

$articles = array_merge($articles, $articlesNew);
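The steps above can be put together as a self-contained sketch. The network calls are replaced by hypothetical sample data and a hypothetical `fetchDetails()` helper (standing in for `file_get_html()` plus the `.street-address` lookups), so it runs offline.

```php
<?php
// Hypothetical data standing in for the first scraping pass.
$articles = array(
    array('name' => 'Hotel A', 'link' => '/a.html'),
    array('name' => 'Hotel B', 'link' => '/b.html'),
);

// Hypothetical stand-in for parsing a hotel's own page.
function fetchDetails(array $article)
{
    return array('address' => 'Address of ' . $article['name']);
}

// Collect the enriched copies in a separate array while iterating.
$articlesNew = array();
foreach ($articles as $article) {
    $article = array_merge($article, fetchDetails($article));
    $articlesNew[] = $article;
}

// Merge once the loop is done; integer keys are renumbered 0..3.
$articles = array_merge($articles, $articlesNew);

print_r($articles);
```

Note that the merged result holds four entries: the two original rows followed by the two enriched copies. If only the enriched versions are wanted, `$articlesNew` on its own already contains them.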

See http://php.net/manual/en/function.array-merge.php for more on merging and combining arrays in PHP.

I have never tried altering an array while iterating over it in PHP, but if you did this improperly with C++ collections, the invalidated iterators could crash your program. My guess is that you shouldn't modify an array while iterating over it, and I know I never would. Work with a separate variable instead.
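A small offline check of what PHP actually does here, using hypothetical sample data. In PHP 7 and later, foreach by value walks over a copy of the array, so appending inside the loop neither crashes nor extends the loop. It also means that changing `$article` never touches the element stored in `$articles`. The second half shows writing back through the key, an alternative that is not part of the answer above, which updates the existing entries instead of adding new ones.

```php
<?php
$articles = array(
    array('name' => 'Hotel A'),
    array('name' => 'Hotel B'),
);

$iterations = 0;
foreach ($articles as $article) {
    $iterations++;
    $article['address'] = 'changed'; // only changes the loop's copy
    $articles[] = $article;          // appended, but not visited by this loop
}
// Loop ran twice; $articles now has 4 entries and $articles[0] has no 'address'.

// Alternative: assign back through the key to update entries in place.
$hotels = array(
    array('name' => 'Hotel A'),
    array('name' => 'Hotel B'),
);
foreach ($hotels as $key => $hotel) {
    $hotel['address'] = 'Address of ' . $hotel['name'];
    $hotels[$key] = $hotel; // replaces the original element
}

print_r($hotels);
```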
