I'm mining data from site, but there it paginator, but I need to get all pages. Link to the next page is written in link tag with rel=next. If there are no more pages, the link tag is missing. I created function called getAll which should call self again and again until there is the link tag.
function getAll($url, &$links) {
$dom = file_get_html ($url); // create dom object from $url
$tmp = $dom->find('link[rel=next]', 0); // find link rel=next
if(is_object($tmp)){ // is there the link tag?
$link = $tmp->getAttribute('href'); // get url of next page - href attribute
$links[] = $link; // insert url into array
getAll($link, $links); // call self
}else{
return $links; // there are no more urls, return the array
}
}
// usage
$links = array();
getAll('http://www.zbozi.cz/vyrobek/apple-iphone-5/', $links);
print_r($links); // dump the links
But I have a problem, when I run the script the message "No data received" appear in Chrome. I don't have any idea about error or something. The function should works, because when I don't use it again it-self it returns one link - to the second page.
I think the problem is in bad syntax or bad pointer usage.
Could you please help me?
getAll, you callgetLinksinside and initially.