I'd like to get all articles from a webpage, as well as all pictures for each article.
I decided to use PHP Simple HTML DOM Parser, and I used the following code:
<?php
include("simple_html_dom.php");

$sitesToCheck = array(
    array(
        'url'            => 'http://googleblog.blogspot.ru/',
        'search_element' => 'h2.title a',
        'get_element'    => 'div.post-content'
    ),
    array(
        // 'url'            => '', // Site address with a list of articles
        // 'search_element' => '', // Selector for the article links on the site
        // 'get_element'    => ''  // Selector for the desired content
    )
);
$s = microtime(true); // start timing

foreach ($sitesToCheck as $site)
{
    // Load the listing page and walk every article link on it
    $html = file_get_html($site['url']);

    foreach ($html->find($site['search_element']) as $link)
    {
        $content  = '';
        $savePath = 'cachedPages/' . md5($site['url']) . '/';
        $fileName = md5($link->href);

        // Only fetch articles that are not cached yet
        if ( ! file_exists($savePath . $fileName))
        {
            $post_for_scan = file_get_html($link->href);

            foreach ($post_for_scan->find($site["get_element"]) as $element)
            {
                $content .= $element->plaintext . PHP_EOL;
            }

            // Create the cache directory recursively with usable permissions
            if ( ! file_exists($savePath) && ! mkdir($savePath, 0755, true))
            {
                die('Unable to create directory ...');
            }

            file_put_contents($savePath . $fileName, $content);
        }
    }
}

$e = microtime(true);

echo $e - $s; // elapsed time in seconds
For now I'm only trying to get the articles, without the pictures, but I get this response from the server:

"Maximum execution time of 120 seconds exceeded"
What am I doing wrong? Is there another way to get all the articles, and all the pictures for each article, from a specific webpage?
You can call set_time_limit(0), however it's not good practice to use it everywhere. It prevents PHP from killing your process when it exceeds the maximum execution time (120 s in your case), so the script will run until it finishes. The problem is that if you make a mistake in your program that causes it to run forever, the script will sit on the server consuming resources until manual action is taken.
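If you do use it, a minimal sketch would be to call it once at the top of this one script, rather than raising max_execution_time globally in php.ini (the 300-second alternative in the comment is just an illustration):

<?php
include("simple_html_dom.php");

// Remove the execution time limit for this script only.
// A finite value such as set_time_limit(300) is safer if you
// just need more headroom instead of an unlimited run.
set_time_limit(0);

// ... rest of your crawling code unchanged ...

set_time_limit() only affects the current request, so the 120-second default still applies to every other script on the server.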