1

I have a list of URLs in a MySQL table. I want a script to automatically check each link in the table for a 404 response, and then store whether the URL returned a 404, along with the time it was last checked.

Is it even possible to do this automatically, even if no one runs the script? I.e., if no one visits the page for a few days, would the check still run on its own?

If it's possible, how could I go about making a button to do this?

4 Answers

2

No need to use cURL: file_get_contents($url) returns false if the request fails (by default this includes 4xx and 5xx responses), which might be more useful for what you're trying to do. Note that it requires allow_url_fopen to be enabled. An example:

function urlExists($url)
{
    // compare against false explicitly: an empty body or the string "0" would otherwise cast to false
    return @file_get_contents($url) !== false;
}

This returns true if the URL could be fetched, false otherwise.


EDIT: Here is a faster way. It still sends a normal GET request, but it reads only the first byte of the response instead of the whole page:

function urlExists($url)
{
    // offset 0, length 1: stop reading after the first byte
    return @file_get_contents($url, false, null, 0, 1) !== false;
}

urlExists('https://stackoverflow.com/iDontExist'); // false

However, in combination with your other question it may be wiser to use something like this:

function url($url)
{
    return @file_get_contents($url);
}

$content = url('https://stackoverflow.com/');

// request has failed (404, 5xx, etc...)
if ($content === false)
{
    // delete or store as "failed" in the DB
}

// request was successful
else
{
    $hash = md5($content); // md5() should be enough but you can also use sha1()

    // store $hash in the DB to keep track of changes
}

Or if you're using PHP 5.1+ you only have to do:

$hash = @md5_file($url);

$hash will be false when the URL fails to load; otherwise it holds the MD5 hash of the contents.

Graciously stolen from @Jamie. =)

This way you only have to make one request instead of two. =)

4 Comments

Sorry for the late response, I've been a bit busy. Thanks a lot, @md5_file() is great. Can you tell me, though, what the '@' in front of the function is for?
@Rob: It keeps the function from emitting error messages if the URL fails to load.
So if I put that in front of my curl_multi_exec (as in @curl_multi_exec();), will it run all the URLs regardless of errors? (See my open question for more information.)
@Rob: The @ operator only suppresses error messages, nothing more. Regarding your other question, you're using curl_multi incorrectly.
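
To illustrate the point about the @ operator, here is a minimal sketch. The file path is made up for the example and chosen so that the read fails; it is not from the original answer. The idea is that @ hides the warning but does not change the return value or make the call succeed:

```php
<?php
// Hypothetical path that should not exist, so the read is expected to fail.
$missing = __DIR__ . '/no-such-file-' . uniqid() . '.txt';

// Without @ this would emit a warning; with @ the warning is hidden.
$result = @file_get_contents($missing);

// The call still failed: the return value is false either way.
$failed = ($result === false);

// The suppressed warning remains available for logging via error_get_last().
$lastError = error_get_last();
```

So @ is only about error reporting; the failure still has to be handled by checking the return value.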
1

You would use a cron job to do this. With a cron job you pick when the script is run, e.g. every hour, every 6 hours, etc.

To check for 404s you can loop through the URLs and use get_headers, updating a status column for each row.
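
A rough sketch of the get_headers approach might look like the following. The function names finalStatusCode and isNotFound are made up for this example. It assumes get_headers is called in its default (non-associative) mode, where each redirect hop contributes its own "HTTP/..." status line, so the last such line describes the final response:

```php
<?php
/**
 * Hypothetical helper: pull the final HTTP status code out of a
 * get_headers() result. Returns null if no status line is present.
 */
function finalStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 404 Not Found"; keep the last one seen.
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

/**
 * Hypothetical helper: true if the URL ends in a 404, false for any other
 * status, null if the request itself failed (DNS error, timeout, ...).
 */
function isNotFound(string $url): ?bool
{
    $headers = @get_headers($url);
    if ($headers === false) {
        return null;
    }
    $code = finalStatusCode($headers);
    return $code === null ? null : $code === 404;
}
```

In the loop you would call isNotFound($row['url']) for each row and write the result back to the table. Keeping the "request failed" case separate from "404" avoids marking a link as dead just because of a temporary network problem.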

0

Try using curl:

// $url <= The URL from your database
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$curl_response = curl_exec($curl);
if(curl_getinfo($curl, CURLINFO_HTTP_CODE) == 404) 
{
  // Save in database.
}
curl_close($curl);

If you are running on a shared hosting server, look for the possibility of setting up timed actions (cron jobs). Some hosting services have it, some don't.
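
For the "save in database" step and the timed run, something along these lines could work. This is only a sketch under assumptions: the table layout links(id, url, is_404, last_checked), the function names recordCheck and checkAll, the script path, and the crontab line are all hypothetical and not taken from the question. The status lookup is passed in as a callable, so any check that returns an HTTP status code can be plugged in:

```php
<?php
// Hypothetical crontab entry to run the script once an hour:
//   0 * * * * /usr/bin/php /path/to/check_links.php

/**
 * Store the outcome of one check.
 * Assumes a hypothetical table: links(id, url, is_404, last_checked).
 */
function recordCheck(PDO $db, int $id, int $httpCode): void
{
    $stmt = $db->prepare(
        'UPDATE links SET is_404 = :is_404, last_checked = :checked WHERE id = :id'
    );
    $stmt->execute([
        ':is_404'  => $httpCode === 404 ? 1 : 0,
        ':checked' => date('Y-m-d H:i:s'),
        ':id'      => $id,
    ]);
}

/**
 * Check every row. $fetchStatus is any callable that takes a URL
 * and returns its HTTP status code as an integer.
 */
function checkAll(PDO $db, callable $fetchStatus): void
{
    // Load all rows first so the UPDATEs do not interfere with the SELECT.
    $rows = $db->query('SELECT id, url FROM links')->fetchAll(PDO::FETCH_ASSOC);
    foreach ($rows as $row) {
        recordCheck($db, (int) $row['id'], (int) $fetchStatus($row['url']));
    }
}

// Example wiring (placeholder credentials, not run here):
// $db = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass');
// checkAll($db, function ($url) { /* cURL check returning the status code */ });
```

Because cron starts the script on a schedule, the check runs whether or not anyone visits the page.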

0

I would recommend using cURL as well, but make a HEAD request instead of a GET:

<?php
function check_url($url) {
    $c = curl_init();
    curl_setopt($c, CURLOPT_URL, $url);
    curl_setopt($c, CURLOPT_HEADER, 1); // get the header
    curl_setopt($c, CURLOPT_NOBODY, 1); // and *only* get the header
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1); // get the response as a string from curl_exec(), rather than echoing it
    curl_setopt($c, CURLOPT_FRESH_CONNECT, 1); // don't use a cached version of the url
    if (curl_exec($c) === false) { return false; } // transport-level failure (DNS, timeout, ...)

    $httpcode = curl_getinfo($c, CURLINFO_HTTP_CODE);
    return $httpcode;
}
?>

Snippet taken from here.

Recurring execution can be achieved with cron on *nix systems.
