I have a MySQL table hravaj00_dily with the columns part_id, img150, and imgfull. The img150 and imgfull columns store image URLs. The table is updated from an XML feed.

Is there a PHP solution that goes through the img150 (or imgfull) column, checks whether each URL exists (i.e. does not return a 404 error), and deletes from the database all rows whose URLs do not exist?

I have read about the function below, which checks the HTTP headers of a URL. Is it useful here? I have no idea how exactly to use it.

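function file_external_exists($url)
{
    // get_headers() returns false on failure, so guard before indexing.
    $headers = @get_headers($url);
    if ($headers === false) {
        return false;
    }
    // Match the status code itself rather than "200" anywhere in the line.
    return (bool) preg_match('#^HTTP/\S+\s+200\b#', $headers[0]);
}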

2 Answers

$con = mysqli_connect("example.com", "peter", "abc123", "my_db");
// Only the key and the URL column are needed for the check.
$result = mysqli_query($con, "SELECT part_id, img150 FROM hravaj00_dily");

$nonExistent = array();
while ($row = mysqli_fetch_assoc($result)) {
  $url = $row['img150'];
  if (!urlExists($url)) {
    // Assuming part_id is an integer primary key
    $nonExistent[] = (int) $row['part_id'];
  }
}

if ($nonExistent) {
  $nonExistentCSV = implode(",", $nonExistent);
  // IN needs a parenthesized list of values
  $delQuery = "DELETE FROM hravaj00_dily WHERE part_id IN (" . $nonExistentCSV . ")";
  mysqli_query($con, $delQuery);
}

mysqli_close($con);

// Ref: http://stackoverflow.com/questions/408405/easy-way-to-test-a-url-for-404-in-php
function urlExists($url) {
  $handle = curl_init($url);
  curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);

  // The response body is not needed, only the status code.
  curl_exec($handle);
  $httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
  curl_close($handle);

  // Anything other than 200 (including a failed request, which reports 0)
  // is treated as "does not exist".
  return $httpCode == 200;
}
  • I read all the rows and make a curl request for each URL to check whether it exists. Once all the URLs are checked, I delete the affected rows in one query.
  • It is better to run a small number of database queries, and best not to run a query inside the row loop. You may consider deleting in batches of 100 or 1000 IDs.
  • You might want to pause between requests with the sleep() function; otherwise the image server might block your requests if it becomes overloaded.
  • You might not want to check all rows at once. It is better to fetch a few rows at a time, such as 100 or 1000, depending on server capability.
  • You might have to check whether the script runs for more than 30 seconds, which is the default max_execution_time in php.ini.
  • You might have to increase the maximum memory allocated to the PHP script (memory_limit in php.ini).
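
The batching advice above can be sketched as follows. This is only an illustration under stated assumptions: buildDeleteBatches() is a hypothetical helper (not part of the original answer), part_id is assumed to be an integer key, and the batch size of 100 is arbitrary. It splits the collected IDs into chunks and builds one parameterized DELETE per chunk, so the values are bound rather than concatenated into the SQL.

```php
<?php
// Hypothetical helper: split collected IDs into batches and build one
// parameterized DELETE per batch. Table and column names come from the
// question; part_id is assumed to be an integer key.
function buildDeleteBatches(array $ids, int $batchSize = 100): array
{
    $batches = [];
    // Cast to int and drop duplicates so only numeric IDs are used.
    $ids = array_values(array_unique(array_map('intval', $ids)));
    foreach (array_chunk($ids, $batchSize) as $chunk) {
        // One "?" placeholder per ID in this chunk.
        $placeholders = implode(',', array_fill(0, count($chunk), '?'));
        $batches[] = [
            'sql'    => "DELETE FROM hravaj00_dily WHERE part_id IN ($placeholders)",
            'params' => $chunk,
        ];
    }
    return $batches;
}

// Possible usage with mysqli ($con is an open connection, as in the answer):
//
// foreach (buildDeleteBatches($nonExistent, 100) as $batch) {
//     $stmt = mysqli_prepare($con, $batch['sql']);
//     $types = str_repeat('i', count($batch['params']));
//     mysqli_stmt_bind_param($stmt, $types, ...$batch['params']);
//     mysqli_stmt_execute($stmt);
//     mysqli_stmt_close($stmt);
//     sleep(1); // optional pause between batches
// }
```

Keeping the query construction in a pure function makes it easy to check the generated SQL without touching the database.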

  1. Get all records.
  2. Iterate over them.
  3. For each record, call the function from the question to check whether the URL exists.
  4. If it does not exist, delete the record by its ID.
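
Step 3 relies on a get_headers() check like the one in the question. A minimal sketch of a stricter variant is shown here, assuming that any 2xx status counts as "exists"; finalStatusCode() and urlLooksAlive() are hypothetical names used only for illustration. Because get_headers() follows redirects by default and returns every status line it saw, the sketch reads the last status line instead of the first.

```php
<?php
// Hypothetical helper: extract the final HTTP status code from the value
// returned by get_headers(). When redirects are followed, the array holds
// several status lines; the last one belongs to the final response.
function finalStatusCode($headers): ?int
{
    if (!is_array($headers)) {
        return null; // get_headers() returns false on failure
    }
    $code = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 404 Not Found".
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Assumption: any 2xx final status means the image is reachable.
function urlLooksAlive(string $url): bool
{
    $code = finalStatusCode(@get_headers($url));
    return $code !== null && $code >= 200 && $code < 300;
}
```

Since a null result means the request itself failed (for example a timeout or DNS error), it may be safer to skip such rows and retry later than to delete them.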

2 Comments

I wouldn't do that. I would either delete all of them at once or delete in batches of 10, 50, or 100, to make sure the database is not loaded with too many requests.
So then collect the IDs and run the delete at the end.
