I have a PHP program that stores the HTML/JavaScript contents of a webpage in a MySQL database. The contents are fetched with cURL and passed through $mysqli->real_escape_string() before being stored in a LONGTEXT column.

Later, I use cURL again to fetch the current HTML/JavaScript contents of the webpage, and I retrieve the copy I stored in the database.

At this point, I need to compare the two to see whether the page's code has changed. I've tried using:

if ($saved == $content)
   return true;
else
   return false;

However, this always returns false, even when no changes have been made to the code. I am not escaping the string the second time I use cURL, so that isn't the issue. I've compared the two pieces of code by eye, but I can't see any differences.

How can I compare the two strings so that it will accurately return whether any change has been made or not?

I should also mention that both files execute just fine, except that the second one always returns false.

First PHP file:

$url = "www.example.com";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$content = $mysqli->real_escape_string(curl_exec($ch));
curl_close($ch);

$query = "INSERT INTO saved (url, content)
              VALUES ('$url', '$content')";

$mysqli->query($query);

Second PHP file:

$url = "www.example.com";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$content = curl_exec($ch);
curl_close($ch);

$query = "SELECT content
          FROM saved
          WHERE url = '$url'";

$result = $mysqli->query($query);
$data = $result->fetch_assoc();
$saved = $data['content'];

if ($saved == $content)
    echo '1';
else
    echo '0';

2 Answers

Calculate the MD5 hash of the contents and save it alongside them. Then you only need to compare the fingerprints:

if($saved['fingerprint'] == $content['fingerprint'])
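
A minimal sketch of the fingerprint comparison, assuming the hash is stored next to the content (for example in a `fingerprint` column, as suggested in the comments). The helper names `fingerprint()` and `is_unchanged()` are hypothetical and only used for illustration; `md5()` and `hash_equals()` are standard PHP functions. The strings below stand in for the stored page and two later fetches.

```php
<?php
// Hypothetical helper: returns the 32-character hex MD5 digest of a page body.
function fingerprint(string $content): string
{
    return md5($content);
}

// Hypothetical helper: true when the freshly fetched body hashes
// to the fingerprint that was stored earlier.
function is_unchanged(string $savedFingerprint, string $freshContent): bool
{
    return hash_equals($savedFingerprint, fingerprint($freshContent));
}

// Stand-ins for the stored fingerprint and two later fetches.
$stored   = fingerprint("<html><body>Hello</body></html>");
$same     = "<html><body>Hello</body></html>";
$modified = "<html><body>Hello!</body></html>";

var_dump(is_unchanged($stored, $same));     // bool(true)
var_dump(is_unchanged($stored, $modified)); // bool(false)
```

Note that hashing only shortens the comparison. If the two bodies really differ by even one byte, the fingerprints will differ as well.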

11 Comments

Ah, this is why I come to Stack Overflow! I'll try this. It saddens me that it didn't occur to me to hash the content, but thank you for the suggestion! Also, would you mind elaborating on what the fingerprint is? Could I not just compare the two MD5 hashes?
The result of md5() is sometimes called a fingerprint or checksum. @xeon: He is right; make sure you hash the content without the headers, since they will change and cause undesired results.
Could you clarify the syntax of ['fingerprint']? Does this automatically create the MD5 fingerprint, or do I still have to hash the content myself?
$fingerprint = md5($content); "INSERT INTO saved (url, content, fingerprint) VALUES ('$url', '$content', '$fingerprint')"; Just make sure that the content is stripped of the headers. I think you can use $content['content'], if I'm not mistaken.
It will produce a different hash even for a small change. There is no threshold.
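
To illustrate the point about there being no threshold, here is a small sketch in which two bodies differ only by a trailing newline, which is easy to miss by eye. The helper `first_difference()` is hypothetical and not part of PHP. It is one possible way to locate the first differing byte when two strings look identical.

```php
<?php
// Two bodies that differ only by one trailing newline.
$a = "<p>Hello</p>";
$b = "<p>Hello</p>\n";

var_dump(md5($a) === md5($b)); // bool(false)

// Hypothetical helper: byte offset of the first difference,
// or null when the two strings are identical.
function first_difference(string $x, string $y): ?int
{
    $limit = min(strlen($x), strlen($y));
    for ($i = 0; $i < $limit; $i++) {
        if ($x[$i] !== $y[$i]) {
            return $i;
        }
    }
    // No difference in the shared prefix: identical, or one string is longer.
    return strlen($x) === strlen($y) ? null : $limit;
}

var_dump(first_difference($a, $b)); // int(12)
```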

If you just need to compare content, my suggestion would be to create a checksum of the data (using MD5 or similar) and compare the data based on the checksum.

2 Comments

How would I use/what is a checksum?
$hash = sha1($content); I would also trim the content and make sure you are NOT fetching the headers as well, since they change between requests.
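
A rough sketch of that advice, assuming the page is fetched with cURL as in the question. The function names `fetch_body()` and `content_hash()` are hypothetical. `CURLOPT_HEADER` is off by default, so setting it explicitly only guards against it having been enabled elsewhere.

```php
<?php
// Hypothetical helper: fetch only the response body, without headers.
function fetch_body(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, false);        // keep response headers out of the result
    $body = curl_exec($ch);
    curl_close($ch);
    // Assumption for this sketch: treat a failed request as an empty body.
    return $body === false ? '' : $body;
}

// Hypothetical helper: hash the body after trimming surrounding whitespace.
function content_hash(string $content): string
{
    return sha1(trim($content));
}

// Leading and trailing whitespace no longer affects the comparison.
var_dump(content_hash("  <p>Hi</p>\n") === content_hash("<p>Hi</p>")); // bool(true)
```

Trimming only removes whitespace at the two ends of the string. Differences inside the body still change the hash.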
