I need a unique string derived from an array so that I can tell when the array changes without inspecting its contents. I'm trying to work out whether it is computationally efficient to calculate such a value rather than add code that watches for changes in the array. The array can hold a variety of values, and for future-proofing I don't want to check explicitly whether new values have been added; I'd much rather create a string or hash that changes whenever the array itself changes.

So for example:

$a = Array(
'var1' => 1,
'var2' => 2,
'var3' => 3,
);

If I were to use md5(http_build_query($a)), perhaps with a ksort() first to control for the order of the keys, this might produce a unique string that I can compare against another run of the application to determine whether the array has changed.
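A minimal sketch of that idea, assuming PHP 7+ for the type declarations; array_fingerprint() is a hypothetical helper name, not a built-in. Because ksort() runs first, a pure reordering of keys will not change the fingerprint; drop the ksort() if order should count as a change.

```php
<?php
// Hypothetical helper: normalise key order, then hash the query string.
function array_fingerprint(array $a): string
{
    ksort($a); // $a is a local copy, so the caller's array is not reordered
    return md5(http_build_query($a));
}

$a = array('var1' => 1, 'var2' => 2, 'var3' => 3);
$stored = array_fingerprint($a); // e.g. persisted from a previous run

$a['var2'] = 20;
if (array_fingerprint($a) !== $stored) {
    echo "array changed\n";
}
```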

I'm looking for alternative, possibly faster or more elegant, solutions to this.

  • I don't think array_diff would deal with checking if the key order has changed. You could json_encode the array and take a hash of that. Might want to check the performance of json_encode and http_build_query. Commented Feb 25, 2011 at 0:33
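To illustrate the comment, here is a small sketch assuming PHP 7.3+ (for JSON_THROW_ON_ERROR); json_fingerprint() is a hypothetical helper. array_diff() only compares values, so two arrays holding the same pairs in a different order look identical to it, while a hash of json_encode() output does not.

```php
<?php
// Hash the JSON encoding of the array. JSON_THROW_ON_ERROR turns encoding
// failures, such as invalid UTF-8, into an exception instead of false.
function json_fingerprint(array $a): string
{
    return md5(json_encode($a, JSON_THROW_ON_ERROR));
}

$x = array('var1' => 1, 'var2' => 2);
$y = array('var2' => 2, 'var1' => 1); // same key/value pairs, different order

var_dump(array_diff($x, $y) === array());                // bool(true): values match
var_dump(json_fingerprint($x) === json_fingerprint($y)); // bool(false): order differs
```

json_encode() also keeps the int/string distinction (1 vs "1"), which may or may not be what you want.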

I use md5(serialize($array)) for this. It's better, because it works for multi-dimensional arrays.
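A short sketch of why this works for nested data; $config and its values are made-up sample data. serialize() encodes every level of the array along with keys, key order and scalar types, so a change anywhere produces a different hash.

```php
<?php
// serialize() walks the whole structure, so md5() of it changes
// when anything at any depth changes.
$config = array(
    'db'    => array('host' => 'localhost', 'port' => 3306),
    'debug' => false,
);
$before = md5(serialize($config));

$config['db']['port'] = 3307; // change two levels down
$after = md5(serialize($config));

var_dump($before === $after); // bool(false)
```

Note that serialize() is type-sensitive: array('a' => 1) and array('a' => '1') hash differently.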

Thanks for all the ideas, guys.

I've tried all of them except SHA-256, which my server doesn't have installed.

Here are the results (average time per operation, in seconds):

Average (http_build_query): 1.3954045954045E-5
Average (diff): 0.00011533766233766
Average (serialize): 1.7588411588412E-5
Average (md5): 1.6036963036966E-5
Average (implode-haval160,4): 1.5349650349649E-5

That's running each operation 1000 times and averaging the result. After refreshing a couple of times, it was clear that http_build_query was the quickest. My next question is whether anyone can think of any pitfalls of using this method.
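One pitfall worth checking: http_build_query() is not a one-to-one encoding of every array. The sketch below (sample arrays are made up) shows pairs of different arrays that produce the same query string, and therefore the same md5(), while serialize() still tells them apart.

```php
<?php
// Each pair holds two different arrays that http_build_query() maps
// to the same string, so a hash of that string cannot tell them apart.
$pairs = array(
    array(array('a' => 1, 'b' => null), array('a' => 1)),   // null values are skipped
    array(array('a' => true),           array('a' => 1)),   // true is encoded as 1
    array(array('a' => 1),              array('a' => '1')), // int vs string
);

foreach ($pairs as list($left, $right)) {
    $same = http_build_query($left) === http_build_query($right);
    echo http_build_query($left), ' vs ', http_build_query($right),
        $same ? " -> same string\n" : " -> different\n";
}

// serialize() keeps all three distinctions.
foreach ($pairs as list($left, $right)) {
    var_dump(serialize($left) === serialize($right)); // bool(false) each time
}
```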

Thanks

Here's my code:

class a {

    static $input;

    // Declared static because it is called as a::test() below
    // (calling a non-static method statically is an Error in PHP 8).
    static function test() {
        $s = $e = $d = $g = $h = array();
        self::$input = array();

        // Fill the test array with 31 random integers.
        for ($x = 0; $x <= 30; $x++) {
            self::$input['variable_' . $x] = rand();
        }

        // microtime(true) returns a float in seconds; plain microtime()
        // returns a "msec sec" string that cannot be subtracted reliably.
        for ($x = 0; $x < 1000; $x++) {
            $start = microtime(true);

            $c = http_build_query(self::$input);
            ($c == $c); // stand-in for the comparison done in practice

            $s[] = microtime(true) - $start;
        }

        for ($x = 0; $x < 1000; $x++) {
            $start = microtime(true);

            $c = md5(http_build_query(self::$input));
            ($c == $c);

            $e[] = microtime(true) - $start;
        }

        for ($x = 0; $x < 1000; $x++) {
            $start = microtime(true);

            $c = array_diff(self::$input, self::$input);

            $d[] = microtime(true) - $start;
        }

        for ($x = 0; $x < 1000; $x++) {
            $start = microtime(true);

            $c = serialize(self::$input);
            ($c == $c);

            $g[] = microtime(true) - $start;
        }

        for ($x = 0; $x < 1000; $x++) {
            $start = microtime(true);

            $c = hash("haval160,4", implode(',', self::$input));
            ($c == $c);

            $h[] = microtime(true) - $start;
        }

        echo "<pre>";
        echo "Average (http_build_query): " . array_sum($s) / count($s) . "<br>";
        echo "Average (diff): " . array_sum($d) / count($d) . "<br>";
        echo "Average (serialize): " . array_sum($g) / count($g) . "<br>";
        echo "Average (md5): " . array_sum($e) / count($e) . "<br>";
        echo "Average (implode-haval160,4): " . array_sum($h) / count($h);
    }

}

a::test();

Damn, that's genius: array_diff(self::$input, self::$input);
Well, in practical use for this application it will be a match most of the time, so I wanted to measure the time it takes to identify a match rather than a non-match. If I'm correct, identifying a non-match takes less time, because the check can stop as soon as it finds a difference, whereas a match requires iterating over and checking every key. Comparing two identical variables never finds a difference, so by that logic I'm being more conservative.
PHP has an array_diff() function; I don't know if it's of any use to you.

Otherwise, you could use the incremental hashing that PHP offers (http://www.php.net/manual/en/function.hash-init.php), iterating over each value of the array and adding it to the incremental hash.

You could always just do

$str = implode(",", $a);
$check = hash("sha256", $str);

Theoretically, that should detect changes in array size, data, or ordering.

Of course, you can use whatever hash you wish.
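A quick check of what the implode() approach can and cannot see; implode_hash() is a hypothetical wrapper around the two lines above. implode() joins only the values, so renamed keys and commas inside values go unnoticed, and a nested array becomes the string "Array" (with an "Array to string conversion" warning).

```php
<?php
// Wrapper around the implode() + hash() approach.
function implode_hash(array $a): string
{
    return hash('sha256', implode(',', $a));
}

var_dump(implode_hash(array('x' => 1)) === implode_hash(array('y' => 1))); // bool(true): key renamed
var_dump(implode_hash(array('a,b')) === implode_hash(array('a', 'b')));    // bool(true): comma inside a value
var_dump(implode_hash(array(1, 2)) === implode_hash(array(2, 1)));         // bool(false): order changed
```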

Actually, delphist is probably correct: serialize() would work better if you don't know the dimensions of the array.
