
In a very tight loop I need to access tens of thousands of values in an array containing millions of elements. The key can be undefined: in that case it shall be legal to return NULL without any error message:

  • Array key exists: return value of element.
  • Array key does not exist: return null.

I do know multiple solutions:

if (isset($lookup_table[$key])) {
    return $lookup_table[$key];
} else {
    return;
}

or

return @$lookup_table[$key];

or

error_reporting(0);
$return = $lookup_table[$key];
error_reporting(E_ALL);
return $return;

All solutions are far from optimal:

  • The first one requires two lookups in the B-tree: one to check existence, another to retrieve the value. That effectively doubles the runtime.
  • The second one uses the error suppression operator, and thus incurs massive overhead on that line.
  • The third one still calls the error handler (which checks the error_reporting setting and then displays nothing) and thereby incurs overhead.

Did I miss a way to avoid error handling and yet work with a single B-tree lookup?

To answer some questions:

The array caches the results of a complex calculation - too complex to be done in real time. Out of billions of possible values, only millions yield a valid result. The array looks like 1234567 => 23457, 1234999 => 74361, .... It is saved to a PHP file of several megabytes and include_once-d at the beginning of execution. Initial load time does not matter.

If the key is not found, it simply means that this specific value will not return a valid result. The trouble is to do this 50k+ times per second.
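For illustration, here is a minimal sketch of such a cached table with a single-lookup accessor (values and function name are made up; on PHP 7+ the null coalescing operator returns null for a missing key without raising a warning):

```php
<?php
// Hypothetical miniature of the cached lookup table described above.
$lookup_table = [
    1234567 => 23457,
    1234999 => 74361,
];

// Single-lookup accessor: ?? yields null for missing keys, no warning.
function lookup(array $table, int $key): ?int
{
    return $table[$key] ?? null;
}

var_dump(lookup($lookup_table, 1234567)); // int(23457)
var_dump(lookup($lookup_table, 42));      // NULL
```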

  • isset($lookup_table[$key]) will return false if $lookup_table[$key] is null ... isn't this the opposite of what you want? Not sure what its performance is like, but array_key_exists will return true if the key exists but its value is null. Commented May 21, 2013 at 17:16
  • @sgroves: The example did reflect my intention. I updated the question to clarify: if the key exists, return its value, otherwise return null. (Values of existing keys will never be null.) array_key_exists is, frankly, slower than the isset version, so I skipped it in my examples. Commented May 21, 2013 at 17:22
  • Quickhash might be good for your case. Commented May 21, 2013 at 17:24
  • Mike, Mathew, I updated my question to answer yours. Sean, that's not a matter of belief. Of course native C is faster than PHP. I am simply unwilling to spend a month writing code I can write overnight in PHP. :) Commented May 21, 2013 at 17:53
  • @sean: It's 8pm in Germany :) (In other words - the only constant when asking for help building a house is that you can't use advice on how to build a boat) Commented May 21, 2013 at 18:01

8 Answers


First of all, PHP arrays are not implemented as B-trees; they are hash tables: an array of buckets (indexed via a hash function), each with a linked list of actual values (in case of hash collisions). This means that lookup times depend on how well the hash function has "spread" the values across the buckets, i.e. the number of hash collisions is an important factor.

PHP < 7

Your original statement would be the most performant code, simply because isset() is optimised as a VM opcode.

PHP >= 7

This statement is the most correct:

return array_key_exists($key, $table) ? $table[$key] : null;

Since this change was merged, it's no longer a function call but a VM opcode, and it is as performant as isset() (if not more so); but because you're accessing the array twice when the key exists, the performance is comparable to your original idea.
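The distinction matters when values can be null; a quick illustration of how isset() and array_key_exists() diverge:

```php
<?php
// isset() reports false for a key holding null; array_key_exists()
// only asks whether the key is present at all.
$table = ['a' => 1, 'b' => null];

var_dump(isset($table['b']));            // bool(false)
var_dump(array_key_exists('b', $table)); // bool(true)
```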

Shorthand alternative

You can also accomplish what you want with less code using the null coalescing operator (keeping in mind that it will skip existing array elements with null values, the same way isset() would):

return $table[$key] ?? null;

This should be the most performant between the three options (the third being your own suggestion).


15 Comments

I was skeptical about this approach, so I generated a random array of about 5000 keys in the range of 0-100,000, and ran through the array from 0-100,000. I tried your method with a reference, a method without a reference but still an auxiliary variable, and a simple approach with neither. Your reference approach took more than twice as long as the other ones. The naive approach was marginally faster than the auxiliary variable approach.
@kba The answer mentions that it modifies the array (if the item doesn't exist), so obviously that takes time. It also mentions that without it, the first solution given by OP is likely the fastest.
In what way does your approach modify the array? I don't see you making any changes to $tmp or $lookup_table[$key].
@ZsoltSzilagy When you create a reference on a non-existent variable, PHP creates the necessary structures to make it a valid reference; in this case, it creates an array entry with value null. It works well if the item will always be there of course :)
Still, however, my tests imply that this doesn't improve the performance after all. Even after having run through every item in the array once, the lookup time is still worse. My tests could be bad.
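A small sketch of the side effect discussed in these comments: taking a reference to a missing key silently creates that key.

```php
<?php
// Taking a reference to a non-existent element auto-vivifies it:
// PHP inserts the key with a null value so the reference is valid.
$arr = ['present' => 1];
$tmp = &$arr['missing'];

var_dump(array_key_exists('missing', $arr)); // bool(true)
var_dump($arr['missing']);                   // NULL
```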

I did some benchmarking with the following code:

set_time_limit(100);

$count = 2500000;
$search_index_end = $count * 1.5;
$search_index_start = $count * .5;

$array = array();
for ($i = 0; $i < $count; $i++)
    $array[md5($i)] = $i;

$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
    $key = md5($i);
    $test = isset($array[$key]) ? $array[$key] : null;
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";

$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
    $key = md5($i);
    $test = array_key_exists($key, $array) ? $array[$key] : null;
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";


$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
    $key = md5($i);
    $test = @$array[$key];
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";

$error_reporting = error_reporting();
error_reporting(0);
$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
    $key = md5($i);
    $test = $array[$key];
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";
error_reporting($error_reporting);

$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
    $key = md5($i);
    $tmp = &$array[$key];
    $test = isset($tmp) ? $tmp : null;
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";

and I found that the fastest running test was the one that uses isset($array[$key]) ? $array[$key] : null, followed closely by the solution that just disables error reporting.

4 Comments

Even with error reporting off, the errors are still generated.
Correct @Jack. Although counterintuitive, it is called error_reporting off, not error_generation.
I wonder if a try/catch and discarded exception is faster than turning off error reporting.
Warnings aren't thrown, so a try/catch won't do anything of use in this case.

This works for me:

{{ isset($array['key']) ? $array['key']: 'Default' }} 

but this is faster:

{{ $array['key'] or 'Default' }}

2 Comments

The second one generates an error
It's about speed, not about merely "working". Plus, it seems you use a template engine, which is not related to the question.

The @ operator and the error_reporting methods will both be slower than using isset. Both of these methods modify the error reporting setting for PHP, but PHP's error handler will still be called. The error handler checks against the error_reporting setting and exits without reporting anything, but this still takes time.
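A rough micro-benchmark sketch of that overhead (timings vary by machine and PHP version; this only illustrates the shape of the comparison):

```php
<?php
// Compare isset() against the @ operator on a missing key.
$table = [1 => 'x'];
$n = 100000;

$start = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $v = isset($table[2]) ? $table[2] : null;
}
$issetTime = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $v = @$table[2]; // the error is still generated, then suppressed
}
$suppressTime = microtime(true) - $start;

printf("isset: %.4fs, @: %.4fs\n", $issetTime, $suppressTime);
```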

Comments


Just a sudden idea that would have to be tested, but did you try using array_intersect_key() to get the existing values and array_merge() to fill in the rest? It would remove the need for a loop to access the data. Something like this:

$searched_keys = array('key1' => null, 'key2' => null); // the list of the keys to find

$existing_values = array_intersect_key($lookup_table, $searched_keys);
$all_values = array_merge($searched_keys, $existing_values);

Please note that I did not try it performance-wise.
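A runnable version of the sketch above (sample data made up), showing how array_merge() fills the missing keys with the null defaults:

```php
<?php
// Batch lookup: intersect picks the keys that exist in the table,
// merge fills in null for the ones that don't.
$lookup_table  = ['key1' => 10, 'key3' => 30];
$searched_keys = ['key1' => null, 'key2' => null];

$existing_values = array_intersect_key($lookup_table, $searched_keys);
$all_values      = array_merge($searched_keys, $existing_values);

var_dump($all_values); // ['key1' => 10, 'key2' => null]
```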

1 Comment

Nice approach. Unfortunately $all_values would be considered a new array, thus it would build a new index.

There are two typical approaches to this.

  1. Define defaults for an undefined key.
  2. Check for undefined key.

Here is how to perform the first with as little code as possible.

$data = array_merge(array($key=>false),$data);
return $data[$key];

Here is how to perform the second.

return isset($data[$key]) ? $data[$key] : false;
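One caveat worth noting for the first approach: with integer keys like the ones in the question, array_merge() renumbers the keys. The + (union) operator preserves them and still supplies the default; a sketch:

```php
<?php
// Union keeps integer keys intact; entries already in $data win,
// so the default only fills in when $key is missing.
$data = [1234567 => 23457];
$key  = 1234999;

$data = $data + [$key => false];

var_dump($data[1234567]); // int(23457)
var_dump($data[1234999]); // bool(false)
```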

4 Comments

Thank you. The second version with the ternary operator is a double lookup like in my first snippet. The first version is exciting, but I have little hope: it temporarily creates a copy of a huge array, and while there is only one btree lookup, there is a complete (temporary) array to be balanced. Runtime rises from 2log(n) to 2n.
Do you have the option to re-organize this data for performance? An octree lookup would be more efficient than a linear search. You could group by the first letter of each key, and then by the second letter and so on. en.wikipedia.org/wiki/Octree
Great approach! PHP internally handles arrays as fully balanced trees. I assume that won't be outperformed by an octree, but I am not that deep into algorithm analysis. Do you have experience with that comparison? Thanks!
The only way PHP could be faster is if the keys are sorted internally, and an optimized lookup is used to find a key. I've never read anything that says PHP uses optimized indexes for array keys.

I prefer using the isset function instead of suppressing the error. I made a function that checks whether the key exists and, if not, returns a default value; in the case of nested arrays you just need to add the other keys in order:

Nested array lookup:

/**
 * Lookup array value.
 *
 * @param array $array
 * @param array $keys
 * @param mixed $defaultValue
 *
 * @return mixed
 */
public static function array_key_lookup($array, $keys, $defaultValue)
{
    $value = $array;
    foreach ($keys as $key) {
        if (isset($value[$key])) {
            $value = $value[$key];
        } else {
            $value = $defaultValue;
            break;
        }
    }

    return $value;
}

Usage example:

$array = [
    'key1' => 'value1',
    'key2' => 'value2',
    'key3' => [
        'key3a' => 'value3a',
        'key3b' => 'value3b'
    ]
];

array_key_lookup($array, ['key3', 'key3a'], 'default')
'value3a'

array_key_lookup($array, ['key2', 'key2a'], 'default')
'default'

array_key_lookup($array, ['key2'], 'default')
'value2'

array_key_lookup($array, ['key5'], 'default')
'default'

Escaping the error:

$value = @$array[$key1][$key2] ?: $defaultValue;
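On PHP 7+, the null coalescing operator covers the nested case too, without the error suppression operator; for example:

```php
<?php
// ?? walks the nested access without warnings on missing keys.
$array = ['key3' => ['key3a' => 'value3a']];

var_dump($array['key3']['key3a'] ?? 'default'); // string(7) "value3a"
var_dump($array['key2']['key2a'] ?? 'default'); // string(7) "default"
```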

1 Comment

Hey, that's a nice approach from the usability perspective, but performance-wise it's actually worse than the alternatives. The function call implicitly creates an array, the foreach makes a copy of it, the first line duplicates a variable, etc. It would be much faster to call $value = isset($array[$key1][$key2]) ? $array[$key1][$key2] : $defaultValue;

First, re-organize the data for performance by saving a new array where the data is sorted by the keys, but the new array contains a regular numeric index.

This part will be time consuming, but only done once.

 // first sort the array by it's keys
 ksort($data);

 // second create a new array with numeric index
 $tmp = array();
 foreach($data as $key=>$value)
 {
    $tmp[] = array('key'=>$key,'value'=>$value);
 }
 // now save and use this data instead
 save_to_file($tmp);

Once that is done it should be quick to find the key using a Binary Search. Later you can use a function like this.

  function findKey($key, $data, $start, $end)
  { 
    if($end < $start) 
    { 
        return null; 
    } 

    $mid = (int)(($end - $start) / 2) + $start; 

    if($data[$mid]['key'] > $key) 
    { 
        return findKey($key, $data, $start, $mid - 1); 
    } 
    else if($data[$mid]['key'] < $key) 
    { 
        return findKey($key, $data, $mid + 1, $end); 
    } 

    return $data[$mid]['value'];
 }

To perform a search for a key you would do this.

 $result = findKey($key, $data, 0, count($data) - 1);
 if($result === null)
 {
      // key not found.
 }

If count($data) is computed every time, you could cache that in the file where you stored the array data.

I suspect this method will be a lot faster than a regular linear search repeated against $data. I can't promise it's faster. Only an octree would be quicker, but the time to build the octree might cancel out the search performance (I've experienced that before). It depends on how much searching of the data you have to do.

Comments
