7

I have a multidimensional array in PHP, where the outer array contains several thousand items and each item is itself an array with the keys "key1", "key2" and "count":

 myExistingArray (size=99999 VERY BIG)
      public 0 => 
        array (size=3)
          'key1' => string '15504' 
          'key2' => string '20'
          'count' => string '1'
      public 1 => 
        array (size=3)
          'key1' => string '15508' (length=5)
          'key2' => string '20' (length=2)
          'count' => string '2' (length=1)
      public 2 => 
        array (size=3)
          'key1' => string '15510' (length=5)
          'key2' => string '20' (length=2)
          'count' => string '5' (length=1)
....many more similar items

I want to transform this into a very simple array, where the former values of "key1" and "key2" are concatenated into a new key that points to the corresponding "count" value, like so:

  myNewArray (size=99999 VERY BIG)
      <key1>_<key2> => <count>
      15504_20 => string '1' (length=1)
      15508_20 => string '2' (length=1)
      15510_20 => string '5' (length=1)

Performance is very important for me since the outer array has several thousand items. Is there a fast method in PHP? The only thing I came up with was a simple iteration, but this seems too slow for me:

// works but I am looking for a faster version
$myNewArray = array();
foreach ($myExistingArray as $item) {
  $myNewArray[$item["key1"] . "_" . $item["key2"]] = $item["count"];
}

EDIT / Underlying problem

Some people rightfully added that my current solution is already in O(n) and mentioned that there is no built-in function in PHP to speed this up.

I get "myExistingArray" from a MySQL database query. I basically have job objects and want to group them by their status and their event_id. The query is similar to this:

select count(job.id) as count, job.status as key1, job.event_id as key2
from job
group by job.status, job.event_id

I want to rearrange the keys so that later I can easily access the count of jobs for a certain event with a certain status.
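
For that kind of lookup, a nested array indexed first by status and then by event_id may be enough, which avoids the string concatenation altogether. The following is only a sketch: the sample rows, the (int) cast and the helper name jobCount are assumptions made for illustration, not part of the original code.

```php
<?php
// Hypothetical rows shaped like the GROUP BY result in the question
// (key1 = job.status, key2 = job.event_id); the values are made up.
$rows = array(
    array('key1' => '15504', 'key2' => '20', 'count' => '1'),
    array('key1' => '15508', 'key2' => '20', 'count' => '2'),
    array('key1' => '15510', 'key2' => '21', 'count' => '5'),
);

// Build $counts[status][event_id] => count in a single pass.
$counts = array();
foreach ($rows as $row) {
    // The cast to int is an assumption; drop it to keep the original strings.
    $counts[$row['key1']][$row['key2']] = (int) $row['count'];
}

// Hypothetical helper: returns 0 when the status/event pair has no jobs.
function jobCount(array $counts, $status, $eventId)
{
    return isset($counts[$status][$eventId]) ? $counts[$status][$eventId] : 0;
}

echo jobCount($counts, '15508', '20'), "\n"; // prints 2
echo jobCount($counts, '99999', '20'), "\n"; // prints 0
```

This is still a single O(n) pass over the rows, so it is not expected to be faster than the loop above; it only changes the shape of the result.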

6
  • Try using array_column Commented Aug 17, 2016 at 11:15
  • 1
    What is "too slow"? Your suggestion seems to be O(n) which is as fast as it gets for this kind of thing. Commented Aug 17, 2016 at 11:17
  • Where is the data coming from to begin with? Maybe you can change how the data is stored, or how it's provided, and avoid having to process the data in this way to begin with. Anyway, if that's impossible, the loop you have is as simple as it gets, which very often (including this case) means it's the fastest way to do things. Iterating all data and creating a new array will always be an O(n) operation, simply because each element needs to be processed. Commented Aug 17, 2016 at 11:18
  • Is this a simple database output? If you do the concatenation of the two keys in SQL, this will be faster than doing it in PHP for a large number of rows. Something like SELECT CONCAT(key1, '_', key2) as key, count [...] then unset the current row in the PHP foreach to save memory :) Commented Aug 17, 2016 at 11:26
  • Also, what is the final array used for? If you are not using it as a map, you have some other options that could really speed this up. Commented Aug 17, 2016 at 11:29
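
The array_column and CONCAT suggestions in the comments above can be combined. The sketch below assumes the query already returns the concatenated key under the alias map_key; that alias is made up here (key itself is a reserved word in MySQL), and the rows are hard-coded instead of fetched from a database.

```php
<?php
// Hypothetical result rows, as if the query had selected
// CONCAT(job.status, '_', job.event_id) AS map_key, COUNT(job.id) AS count.
$rows = array(
    array('map_key' => '15504_20', 'count' => '1'),
    array('map_key' => '15508_20', 'count' => '2'),
    array('map_key' => '15510_20', 'count' => '5'),
);

// array_column() (PHP 5.5+) takes the 'count' values and indexes them
// by the 'map_key' column, so no userland loop is needed.
$myNewArray = array_column($rows, 'count', 'map_key');

print_r($myNewArray);
```

Whether this is measurably faster than the foreach depends on the data and the PHP version, so it is worth timing on the real result set.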

4 Answers

2

Ordinarily, you'd be looking for either the array_walk or maybe the array_map function to transform arrays in PHP, but unfortunately neither of them can alter the keys of the array that you want to transform. array_walk will preserve the keys, but won't alter them. So sadly, no, there's no built-in function to do what you're asking.
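
For completeness, a more functional-style version can be pieced together from array_map, array_column and array_combine. This is a sketch only: it assumes every item has all three keys, and because it walks the data more than once and calls a callback per item, it is unlikely to beat a plain foreach.

```php
<?php
// Sample data in the shape described in the question.
$myExistingArray = array(
    array('key1' => '15504', 'key2' => '20', 'count' => '1'),
    array('key1' => '15508', 'key2' => '20', 'count' => '2'),
    array('key1' => '15510', 'key2' => '20', 'count' => '5'),
);

// Step 1: compute the new "<key1>_<key2>" key for every item.
$keys = array_map(function (array $item) {
    return $item['key1'] . '_' . $item['key2'];
}, $myExistingArray);

// Step 2: pair each new key with its 'count' value.
// Both arrays must have the same length, which holds only if
// every item really has a 'count' entry (an assumption here).
$myNewArray = array_combine($keys, array_column($myExistingArray, 'count'));

print_r($myNewArray);
```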

2 Comments

This should be a comment. Either way, even if there were a built-in function, internally it would have to do the exact same thing as the OP is doing anyway. It'd still be an O(n) operation. And if you add a callback function to the mix, it's almost certainly going to be slower than a simple foreach.
A comment probably would have been better. I agree that the speed wouldn't increase with a built-in function, but I assumed that the OP meant to ask for a more functional-programming-oriented approach to this, specifically because of the lack of room for improvement, which is why I also made reference to functions that would normally be used for array transformations.
1

I ran a few tests, with the following results in seconds (almost all the same):

Test 1:  [0.25861501693726]
Test 2:  [0.20804476737976]
Test 3:  [0.21039199829102]
Oldskool:[0.26545000076294]
Test 4:  [0.35072898864746]

Doing a var_dump() on the merged array will slow things down (as expected), but if you keep it in memory, the data is not too bad to work with.

And the PHP used to test:

// Construct the raw data
$raw = array();
$clean = array();
$i = 0;
do {
    $raw[] = array('key1' => mt_rand(10000,99999), 'key2' => mt_rand(10,99), 'count' => $i);
} while(++$i < 100000);

// Test 1
$before = microtime(true);
foreach($raw as $k => $v) {
    $clean[$v['key1'].'_'.$v['key2']] = $v['count'];
}
$after = microtime(true);
echo 'Test 1:['.($after - $before).']<br />';

$clean = array(); // reset the result between tests
$i = 0;

// Test 2
$before = microtime(true);
$max = count($raw);
do {
    $clean[$raw[$i]['key1'].'_'.$raw[$i]['key2']] = $raw[$i]['count'];
} while(++$i < $max);
$after = microtime(true);
echo 'Test 2:['.($after - $before).']<br />';

$clean = array(); // reset the result between tests
$i = 0;

// Test 3
$before = microtime(true);
$max = count($raw);
for ($i = 0; $i < $max; $i++) {
    $clean[$raw[$i]['key1'].'_'.$raw[$i]['key2']] = $raw[$i]['count'];
}
$after = microtime(true);
echo 'Test 3:['.($after - $before).']<br />';

$clean = array(); // reset the result between tests

// Test of Oldskool's suggestion
$before = microtime(true);
foreach (array_keys($raw) as $item) {
    $clean[$raw[$item]['key1'].'_'.$raw[$item]['key2']] = $raw[$item]['count'];
}
$after = microtime(true); 
echo 'Test Oldskool:['.($after - $before).']<br />';

$clean = array(); // reset the result between tests
$i = 0;

// Test 4, just for fun
$before = microtime(true);
$max = count($raw);
do {
    $c = array_pop($raw[$i]);
    $clean[join('_', $raw[$i])] = $c;
} while(++$i < $max);
$after = microtime(true);
echo 'Test 4:['.($after - $before).']<br />';

Edit: Added a test for Oldskool's example.
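
Each variant above is timed once, so the numbers are fairly noisy. A small helper that repeats a variant and keeps the best time can make comparisons more stable. This is a sketch: the helper name bestTime and the run count of 5 are arbitrary choices, not part of the tests above.

```php
<?php
// Hypothetical helper: runs $fn $runs times and returns the
// best (smallest) wall-clock time in seconds.
function bestTime(callable $fn, $runs = 5)
{
    $best = INF;
    for ($r = 0; $r < $runs; $r++) {
        $start = microtime(true);
        $fn();
        $best = min($best, microtime(true) - $start);
    }
    return $best;
}

// Same kind of raw data as in the tests above.
$raw = array();
for ($i = 0; $i < 100000; $i++) {
    $raw[] = array('key1' => mt_rand(10000, 99999), 'key2' => mt_rand(10, 99), 'count' => $i);
}

// Time the plain foreach variant; each run starts from an empty result.
$foreachTime = bestTime(function () use ($raw) {
    $clean = array();
    foreach ($raw as $v) {
        $clean[$v['key1'] . '_' . $v['key2']] = $v['count'];
    }
    return $clean;
});

printf("foreach: %.4f s\n", $foreachTime);
```

The other variants can be wrapped in closures the same way, so that every one of them starts from the same untouched $raw.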

0

You could change your foreach to only iterate over the keys and not the entire sub-arrays, by changing it to:

foreach (array_keys($myExistingArray) as $item) {
    $myNewArray[$myExistingArray[$item]['key1'] . '_' . $myExistingArray[$item]['key2']] = $myExistingArray[$item]['count'];
}

This will gain you a slight speed advantage (see comparison of the times here (array_keys method) and here (your original method)). On very large arrays, the difference will likely become more noticeable.

1 Comment

Not sure if using array_keys will increase overall performance on big arrays. Calling array_keys creates a new array, which means allocating more memory and creating new zvals. It's likely that iterating by reference using foreach ($myExistingArray as &$arr) is faster still... Either way, I think it's micro-optimization for something that is likely to be an XY problem.
0

If speed is the issue, and you are not using the final array as a map, I would create a generator, so that you don't have to precalculate everything.

$myExistingArray = [ ... ];
class MyNewArrayIterator implements IteratorAggregate {
    protected $array;
    public function __construct(array $array) {
        $this->array = $array;
    }
    public function getIterator() {
        foreach ($this->array as $value) {
            yield $value['key1'] . '_' . $value['key2'] => $value['count'];
        }
    }
}

And then you can do:

$myNewArray = new MyNewArrayIterator($myExistingArray);
foreach($myNewArray as $key => $value) {
    echo $key . ": " . $value . PHP_EOL;
}

This may or may not be useful in your use case.
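
If the wrapper class is more than you need, a plain generator function gives the same lazy behaviour. This is a sketch under the same assumptions as above; the function name concatenatedCounts and the sample rows are made up for illustration.

```php
<?php
// Hypothetical function name; lazily yields "<key1>_<key2>" => count pairs.
function concatenatedCounts(array $rows)
{
    foreach ($rows as $row) {
        yield $row['key1'] . '_' . $row['key2'] => $row['count'];
    }
}

// Sample data in the shape described in the question.
$rows = array(
    array('key1' => '15504', 'key2' => '20', 'count' => '1'),
    array('key1' => '15508', 'key2' => '20', 'count' => '2'),
    array('key1' => '15510', 'key2' => '20', 'count' => '5'),
);

// Pairs are produced one at a time, only as the loop asks for them.
foreach (concatenatedCounts($rows) as $key => $count) {
    echo $key, ': ', $count, PHP_EOL;
}

// If a real array is needed after all, iterator_to_array() builds it,
// at which point the full O(n) cost is paid again.
$asArray = iterator_to_array(concatenatedCounts($rows));
```

Note that a generator cannot be used for random access by key, which is why this only helps when the result is consumed sequentially.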
