Please explain how this works: why does passing a value into an array from a variable, instead of from a literal, increase memory consumption roughly tenfold?

PHP 7.1.17

First example:

<?php
ini_set('memory_limit', '1G');
$array = [];
$row = 0;
while ($row < 2000000) {
    $array[] = [1];

    if ($row % 100000 === 0) {
        echo (memory_get_usage(true) / 1000000) . PHP_EOL;
    }
    $row++;
}

Total memory usage ~70MB

Second example:

<?php
ini_set('memory_limit', '1G');
$array = [];
$a = 1;

$row = 0;
while ($row < 2000000) {
    $array[] = [$a];

    if ($row % 100000 === 0) {
        echo (memory_get_usage(true) / 1000000) . PHP_EOL;
    }
    $row++;
}

Total memory usage ~785MB

Also, there is no difference in memory consumption if the resulting array is one-dimensional.
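A minimal sketch of the one-dimensional case, assuming PHP 7 or later. It fills a flat array once from the literal `1` and once from `$a`, then compares the bytes still allocated. The helper name `measureFlat` and the element count are hypothetical choices for illustration, and absolute numbers will vary by PHP version and OPcache settings.

```php
<?php
// Sketch: memory for a flat array filled from a literal vs. from a variable.
// measureFlat() is a hypothetical helper; 100000 elements is an arbitrary scale.
function measureFlat(bool $useVariable, int $count = 100000): int
{
    $a = 1;
    $before = memory_get_usage();
    $array = [];
    for ($i = 0; $i < $count; $i++) {
        // An integer is stored directly in the array slot either way,
        // so where the value comes from should not change the cost.
        $array[] = $useVariable ? $a : 1;
    }
    $delta = memory_get_usage() - $before;
    unset($array);
    return $delta;
}

$fromLiteral  = measureFlat(false);
$fromVariable = measureFlat(true);

echo "literal:  $fromLiteral bytes" . PHP_EOL;
echo "variable: $fromVariable bytes" . PHP_EOL;
```

Both runs should report essentially the same figure, which matches the observation above.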

  • 3v4l.org/TBmOj Commented Jun 21, 2018 at 12:39
  • @Bogdan your link shows Parse error. Is it the expected output? Commented Jun 21, 2018 at 12:41
  • This seems to be a bug with PHP >= 7.1 but you can mitigate it by using $array[] = [&$a]; instead of just $array[] = [$a];. This will force a reference to $a instead of creating a copy of its value. I would imagine that using a native 1 causes the compiler to reference it as a primitive type rather than creating a copy of it. Commented Jun 21, 2018 at 13:10
  • php.net/manual/en/internals2.variables.intro.php When you create a new variable, a whole new struct is created. Methinks that instead of increasing the number of references to $a, the engine creates a new copy of the contents of $a and uses it as keys. Commented Jun 21, 2018 at 13:12
  • @MonkeyZeus That is not a bug, that's by design. PHP will store primitive types directly in the array unless explicitly told to do otherwise with the & operator. Objects are stored by reference (unless you explicitly state otherwise by using clone()). The reason is that a primitive can be a literal or a variable or a function return value and you can't reference a literal or a return value so copies are always stored. Objects on the other hand cost a lot more memory than a reference to an object does. Calling that a bug is disingenuous. Commented Jun 22, 2018 at 11:08

1 Answer

The key thing here is that [1], although it is a complex value, is a constant - the compiler can trivially know that it is the same every time it is used.

Since PHP uses a "copy on write" system when multiple variables have the same value, the compiler can actually construct the "zval" structure for the array before the code is run, and just increment its reference counter each time a new variable or array value points to it. (If any of them are modified later, they will be "separated" into a new zval before modification, so at that point an extra copy will be made anyway.)
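To see copy-on-write on its own, here is a minimal sketch, assuming PHP 7 or later: assigning a large array to a second variable should cost almost nothing, and the real copy ("separation") should only appear when one holder is written to. The 100000-element size is an arbitrary choice, and the exact byte counts depend on the PHP version.

```php
<?php
// Sketch: copy-on-write. Assigning an array shares the same zval;
// the copy is made only when one of the holders is modified.
$x = range(1, 100000);

$before = memory_get_usage();
$y = $x;                  // shares the zval: only the reference count goes up
$afterAssign = memory_get_usage();

$y[] = 100001;            // first write: $y is separated into its own copy
$afterWrite = memory_get_usage();

echo "cost of assignment:  " . ($afterAssign - $before) . " bytes" . PHP_EOL;
echo "cost of first write: " . ($afterWrite - $afterAssign) . " bytes" . PHP_EOL;
echo "length of \$x is still " . count($x) . PHP_EOL;
```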

So (using 42 so that it stands out more), this:

$bar = [];
$bar[] = [42];

Compiles to this (VLD output generated with https://3v4l.org):

compiled vars:  !0 = $bar
line     #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
   3     0  E >   ASSIGN                                                   !0, <array>
   4     1        ASSIGN_DIM                                               !0
         2        OP_DATA                                                  <array>
         3      > RETURN                                                   1

Note that the 42 doesn't even show up in the VLD output, it's implicit in the second <array>. So the only memory usage is for the outer array to store a long list of pointers, which all happen to point to the same zval.

When using a variable like [$a], on the other hand, there is no guarantee that the values will all be the same. It's possible to analyse the code and deduce that they will be, so OpCache might apply some optimisations, but on its own:

$a = 42;
$foo = [];
$foo[] = [$a];

Compiles to:

compiled vars:  !0 = $a, !1 = $foo
line     #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
   3     0  E >   ASSIGN                                                   !0, 42
   4     1        ASSIGN                                                   !1, <array>
   5     2        INIT_ARRAY                                       ~5      !0
         3        ASSIGN_DIM                                               !1
         4        OP_DATA                                                  ~5
         5      > RETURN                                                   1

Note the extra INIT_ARRAY opcode - that's a new zval being created with the value of [$a]. This is where all your extra memory goes - every iteration will create a new array that happens to have the same contents.
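The difference can be checked directly with a scaled-down version of the question's two loops. This is a sketch, assuming PHP 7 or later; the function names and the 50000-row scale are hypothetical. The value is passed into `buildFromVariable` as a parameter on the assumption that this keeps an optimiser such as OPcache from folding `[$value]` back into a constant array.

```php
<?php
// Sketch: the question's two loops at a smaller, arbitrary scale.

// [42] is a constant array built at compile time; every slot can share it.
function buildFromLiteral(int $count): array
{
    $out = [];
    for ($i = 0; $i < $count; $i++) {
        $out[] = [42];
    }
    return $out;
}

// [$value] runs INIT_ARRAY, so every slot gets a freshly allocated array.
function buildFromVariable(int $count, int $value): array
{
    $out = [];
    for ($i = 0; $i < $count; $i++) {
        $out[] = [$value];
    }
    return $out;
}

// Bytes still allocated while the built array is alive.
function measure(callable $build): int
{
    $before = memory_get_usage();
    $data = $build();
    $delta = memory_get_usage() - $before;
    unset($data);
    return $delta;
}

$literal = measure(function () {
    return buildFromLiteral(50000);
});
$variable = measure(function () {
    return buildFromVariable(50000, 42);
});

echo "literal:  $literal bytes" . PHP_EOL;
echo "variable: $variable bytes" . PHP_EOL;
```

The literal version should use roughly the size of the outer array alone, while the variable version adds one small array per row.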


It's relevant to point out here that if $a were itself a complex value - an array or object - it would not be copied on each iteration, as it would have its own reference counter. You'd still be creating a new array each time around the loop, but those arrays would all contain a copy-on-write pointer to $a, not a copy of it. This doesn't happen for integers (in PHP 7) because it's actually cheaper to store the integer directly than to store a pointer to somewhere else that stores the integer.
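A rough way to confirm that the inner array is shared rather than copied is a sketch like the following, assuming PHP 7 or later. The helper name `measureBuild` and the 1000-element sizes are hypothetical. If each `[$a]` wrapper copied `$a`, 1000 wrappers would cost about 1000 times one copy; with sharing, they should cost far less.

```php
<?php
// Sketch: each [$a] wrapper is a new small array, but it holds a
// refcounted pointer to $a rather than a copy of its 1000 elements.
function measureBuild(callable $build): int
{
    $before = memory_get_usage();
    $data = $build();
    $delta = memory_get_usage() - $before;
    unset($data);
    return $delta;
}

// Cost of one 1000-element array.
$oneCopy = measureBuild(function () {
    return range(1, 1000);
});

$a = range(1, 1000);
// Cost of 1000 wrappers around the same $a.
$wrapped = measureBuild(function () use ($a) {
    $foo = [];
    for ($i = 0; $i < 1000; $i++) {
        $foo[] = [$a];   // new outer array, shared inner $a
    }
    return $foo;
});

echo "one copy of \$a:          $oneCopy bytes" . PHP_EOL;
echo "1000 wrappers around \$a: $wrapped bytes" . PHP_EOL;
```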

One more variation worth looking at, because it may be an optimisation you can make by hand:

$a = 42;
$b = [$a];
$foo = [];
$foo[] = $b;

VLD output:

compiled vars:  !0 = $a, !1 = $b, !2 = $foo
line     #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
   3     0  E >   ASSIGN                                                   !0, 42
   4     1        INIT_ARRAY                                       ~4      !0
         2        ASSIGN                                                   !1, ~4
   5     3        ASSIGN                                                   !2, <array>
   6     4        ASSIGN_DIM                                               !2
         5        OP_DATA                                                  !1
   7     6      > RETURN                                                   1

Here, we have an INIT_ARRAY opcode when we create $b, but not when we add it to $foo. The ASSIGN_DIM will see that it's safe to reuse the $b zval each time, and increment its reference counter. I haven't tested, but I believe this will take you back to the same memory usage as the constant [1] case.
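As a sketch of how to test this (an assumption-laden check, not a definitive benchmark), the following compares building a fresh `[$value]` on every iteration with building `$b` once and appending it repeatedly. It assumes PHP 7 or later; the function names and the 50000-row scale are hypothetical.

```php
<?php
// Sketch: a fresh [$value] per iteration vs. one array $b reused.

function buildPerIteration(int $count, int $value): array
{
    $out = [];
    for ($i = 0; $i < $count; $i++) {
        $out[] = [$value];   // INIT_ARRAY on every pass
    }
    return $out;
}

function buildReusingArray(int $count, int $value): array
{
    $b = [$value];           // INIT_ARRAY once
    $out = [];
    for ($i = 0; $i < $count; $i++) {
        $out[] = $b;         // only increments $b's reference count
    }
    return $out;
}

// Bytes still allocated while the built array is alive.
function measure(callable $build): int
{
    $before = memory_get_usage();
    $data = $build();
    $delta = memory_get_usage() - $before;
    unset($data);
    return $delta;
}

$perIteration = measure(function () {
    return buildPerIteration(50000, 42);
});
$reused = measure(function () {
    return buildReusingArray(50000, 42);
});

echo "fresh array per row: $perIteration bytes" . PHP_EOL;
echo "shared \$b:           $reused bytes" . PHP_EOL;
```

Because of copy-on-write, modifying one row later (for example `$rows[0][] = 43;`) separates only that row; the other rows keep pointing at the shared array.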


A final way to verify that copy-on-write is in use here is to use debug_zval_dump, which shows the reference count of a value. The exact numbers are always a bit off, because passing the variable to the function itself creates one or more references, but you can get a good idea from the relative values:

Constant array:

$foo = [];
for($i=0; $i<100; $i++) {
    $foo[] = [42];
}
debug_zval_dump($foo[0]);

Shows a refcount of 102, as the value is shared across 100 copies.

Identical but not constant array:

$a = 42;
$foo = [];
for($i=0; $i<100; $i++) {
    $foo[] = [$a];
}
debug_zval_dump($foo[0]);

Shows a refcount of 2, as each value has its own zval.

Array constructed once and reused explicitly:

$a = 42;
$b = [$a];
$foo = [];
for($i=0; $i<100; $i++) {
    $foo[] = $b;
}
debug_zval_dump($foo[0]);

Shows a refcount of 102, as the value is shared across 100 copies.

Complex value inside (also try $a = new stdClass, etc.):

$a = [1,2,3,4,5];
$foo = [];
for($i=0; $i<100; $i++) {
    $foo[] = [$a];
}
debug_zval_dump($foo[0]);

Shows a refcount of 2, but the inner array has a refcount of 102: there's a separate array for every outer item, but they all contain pointers to the zval created as $a.

5 Comments

Is this accurate? I thought Copy On Write only applied to the array as a whole when it's passed into/out of functions, not values inside the array.
@GordonM Copy-on-write is not specific to functions, or to "ordinary" variables, it's something that happens at the zval level - the value itself tracks how often it's used, and whether it's safe to modify in place or delete. So $a = []; $foo['bar'] = $a; will result in a zval with two pointers to it, one called $a, and one in the 'bar' slot of array $foo. You can get a rough view with debug_zval_dump, although the actual counts are off due to the extra references created when you pass the variable to the function.
@GordonM I've added a section to the answer showing how you can see the copy-on-write in action.
Hi, I know that comments like the one I am about to make are generally unwelcome, but you seem to know A LOT about PHP's internals. Would you care to take a look at stackoverflow.com/q/50951495/2191572? If not, then I totally understand. I was able to answer the "what" but not the "why".
@MonkeyZeus Ah, go on then! ;)