5

Test script

$i = 0;
array_uintersect(['foo', 'bar'], ['baz', 'qux'], function($a, $b) use (&$i) {
    print_r([$a, $b, $i++]);
});

Actual Result

Array
(
    [0] => bar
    [1] => foo
    [2] => 0
)
Array
(
    [0] => qux
    [1] => baz
    [2] => 1
)
Array
(
    [0] => bar
    [1] => qux
    [2] => 2
)
Array
(
    [0] => bar
    [1] => foo
    [2] => 3
)

Expected Result

Array
(
    [0] => foo
    [1] => baz
    [2] => 0
)
Array
(
    [0] => bar
    [1] => qux
    [2] => 1
)

In other words, what I am expecting to be passed to the callback is the current element of the left array, and the current element of the right array.

Furthermore, I would expect the same logic to apply if I were to pass an additional array to array_uintersect - one more argument being passed to the callback ($c, for example).

Can someone explain this behaviour?

8
  • I don't understand your use of $i here. From the docs: "The comparison function must return an integer less than, equal to, or greater than zero if the first argument is considered to be respectively less than, equal to, or greater than the second." Commented Oct 27, 2016 at 15:52
  • @mistermartin I am using it for debugging purposes; it's a way to keep track of how many times the callback is invoked. Commented Oct 27, 2016 at 15:53
  • Why don't you just loop through the first array and use the same index to get the value from the second array? Commented Oct 27, 2016 at 15:54
  • @SanderVisser Sure, but I want to use this function so any developer that picks up on my work in the future immediately knows what's going on. Intersection is self-explanatory. Commented Oct 27, 2016 at 15:55
  • Yes, but array_uintersect intersects the values, not the keys. For keys, see php.net/manual/en/function.array-intersect-key.php Commented Oct 27, 2016 at 16:07

5 Answers

7

What's not mentioned in the array_uintersect docs is that, internally, PHP sorts all the arrays first, left to right. Only after the arrays are sorted does PHP walk them (again, left to right) to find the intersection.

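The third argument (the comparison function) is passed to the internal sort algorithm first, and is then reused while PHP walks the sorted arrays. Thus, the debugging output seen starts with the sorting algorithm figuring out the ordering; the pairs that mix elements from both arrays come from the walk that follows.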

The zend_sort implementation generally uses a bisecting quick sort implementation. For arrays of the size in your example, PHP uses insertion sort. For large arrays, PHP uses a 3 or 5 point pivot so as to improve worst-case complexity.

Since your comparison function never explicitly returns a value, each call evaluates to null, which PHP treats as 0 (equal). And since PHP is using insertion sort, you're seeing O(n*n) behavior as the sort walks all the combinations.
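To make the "sort first, then walk" idea concrete, here is a rough userland sketch. It is not PHP's actual C implementation: the function name sketch_uintersect is hypothetical, and reusing the callback during the walk is an assumption that matches the cross-array pairs visible in the question's output.

```php
<?php
// Hypothetical userland approximation; PHP's real implementation is C code.
function sketch_uintersect(array $left, array $right, callable $cmp): array
{
    $sortedLeft = $left;
    uasort($sortedLeft, $cmp);   // sort a copy, keeping the original keys
    usort($right, $cmp);         // keys of the right array are irrelevant

    $leftKeys = array_keys($sortedLeft);
    $matched = [];
    $i = 0;
    $j = 0;
    while ($i < count($leftKeys) && $j < count($right)) {
        $order = $cmp($sortedLeft[$leftKeys[$i]], $right[$j]);
        if ($order < 0) {
            $i++;                           // left value has no partner
        } elseif ($order > 0) {
            $j++;                           // right value is too small, move on
        } else {
            $matched[$leftKeys[$i]] = true; // equal: remember the left key
            $i++;                           // stay on $j to catch left duplicates
        }
    }

    // Return the surviving elements in their original order.
    return array_intersect_key($left, $matched);
}

// Compare strings by their first character only.
$byFirstChar = function ($a, $b) {
    return strcmp($a[0], $b[0]);
};
$sketch = sketch_uintersect(['foo', 'bar'], ['baz', 'qux'], $byFirstChar);
$native = array_uintersect(['foo', 'bar'], ['baz', 'qux'], $byFirstChar);
var_dump($sketch === $native);
```

The sketch only behaves sensibly when the callback returns a consistent ordering. With a callback that returns nothing, every pair compares as equal and the walk cannot tell the elements apart.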


1 Comment

Great answer, it's the one I was waiting for. Other people were trying to be helpful and score some reputation by providing workarounds, but my question was why it behaves that way. I went through the C code you linked, and it seems to me that, internally, array_uintersect (and array_intersect in general) is analogous to manually iterating over the array with foreach and checking whether each needle element exists in the haystack array with in_array(). The main difference is that the C code sorts the arrays before iterating, which speeds up the lookup.

4

I have no idea why you expect the comparison callback to do anything other than compare values from the arrays. The sole purpose of the callback is to compare whichever pair of items PHP passes to it.

The function returns the result of intersection of the two arrays. In the callback you express your idea of how the values are supposed to be compared. For example, the following code assumes that the intersection should be performed by comparing the first characters of the strings:

$a = array_uintersect(['foo', 'bar'], ['baz', 'qux'], function($a, $b) {
  return strcmp($a[0], $b[0]);
});

print_r($a);

Output

Array
(
    [1] => bar
)

The order in which items are passed to the callback is determined by the PHP internals, and may easily change in the future.

So the comparison function is not supposed to do anything except compare two values. The official documentation does not even hint at using the callback for any other purpose.
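As a small illustration of a callback that does nothing but compare, the sketch below passes the built-in strcasecmp to get a case-insensitive intersection. The sample arrays are made up for this example.

```php
<?php
// Case-insensitive intersection: the callback only compares two values.
$left  = ['Foo', 'bar'];
$right = ['FOO', 'qux'];

// strcasecmp returns a negative, zero, or positive integer, as required.
$result = array_uintersect($left, $right, 'strcasecmp');
print_r($result);
```

Only 'Foo' has a case-insensitive partner in the second array, so it is the only element kept, under its original key 0.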

Comments

2

I believe the first two calls are being used to seed variables in the internal algorithm. But since you don't return anything that the algorithm can use to determine equality/sorting, it only runs the next two.

If you actually return 0, 1 or -1 then you see the full comparison chain that is needed to calculate the intersection:

$i = 0;
array_uintersect(['foo', 'bar'], ['baz', 'qux'], function($a, $b) use (&$i) {
    print_r([$a, $b, $i++]);

    if ($a === $b) return 0;
    if ($a  >  $b) return 1;
    return -1;
});

Yields:

Array
(
    [0] => bar
    [1] => foo
    [2] => 0
)
Array
(
    [0] => qux
    [1] => baz
    [2] => 1
)
Array
(
    [0] => bar
    [1] => baz
    [2] => 2
)
Array
(
    [0] => foo
    [1] => baz
    [2] => 3
)
Array
(
    [0] => foo
    [1] => baz
    [2] => 4
)
Array
(
    [0] => foo
    [1] => qux
    [2] => 5
)
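A more compact way to write the same kind of callback, assuming PHP 7 or later, is the spaceship operator. This sketch counts invocations instead of printing each pair; the exact count depends on PHP's internal algorithm, so it is not stated here.

```php
<?php
$calls = 0;
$result = array_uintersect(['foo', 'bar'], ['baz', 'qux'], function ($a, $b) use (&$calls) {
    $calls++;          // count invocations instead of printing each pair
    return $a <=> $b;  // negative, zero, or positive, as the docs require
});

var_dump($result);     // no value is shared, so the intersection is empty
echo $calls, " comparisons\n";
```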

Comments

0

I think you are looking for this ;)

$result = array_map(function($a, $b) {
    return [$a, $b];
}, ['foo', 'bar'], ['baz', 'qux']);
var_dump($result);

This will output

array(2) {
  [0]=>
  array(2) {
    [0]=>
    string(3) "foo"
    [1]=>
    string(3) "baz"
  }
  [1]=>
  array(2) {
    [0]=>
    string(3) "bar"
    [1]=>
    string(3) "qux"
  }
}

Update: The code below returns the result you want using array_uintersect. It isn't the most efficient way to do this, and I haven't tested it with different data sets, but it should work.

$entities = [
    [
        'id' => 1,
        'timestamp' => 1234
    ],
    [
        'id' => 2,
        'timestamp' => 12345
    ],
    [
        'id' => 3,
        'timestamp' => 123456
    ],
    [
        'id' => 8,
        'timestamp' => 123456
    ],
    [
        'id' => 10,
        'timestamp' => 123456
    ],
    [
        'id' => 11,
        'timestamp' => 123456
    ],
    [
        'id' => 12,
        'timestamp' => 123456
    ]
];

$identities = [1, 11, 2, 8, 10];

$result = array_uintersect($entities, $identities, function($a, $b) {

    // Both operands are entity arrays: order them by id
    if (is_array($a) && is_array($b)) {
        if ($a['id'] > $b['id']) {
            return 1;
        }
        return -1;
    }

    // Both operands are plain ids: order them numerically
    if (is_int($a) && is_int($b)) {
        if ($a > $b) {
            return 1;
        }
        return -1;
    }

    // $a is array
    if (is_array($a)) {
        if ($a['id'] == $b) {
            return 0;
        }
        elseif ($a['id'] > $b) {
            return 1;
        }
        return -1;
    }

    // $b is array
    if($b['id'] == $a) {
        return 0;
    }
    if($a > $b['id']) {
        return 1;
    }

    return -1;
});
var_dump($result);

and the result

array(5) {
  [0]=>
  array(2) {
    ["id"]=>
    int(1)
    ["timestamp"]=>
    int(1234)
  }
  [1]=>
  array(2) {
    ["id"]=>
    int(2)
    ["timestamp"]=>
    int(12345)
  }
  [3]=>
  array(2) {
    ["id"]=>
    int(8)
    ["timestamp"]=>
    int(123456)
  }
  [4]=>
  array(2) {
    ["id"]=>
    int(10)
    ["timestamp"]=>
    int(123456)
  }
  [5]=>
  array(2) {
    ["id"]=>
    int(11)
    ["timestamp"]=>
    int(123456)
  }
}
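The four branches above can probably be collapsed by reducing both operands to an id before comparing. The following is a sketch under that assumption, with a shortened data set and a hypothetical $toId helper; it also returns 0 for equal ids within the same array, which the original callback never does.

```php
<?php
$entities = [
    ['id' => 1, 'timestamp' => 1234],
    ['id' => 2, 'timestamp' => 12345],
    ['id' => 3, 'timestamp' => 123456],
    ['id' => 8, 'timestamp' => 123456],
];
$identities = [8, 1];

// Hypothetical helper: reduce either kind of operand to a plain id.
$toId = function ($value) {
    return is_array($value) ? $value['id'] : $value;
};

// Compare by id regardless of whether an operand is an entity or an id.
$result = array_uintersect($entities, $identities, function ($a, $b) use ($toId) {
    return $toId($a) <=> $toId($b);
});
print_r(array_keys($result));
```

The entities with ids 1 and 8 survive and keep their original keys (0 and 3).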

9 Comments

That is certainly what I was trying to achieve, but I wanted to use array_uintersect because of the intersect word in the function name (want to make it as clear as possible what I'm doing).
From the docs, array_uintersect "computes the intersection of arrays, compares data by a callback function". So I think array_uintersect is less clear, because going by the documentation you're not comparing "data". array_intersect_ukey does what you want but doesn't return the result that you want: it doesn't combine the values.
I do want to compare values, not keys. Specifically, I have a multi-dimensional array with two subelements (id and timestamp) on every element, and a single-dimensional array (only ids), and my goal is to discard all the entries from the multi-dimensional array whose id does not appear in the single-dimensional array.
@NinoŠkopac So do you want to compare values, or just rearrange the arrays in the manner of this answer? Why are you trying to perform an intersection when the intersection of your arrays is empty? You're printing hidden details of the algorithm's implementation, not the function's result.
OK, I think I understand now: you have an array of objects that have an id and some other properties, and an array of ids, and you want to filter out all objects whose id is not in the array of ids.
-4
<?php
    $i  = 0;
    $r1 = ['foo', 'bar'];
    $r2 = ['baz', 'qux'];
    $result = array_uintersect($r1, $r2, function($a, $b){
        return ($a[0]> $b[0]);
    });


    var_dump($result);
    // Yields:
    // array (size=2)
    //   0 => string 'foo' (length=3)
    //   1 => string 'bar' (length=3)

1 Comment

Could use some explaining.
