1

I've got an array like this:

$a = array(
    array(2 => 1, 4 => 2, 9 => 3),
    array(3 => 7, 4 => 5, 7 => 3),
    array(1 => 6, 4 => 5),
    ...
);

So the array contains a huge amount of sub arrays with integer key => integer value. Now I want to find subarrays which share no keys or if they share a key the value of this key must be the same.

Example: $a[1] and $a[2] would match because $a[1][4] == $a[2][4] and no other keys match. But $a[0] and $a[1] would not match because $a[0][4] != $a[1][4].

The number of elements in the subarrays may vary.

Is there an efficient way to do this ? The only way I can think of is check each possible pair in a nested loop resulting in O(n^2).

If someone has an idea for a more meaningful title feel free to edit it.

Maybe code makes it more clear: (naive implementation)

$pairs = array();
for($i = 0; $i < count($a); $i++)
    for($j = $i+1; $j < count($a); $j++)
        if(array_intersect_key($a[$i], $a[$j]) == array_intersect_assoc($a[$i], $a[$j]))
            $pairs[] = array($i, $j);

Alternative:

$matching = array();
for($i = 0; $i < count($a); $i++)
    for($j = $i+1; $j < count($a); $j++)
        if(array_intersect_key($a[$i], $a[$j]) == array_intersect_assoc($a[$i], $a[$j]))
            list($matching[$i][], $matching[$j][]) = array($j, $i);
8
  • What do you expect the output to be here exactly? Commented Sep 4, 2013 at 8:15
  • could be an array with pairs like array(array(1,2)) or an array like array(1=>array(2), 2=>array(1)). I don't really care. Commented Sep 4, 2013 at 8:21
  • Result of key 0 & 1 would be empty (they share same key, but diff value); 0 & 2 => empty (they share same key, but diff value); 1 & 2 => array(4=>5) (they share same key with same value. Is this correct? Commented Sep 4, 2013 at 8:37
  • I can't think on a way that won't include a nested loop (one way or another). Commented Sep 4, 2013 at 8:40
  • @Glavić I'm interessed in the indice of the arrays that match not in an resulting array. Commented Sep 4, 2013 at 8:52

2 Answers 2

1

There might be ways to do it, but it somewhat depends on if you know how many matches are likely (or the general 'matchyness' of your data). If there's more matches than not it might be better to start with assuming everything matches and eliminating.

In any case, I think you can pre-process the data. I'm not sure if this is faster -- it really depends on the distribution of your data, but I'd start by trying something like this and work from there:

$a = array(
    array(2 => 1, 4 => 2, 9 => 3),
    array(3 => 7, 4 => 5, 7 => 3),
    array(1 => 6, 4 => 5),
    array(1 => 6, 4 => 5, 7 => 5),
    array(2 => 1, 4 => 2, 9 => 3)   
);
// 1 and 2 match, 2 and 3 match, 0 and 4 match

$keyData = array();
for ($i = 0; $i < count($a); $i++) {
    foreach($a[$i] as $k => $v) {
        if (!isset($keyData[$k])) { 
            $keyData[$k] = array();
        }
        if (!isset($keyData[$k][$v])) {
            $keyData[$k][$v] = array();   
        }
        $keyData[$k][$v][] = $i;
    }
}

$potentialMatches = array();
foreach ($keyData as $key => $values) {
    // Ignore single key/value pairs
    if (count($values) > 1) {
        foreach ($values as $value => $arrayIndices) {
            for ($i = 0; $i < count($arrayIndices); $i ++) {
                for ($j = $i + 1; $j < count($arrayIndices); $j ++) { 
                    $potentialMatches[] = array($arrayIndices[$i], $arrayIndices[$j]);
                }
            }
        }
    }
}

// You might need to do this ...
/*
foreach ($potentialMatches as &$m) {
    array_unique($m);
}
*/

$pairs = array();
foreach ($potentialMatches as $m) { 
     if(array_intersect_key($a[$m[0]], $a[$m[1]]) 
        == array_intersect_assoc($a[$m[0]], $a[$m[1]])) {
         $pairs[] = $m;
    }
}

print_r($pairs);

Output:

Array
(
    [0] => Array
        (
            [0] => 0
            [1] => 4
        )

    [1] => Array
        (
            [0] => 1
            [1] => 2
        )

    [2] => Array
        (
            [0] => 2
            [1] => 3
        )

)

EDIT

As I said in my comment, that doesn't catch arrays that don't share any keys -- which you consider a match. The code below does this, although I'm not sure if it's faster than the nested solution (and it's going to use a ton of memory)

// New test data to cover the case I missed
$a = array(
    array(2 => 1, 4 => 2, 9 => 3),
    array(3 => 7, 4 => 5, 7 => 3),
    array(1 => 6, 4 => 5),
    array(1 => 6, 4 => 5, 7 => 5),
    array(2 => 1, 4 => 2, 9 => 3), 
    array(8 => 3)
);
// 1 and 2 match, 2 and 3 match, 0 and 4 match, 5 matches all

// First assume everything is a match, build an array of:
// indicies => array of potential matches
$potentialMatches = array_fill(0, count($a), array_keys($a));

// Build data about each key, the indicies that contain that key
// and the indicies for each value of that key
$keyData = array();
for ($i = 0; $i < count($a); $i++) {
    foreach($a[$i] as $k => $v) {
        if (!isset($keyData[$k])) { 
            $keyData[$k] = array();
        }
        if (!isset($keyData[$k][$v])) {
            $keyData[$k][$v] = array();   
        }
        $keyData[$k]['all'][] = $i;
        $keyData[$k][$v][] = $i;
    }
}

// print_r($keyData);

// Now go through the key data and eliminate indicies that 
// can't match
foreach ($keyData as $key => $values) {

    if (count($values) > 2) {  // Ignore single key/value pairs

        // Two indecies do not match if they appear in seperate value lists
        // First get the list of all indicies that have this key
        $all = array_unique($values['all']);
        unset($values['all']);

        // Now go through the value lists
        foreach ($values as $value => $arrayIndices) {

            // The indicies for this value cannot match the other 
            // indices in the system, i.e. this list
            $cantMatch = array_diff($all, $arrayIndices);

            // So remove the indicies that can't match from the potentials list            
            foreach ($arrayIndices as $index) {            
                $potentialMatches[$index] = array_diff($potentialMatches[$index], $cantMatch);    
            }
        }
    }
}

//print_r($potentialMatches);

// You said you didn't mind the output format, so that's probably enough 
// but that array contains (x,x) which is pointless and both (x,y) and (y,x)
// so we can do one final bit of processing to print it out in a nicer way

$pairs = array();
foreach ($potentialMatches as $x => $matches) { 
    foreach ($matches as $y) { 
        if ( ($x < $y) ) { 
            $pairs[] = array($x, $y);
        }
    }
}

print_r($pairs);

Output

Array
(
    [0] => Array
        (
            [0] => 0
            [1] => 4
        )

    [1] => Array
        (
            [0] => 0
            [1] => 5
        )

    [2] => Array
        (
            [0] => 1
            [1] => 2
        )

    [3] => Array
        (
            [0] => 1
            [1] => 5
        )

    [4] => Array
        (
            [0] => 2
            [1] => 3
        )

    [5] => Array
        (
            [0] => 2
            [1] => 5
        )

    [6] => Array
        (
            [0] => 3
            [1] => 5
        )

    [7] => Array
        (
            [0] => 4
            [1] => 5
        )

)
Sign up to request clarification or add additional context in comments.

3 Comments

This looks really promissing I'll test it and report my findings asap.
I've just noticed that this doesn't catch the case were arrays share no keys. My first idea was to pre-populate potentialMatches and then remove pairs that didn't match. I'll have a think about it and if I have time try and post something else. In any case you can have a look at keyData after it's generated, the structure there looks like it should be possible to process it quickly to get the data out.
Fixed it, the second solution in my answer should now cover all the cases. However I'm not sure it'll run faster, depends how many different keys you have in total. If you only have a few matches it may be better to do things in a different way.
0

if you are looking for adjacent matching,

$temp = null;
$last_result = array();
foreach($a as $key => $value){
    if(is_null($temp)){
        $temp = $value;
    } else{
        $result = array_intersect_assoc($temp, $value);
        if(!empty($result))
            array_push($last_result, $result);
        $temp = $value;
    }
}
print_r($last_result);

otherwise just use array_intersect_assoc

for a example you can do like this

$res = array_intersect_assoc($a[0],$a[1],$a[2]);
print_r($res);

1 Comment

if I would want this result I'd simply use call_user_func_array('array_intersect_assoc', $a); See my slow sample code to make it more clear.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.