php array_intersect() efficiency

Question

consider the below script. two arrays with only three values.when i compare these two arrays using array_intersect(). the result is fast.

    <?php
$arrayOne = array('3', '4', '5');
$arrayTwo = array('4', '5', '6');

$intersect = array_intersect($arrayOne, $arrayTwo);

print_r($intersect );

?>

my question is what is the efficiency of the array_intersect(). whether if we compare two arrays both having 1000 values each. would produce better result..... r we need to use some hash function to deal with finding common values quickly which will be effective???.. i need ur suggestion for this...

i am doing an application.if an person comes and login using facebook login.then the application will get his friends list and find whether any friends as commented in my app before and show it to him. roughly a friends may have 200 to300 friends in facebook and db has more than 1000 records. i need to find that efficiently how can i do that.......

@learnfromothers : have you tried the same thing on arrays with 1000+ values? — Sujit Agarwal
– Sujit Agarwal, Commented Jun 13, 2011 at 10:33
why not find out yourself? do a benchmark. in general, it doesnt matter if its efficient or not unless you profiled your application and found that the calls to array_intersect are slowing down your application significantly. How much is significant is up to your requirements. — Gordon
– Gordon, Commented Jun 13, 2011 at 10:37
@Coding Freak no i have not tried that . i am doing an application.if an person comes and login using facebook login.then the application will get his friends list and find whether any friends as commented in my app before and show it to him. roughly a friends may have 200 to300 friends in facebook and db has more than 1000 records. i need to find that efficiently how can i do that....... — learnfromothers
– learnfromothers, Commented Jun 13, 2011 at 10:37
@Gordan for this any solution..... i am doing an application.if an person comes and login using facebook login.then the application will get his friends list and find whether any friends as commented in my app before and show it to him. roughly a friends may have 200 to300 friends in facebook and db has more than 1000 records. i need to find that efficiently how can i do that....... — learnfromothers
– learnfromothers, Commented Jun 13, 2011 at 10:38
But how can you be sure that you would come accross people who have only 200 to 300 friends? I have seen profiles on facebook that have more than 5K+ friends. — Sujit Agarwal
– Sujit Agarwal, Commented Jun 13, 2011 at 10:45

phihag · Accepted Answer · 2013-10-13 10:20:39Z

24

Intersection can be implemented by constructing a set of the searched values in the second array, and looking up in a set can be made so fast that it takes essentially constant time on average. Therefore, the runtime of the whole algorithm can be in O(n).

Alternatively, one can sort the second array (in O(n log n)). Since looking up in a sorted array has a runtime in O(log n), the whole algorithm should then have a runtime in O(n log n).

According to a (short, unscientific) test I just ran, this seems to be the case for php's array_intersect:

Performance of array_intersect

Here's the code I used to test it. As you can see, for an input size as small as 1000, you don't need to worry.

edited Oct 13, 2013 at 10:20

answered Jun 13, 2011 at 10:44

phihag

289k75 gold badges475 silver badges489 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

learnfromothers Over a year ago

@phinag ya thats right.... for this any solution..... i am doing an application.if an person comes and login using facebook login.then the application will get his friends list and find whether any friends as commented in my app before and show it to him. roughly a friends may have 200 to300 friends in facebook and db has more than 1000 records. i need to find that efficiently how can i do that.

phihag Over a year ago

@learnfromothers array_intersect will not be the performance problem for less than a thousand elements.

learnfromothers Over a year ago

thanks. for your effect. very much statisfied with your answer.... thanks again.........

vicch Over a year ago

@phihag If you could remember, what was the tool you used to get that graph?

phihag Over a year ago

@vicch I apologize, I don't remember, and I can't find that code to generate the graph (the code to get the data, on the other hand, is linked to in the answer). This was probably gnuplot or matplotlib. But I cannot exclude LibreOffice or even a hand-coded script.

|

Gumbo · Accepted Answer · 2011-06-13 10:56:07Z

18

array_intersect sorts the arrays before comparing their values in parallel (see the use of zend_qsort in the source file array.c). This alone takes O(n·log n) for each array. Then the actual intersection does only take linear time.

Depending on the values in your arrays, you could implement this intersection in linear time without the sorting, for example:

$index = array_flip($arrayOne);
foreach ($arrayTwo as $value) {
    if (isset($index[$value])) unset($index[$value]);
}
foreach ($index as $value => $key) {
    unset($arrayOne[$key]);
}
var_dump($arrayOne);

answered Jun 13, 2011 at 10:56

Gumbo

657k112 gold badges792 silver badges852 bronze badges

Comments

slaszu · Accepted Answer · 2018-11-08 09:21:03Z

5

The fastest solution I found:

function arrayIntersect($arrayOne, $arrayTwo) {
        $index = array_flip($arrayOne);
        $second = array_flip($arrayTwo);

        $x = array_intersect_key($index, $second);

        return array_flip($x);
}

Tests I have made looks like below:

function intersect($arrayOne, $arrayTwo)
{
    $index = array_flip($arrayOne);
    foreach ($arrayTwo as $value) {
        if (isset($index[$value])) unset($index[$value]);
    }
    foreach ($index as $value => $key) {
        unset($arrayOne[$key]);
    }

    return $arrayOne;
}

function intersect2($arrayOne, $arrayTwo)
{
    $index = array_flip($arrayOne);
    $second = array_flip($arrayTwo);

    $x = array_intersect_key($index, $second);

    return array_flip($x);

}

for($i =0; $i < 1000000; $i++) {
    $one[] = rand(0,1000000);
    $two[] = rand(0,100000);
    $two[] = rand(0,10000);
}

$one = array_unique($one);
$two = array_unique($two);

$time_start = microtime(true);
$res = intersect($one, $two);
$time = microtime(true) - $time_start;

echo "Sort time $time seconds 'intersect' \n";


$time_start = microtime(true);
$res2 = array_intersect($one, $two);
$time = microtime(true) - $time_start;

echo "Sort time $time seconds 'array_intersect' \n";


$time_start = microtime(true);
$res3 = intersect2($one, $two);
$time = microtime(true) - $time_start;

echo "Sort time $time seconds 'intersect2' \n";

Results from php 5.6 :

Sort time 0.77021193504333 seconds 'intersect' 
Sort time 6.9765028953552 seconds 'array_intersect' 
Sort time 0.4631941318512 seconds 'intersect2'

edited Nov 8, 2018 at 9:21

answered Nov 8, 2018 at 7:40

slaszu

511 silver badge3 bronze badges

1 Comment

sɐunıɔןɐqɐp Over a year ago

From Review: Hi, please don't answer just with source code. Try to provide a nice description about how your solution works. See: How do I write a good answer?. Thanks

inquam · Accepted Answer · 2011-06-13 10:46:00Z

2

From what you state above, I would recommend you to implement a caching mechanism. That way you would of load the DB and speed up your application. I would also recommend you to profile the speed of array_intersect with increasing amount of data to see how performance scale. You could do this by simply wrapping the call in calls for the system time and calculate the difference. But I would recommend you to use a real profiler to get good data.

answered Jun 13, 2011 at 10:46

inquam

12.9k17 gold badges66 silver badges106 bronze badges

6 Comments

inquam Over a year ago

@learnfromothers: A profiler is an application that helps you meassure the performance of your application. Look at Xdebug for example: xdebug.org/docs/profiler or if you use Zend Studio they have a built in one: files.zend.com/help/Beta/Zend_Studio_8_0/…

learnfromothers Over a year ago

@inquam ya thanks. can i use some hash function to store the values in db so that the searching can be limited a bit. is that a right approach???

inquam Over a year ago

@learnfromothers: The best way to access your data is determined by a multitude of factors. The layout of the data, how frequent read and writes are etc. But with a profiler in place you can quickly see how different approaches and specific changes affect your performance.

learnfromothers Over a year ago

@inquam thanks again.... but do u know any best methods used for efficient storage and retrieval of data in current days and in major website like facebook etc....

inquam Over a year ago

@laernfromothers: I know that Facebook for example relies heavily on caching of data. I think they use a modified version of memcache. This stores data in memory and is very fast, compared to getting the data from DB all the time. Also, I think that they pre-calculate key bits of information so that you don't have to do that each and every time. Say your friends list. You could compile that once and store it in cache memory. Then you could retrieve it from there each time you need it. Only once you add or remove a friend would it have to be rebuilt.

|

Rory McCrossan · Accepted Answer · 2014-10-30 09:18:41Z

I implementing this simple code of comparing array_intersect and array_intersect_key,

$array = array();
    for( $i=0; $i<130000; $i++)
        $array[$i] = $i;
    for( $i=200000; $i<230000; $i++)
        $array[$i] = $i;
    for( $i=300000; $i<340000; $i++)
        $array[$i] = $i;

    $array2 = array();
    for( $i=100000; $i<110000; $i++)
        $array2[$i] = $i;
    for( $i=90000; $i<100000; $i++)
        $array2[$i] = $i;
    for( $i=110000; $i<290000; $i++)
        $array2[$i] = $i;

    echo 'Intersect to arrays -> array1[' . count($array) . '] : array2[' . count($array2) . '] ' . '<br>';
    echo date('Y-m-d H:i:s') . '<br>';
    $time = time();
    $array_r2 = array_intersect_key($array,$array2);
    echo 'Intercept key: ' . (time()-$time) . ' segs<br>';
    $time = time();
    $array_r = array_intersect($array,$array2);
    echo 'Intercept: ' . (time()-$time) . ' segs<br>';

the result....

Intersect to arrays -> array1[200000] : array2[200000] 
2014-10-30 08:52:52
Intercept key: 0 segs
Intercept: 4 segs

In this comparing of the efficency between array_intersect and array_intersect_key, we can see the interception with keys it is much faster.

Collectives™ on Stack Overflow

php array_intersect() efficiency

5 Answers 5

6 Comments

Comments

1 Comment

6 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

6 Comments

Comments

1 Comment

6 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related