3

I need to generate a large list of random numbers from 600k to 2000k, but the list can not have duplicates.

My current 'implementation' looks like this:

<?php
    header('Content-type: text/plain');
    $startTime = microtime(true);
    $used = array();
    for ($i=0; $i < 600000; ) { 
        $random = mt_rand();
        //if (!in_array($random, $used)) {
        $used[] = $random;
        $i++;
        //}
    }
    $endTime = microtime(true);
    $runningTime = $endTime - $startTime;
    echo 'Running Time: ' . $runningTime;
    //print_r($used);
?>

If I keep the in_array test commented the processing time is around 1 second, so the mt_rand calls and the used array filling are relatively 'cheap' but when I uncomment the in_array test bad things happens! (I'm just waiting -it's been more then 10 minutes- for the script to terminate...)

So I'm looking for alternatives either on the duplicate detection side or in the generation part (How could i generate random numbers without the risk of getting duplicates)

I'm open to any suggestion.

4 Answers 4

16

For a quick/dirty solution, does using/checking array keys improve your speed at all?

$used = array();
for ($i = 0; $i < 600000; ) { 
    $random = mt_rand();
    if (!isset($used[$random])) {
        $used[$random] = $random;
        $i++;
    }
}
$used = array_values($used);
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks! The difference in running time is very big! Even when running the loop 2000k times. It's lightning fast!
+1. This way is good, as in_array not only runs slow, but also eats memory. after reducing if of ~20 cases and removing from all of them in_array i saved 1.2mb of memory on critical resource that serves to 25 millions of users.
5

in_array requires to search the whole array in the worst case, that means linear costs (O(n)). But using the array key as – well – the key, the costs are constant (O(1)) since the costs for array access is always constant.

Comments

2

You could for example do something like this instead

$random = mt_rand();

$array = range($random, $random + 600000);

$array = shuffle($array);

That would create a array that first is in order, but then it shuffles the array, so the values will be random. No collisions! :D

Comments

1

If you do the looping anyways and if you don't need more than 600000 why would you check them at all, why not just append $i to $random. done. not random enough?

for ($i = 0; $i < 600000; $i++)
{
    $yourArray[] = mt_rand() . $i; 
}

Furthermore there is the array function array_unique, which removes duplicate values from an array.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.