2

This a new version of this post in order to isolate the programming question from the probability question.

I want to store a number of e.g. 25 randomly generated numbers between 1 and 365 in an array. But i need to keep track of duplicates. Here's how i was thinking of doing it:

  • create 4 arrays: a master array, an array for 2 duplicates, an array for 3 duplicates and one for more than 3 duplicates

  • add each generated number one by one into the master array. But before doing so, loop through the array to see if it's in there already. If so, add it to the second array, but before doing so repeat the above process and so on

at the end of the process i could count the non-null values in each array to know how many unique numbers i have, how many came up twice etc

it doesn't seem to be a very efficient algorithm. Any suggestions to improve it?

Can my suggested approach be considered to be Big O(n) i.e. linear?

5
  • Can you elaborate on "keep track of duplicates"? What information do you want to extract from this data? Commented Feb 15, 2011 at 9:51
  • Eg. are you estimating the probability that two random people share a birthday? Commented Feb 15, 2011 at 9:53
  • @Blorgbeard, i have reworded the post. i will generate 25 random numbers bewtween 1 and 365. I need to keep track of any number being generated twice or three times ore more. Of course i won't always get duplicates, and even less likely triplicates but i might...so i want to capture / count such occurrences Commented Feb 15, 2011 at 10:10
  • Do you need to preserve the order the numbers had been generated in? Commented Feb 15, 2011 at 10:34
  • Also I believe your algorithm is technically O(1), but very inefficient. Since there are at most 365/6 distinct birthdays, each of your arrays will have a maximum size of 365 elements. Traversing an array of constant size is O(1) because it doesn't depend on the number of birthdays generated. That doesn't mean it's fast though. If you know you'll always generate < 365 random birthdays and want to analyse the algorithm in those terms, I think the complexity is O(n^2) - each new birthday is O(n), and you need to make 'n' of them. Commented Feb 15, 2011 at 11:54

4 Answers 4

2

Why are you using arrays? A HashMap or other map structure would seem to make more sense. Here's how I would do it.

  1. Instantiate a new, empty hashmap from birthdays to integers
  2. Generate a random birthday.
  3. Check if the birthday is in the hashmap. If it isn't, add it with a value of '1'. It it is, increment the value at that birthday.

Now you can get the number of unique dates generated by the number of keys in the hashmap, and any information about the number of duplicates from the values.

Sign up to request clarification or add additional context in comments.

5 Comments

thanks for the suggestion. I will need to learn about Hash Tables. found a good starting point on SO: stackoverflow.com/questions/2489169/…
@All, i will accept the answer that suggests the most computationally efficient approach. Will need to take some time to evaluate answers. Thanks for the participation so far!
I'd use a HashMap<Integer,Integer> myself, but you can do it with a Hashtable if you prefer. And I wouldn't pre-initialise anything, just check every time a new number is generated if it's in the map already. If it is, increment its count, if there isn't, insert it with count 1.
@biziclop - edited to suggest hashmap instead of hashtable; my Java's a bit rusty. And by initialize I just meant to instantiate the object, which I thought was clear from the "empty".
@Baba - it seems that HashTable is deprecated, you should study HashMap instead (the accepted answer from your link covers this). Basically, you need to know that it's a map from keys (birthdays here) to values (here, the number of occurrences). You can look up the value by a given key, enumerate over all keys/values, and do a few other useful operations.
1

Here is how I think the problem can be solved.

  1. Have two arrays: a master one containing the birthdays and one containing the number of times it has repeated.
  2. Generate a random birthday.
  3. Loop through the main array and see if it is already there.
  4. If it is not there, add it to the main array and in the same index position, store 1 in the other array. ( If you add the birthday to master[3], store 1 in number[3])
  5. If it is already there, add one to the corresponding index in the other array.
  6. Now your master array will contain the generated birthdays and the corresponding indexes in the other array will have the number of times it is repeated. (master[0] has a date and number[0] has the number of times the date is repeated).

Hope this helps.

6 Comments

I like it - simple and efficient. Conducive to clear code which is good for homework (which I believe this question is..)
@Harish, yes that sounds much better than my in initial idea. thanks for that! any issues with the fact that some array positions may remain unpopulated? @Blogbeard, you're right, i have updated the post tags accordingly
@Baba, first populate both your arrays with zero and then start your process. It will be easier that way.
@SO people,if you have solved it,can you please post the working example so that it can be handy for further reference
@Deepak, there is no ONE solution to this question. all suggestions posted here work. some are better than others in terms of efficiency.
|
0

in ruby:

birthdays = {}.tap{|h| h.default = 0}
25.times { birthdays[rand(364)+1] += 1 }

puts birthdays.keys # all the uniq birthdays
puts birthdays.inspect # all birthdays with counts

idea is to use hash or fixed size array where keys correspond to birthdays and values are how many times that birthday was selected

problem is that you will lose order of birthdays as they are just keys in hash/array but if original order is not important you can always get random sequence of birthdays by doing

birthdays.keys.shuffle

Comments

0

Depending on how many times you will need to generate the random list you might consider creating a list of all the numbers and then using a shuffle sort to permute them, you can then pull consecutive blocks off the array which will be both in random order and unique. So in the worst case the performance would be O(size all numbers)+O(size of numbers needed), and best case is O(size of numbers needed). An example:

private int[] numbers;
private int numberIndex=Integer.MAX_VALUE;
private void initNumbers(){
  int[] numbers = new int[365];
  for(int i=0;i<365;i++){
    numbers[i] = i+1;
  }
  shuffleSort(numbers);
  numberIndex = 0;
}

public int[] getRandom(int howMany){
  if(numberIndex > numbers.length-howMany){
    initNumbers();
  }
  int[] ret = new int[howMany];
  for(int i=0;i<howMany;i++){
    ret[i] = numbers[numberIndex++];
  }
  return ret;
}

private void shuffleSort(int[] numbers){
  Random r = new Random();
  int tmp=0, swap=0;
  for(int i=numbers.length-1;i>0;i--){
    swap = r.nextInt(i);//random number between 0 and i
    tmp = numbers[i];
    numbers[i] = numbers[swap];
    numbers[swap] = tmp;
  }
}

}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.