0

I need a script to search efficiently all the duplicates in a one-dimensional array. I tried a naive method :

for(var i=0, ii<arr.length-1; i<ii; i++)
    for(var j=i+1, jj<arr.length; j<jj; j++)
        if(arr[i] == arr[j])
            // remove the duplicate

Very simple but it takes a too long time if the array contains a large set of values. The tables that I use often contain hundreds of thousands of values, so that the number of iterations required for this operation is HUGE !

If someone has an idea !?

2
  • possible duplicate stackoverflow.com/questions/840781/… Commented Feb 14, 2014 at 10:54
  • unless you have some restrictions on the values I would vote for the dup. Commented Feb 14, 2014 at 10:57

2 Answers 2

2

Use a LinkedHashSet or OrderedHashSet implementation, it does not allow duplicates and provides expected O(1) on insertion, lookup, and deletion. Since your OP says you want to remove the duplicates, there is no faster way to do this than O(n). In an array of 1,000,000 items max time was 16ms

  • Create a LinkedHashSet hs
  • foreach object obj in arr -- hs.add(obj);

Complexity is expected O(n) with a good hash function.

Sign up to request clarification or add additional context in comments.

7 Comments

Strictly speaking, hash set does not guarantee O(n) complexity.
I never said that it does, I said that removing duplicates from the dataset is worst case O(n). HashSet guarantees O(1) for operations
I meant that using of hash set does not guarantee overall O(n) complexity. No, hash set does not guarantee O(1) due to possible hash collisions.
Ok, i try up this solution. A verification before add a value into array seems a good solution for me !
My mistake, LinkedHashSet or OrderHashSet implementation avoids collisions. Could you explain why it does not guarantee O(n)? If the number of objects in the array are known, capacity can be set.
|
0

This code could be the most efficient way you can do it ..!! Which is nothing but the direct implementation of set .

function eliminateDuplicates(arr) {
  var i,
      len=arr.length,
      out=[],
      obj={};

  for (i=0;i<len;i++) {
    obj[arr[i]]=0;
  }
  for (i in obj) {
    out.push(i);
  }
  return out;
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.