0

I have a rather large int[] which is sorted using Arrays.sort() .. I need to remove the duplicate elements from the array.

This question originates from sedgewick's Algorithms book 1.1.28

1.1.28 Remove duplicates. Modify the test client in BinarySearch to remove any du- plicate keys in the whitelist after the sort.

I tried to create a noDupes() method which takes in an int[] and returns an int[] with duplicates removed

The rank() methods are from sedgewick's code.which does the binary search

public static int[] noDupes(int[] a){
    Arrays.sort(a);
    int maxval= a[a.length-1];
    int[] nodupes = new int[maxval];
    int i=0;
    for(int j=0;j<a.length;j++){
        int rnk = rank(a[j],nodupes);
        System.out.println(a[j]+" rank="+rnk);
        if (rnk < 0){
            System.out.println(a[j]+" is not dupe");
            nodupes[i] = a[j];
            i++;
        }
    }

    return nodupes;
}
public static int rank(int key,int[] a){
    return rank(key,a,0,a.length-1);
}

public static int rank(int key,int[] a,int lo,int hi){
    if(lo > hi) return -1;
    int mid = lo+(hi-lo)/2;

    if(key < a[mid])return rank(key,a,0,mid-1);
    else if(key > a[mid])return rank(key,a,mid+1,hi);
    else return mid;
}

When I ran this with a sample array

int[] a =new int[]{2,2,2,3,4,4,5,6};
int[] ret = noDupes(a);

I am getting some unexpected output..even after 2 is added to the nodupes array, the rank for an existing element is -1..

2 rank=-1
2 is not dupe
2 rank=-1
2 is not dupe
2 rank=-1
2 is not dupe
3 rank=-1
3 is not dupe
4 rank=-1
4 is not dupe
4 rank=4
5 rank=-1
5 is not dupe
6 rank=-1
6 is not dupe
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 6
    at ...noDupes(BinSearch.java:85)
    at ...main(BinSearch.java:96)

I couldn't figure out what I am doing wrong..can someone help?

7
  • why can't you use Set<Integer>? Commented Jun 11, 2013 at 9:28
  • 1
    @sanbhat - because that's not what the exercise is about. Commented Jun 11, 2013 at 9:35
  • I am trying to learn how to do this without using any library classes..I think the exercise means to solve this using binarysearch Commented Jun 11, 2013 at 9:39
  • Something wrong here. If the array is already sorted you can find the duplicates by a linear scan. No binary search required. Commented Jun 11, 2013 at 10:43
  • @EJP you are right..linear scan will do Commented Jun 11, 2013 at 11:03

5 Answers 5

3

I would do it this way

public static int[] noDupes(int[] a) {
    Arrays.sort(a);
    int noDupCount = 0;
    for (int i = 0; i < a.length; i++) {
        if (i == 0 || a[i] != a[i - 1]) {
            noDupCount++;
        }
    }
    int[] a2 = new int[noDupCount];
    for (int i = 0, j = 0; i < a.length; i++) {
        if (i == 0 || a[i] != a[i - 1]) {
            a2[j++] = a[i];
        }
    }
    return a2;
}
Sign up to request clarification or add additional context in comments.

3 Comments

wouldn't that take O(n) time ?
it will, but I dont think you can make it go without iterating over full array
You can speed this up by using a larger-than-necessary array, looping once and doing System.arraycopy
3

just add all the array values to the HashSet it will automatically remove duplicates and gives you unique values and then again convert it to array that you required

2 Comments

This won't keep the order - although of course you should do the sort after removing duplicates.
Well, use a SortedSet instead of an HashSet and the order will be kept.
2

If you have your array sorted and if you want to remove duplicates I think you don't need to use binary search for that.

when you sort an array, duplicate elements will be adjacent to each other.

E.g. Array = {9,8,9,1,2,5,2,5,1} After sorting Array = {1,1,2,2,5,5,8,9,9}

You can use following way to remove duplicates (inplace)

int a[] = {sorted array}

for(int i=0,target=0;i<a.length-1;i++) {
  if(a[i]!=a[i+1]) {
     a[target++] = a[i];
  }
}
a[target++] = a[a.length-1];
for(int i=target;i<a.length;i++) {
a[i] = 0; // fill in the values which you don't want.
}

will remove duplicates in one pass only

Comments

0

This should help:

int[] nodupes = new int[a.length];

nodupes array is going out of bound.

Note: I am not sure if logic that you are using is the best for the problem. But this should solve your exception.

Comments

0

This code will help you.

public Integer[] removeDuplicates(Integer[] input){
        Integer[] arrayWithoutDuplicates = null;
        Set<Integer> set = new LinkedHashSet<Integer>();
        for(int i : input){
            set.add(i);
        }
        arrayWithoutDuplicates = (Integer[]) set.toArray();
        return arrayWithoutDuplicates;
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.