Since you're required to sort anyway, I think this would be a good way to do it. The algorithm simply keeps track of the last value added to make certain not to add a duplicate.
public static String[] no_repeats(String[] a) {
Arrays.sort(a);
ArrayList<String> ret = new ArrayList<>();
String lastAdded = "";
for (String str : a) {
if (!str.equals(lastAdded)) {
ret.add(str);
}
lastAdded = str;
}
return ret.toArray(new String[0]);
}
Of course, you could write your own minimal hash implementation to speed up lookups in a home grown hash set. In that case, sorting after you've skipped duplicate elements would be the way to go since hash based lookups, unlike List.contains() calls, are very efficient. And then you would be sorting on fewer items.
Here is how it might look with your own set implementation to speed up the process.
- set.add() returns false if the element is already there.
- so when it returns true, it must be the first time encountered so add it to the list.
- now sort a smaller list.
- and return the array.
public static String[] no_repeats(String[] a) {
MiniHashSet<String> set = new MiniHashSet<>();
ArrayList<String> ret = new ArrayList<>();
for (String str : a) {
if (set.add(str)) {
ret.add(str);
}
}
Collections.sort(ret);
return ret.toArray(new String[0]);
}
This is a simple implementation that just has an add and contains method to speed up the lookup process. The hash table size is rather large to reduce the chances of a collision.
- this class uses the hashCode of the object to obtain the proper
bucket to place a list.
- Each list may hold what ever items hash to that bucket.
-The returned bucket will either be the existing list for that index into the array or an new list where none previously existed.
@SuppressWarnings("unchecked")
class MiniHashSet<T> {
int size = 10_000;
List<T>[] data = new ArrayList[size];
public boolean add(T val) {
List<T> b = getBucket(val);
if (!b.contains(val)) {
b.add(val);
return true;
}
return false;
}
public boolean contains(T val) {
List<T> b = getBucket(val);
return b.contains(val);
}
private List<T> getBucket(T val) {
int i = val.hashCode() % size;
List<T> b = data[i];
if (b == null) {
b = new ArrayList<>();
data[i] = b;
}
return b;
}
}
Although this is quite a bit of extra work, this solution is significantly better than the first one I offered because the lookup is efficient and the sort may now happen after the duplicates have been removed.