
I used to think that HashSet is a pretty fast data structure because it uses hashes (and is, in turn, implemented on top of HashMap). I was solving some problems and decided to check its performance, so here it is:

You are given an array of numbers, e.g. [11, 3, 11, 11, 3, 2, 0, -2, 2]. You are supposed to write a function that returns the numbers that appear an odd number of times.

Here is my solution:

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class OddNumInArray {

    public static List<Integer> oddNumList(int[] ar) {
        Collection<Integer> l = new ArrayList<>();
        for (int n : ar) {
            if (l.contains(n)) {
                // static type is Collection, so this is remove(Object) with n autoboxed,
                // not List.remove(int index)
                l.remove(n);
            } else {
                l.add(n);
            }
        }
        return (List<Integer>) l;
    }

    public static Set<Integer> oddNumHSet(int[] ar) {
        Set<Integer> l = new HashSet<>();
        for (int n : ar) {
            if (l.contains(n)) {
                l.remove(n);
            } else {
                l.add(n);
            }
        }
        return l;
    }

    public static void main(String[] args) {
        int[] a1 = new int[10000000];
        // only the first 10 slots get random values; the rest stay 0
        for (int i = 0; i < 10; i++) {
            a1[i] = (new Random()).nextInt(5);
        }
        long cur = System.nanoTime();
        System.out.println(oddNumList(a1));
        long c1 = System.nanoTime() - cur;
        System.out.println("TIME CONSUMED:" + c1);
        cur = System.nanoTime();
        System.out.println(oddNumHSet(a1));
        long c2 = System.nanoTime() - cur;
        System.out.println("TIME CONSUMED:" + c2);
        System.out.println("c1/c2*100: " + ((double) c1 / c2 * 100));
    }
}

And here is an output:

[1, 0]
TIME CONSUMED:101804000
[0, 1]
TIME CONSUMED:183261000
c1/c2*100: 55.55137208680516
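A caveat about these numbers: each method is timed on a single cold run, so the figures include JIT compilation and GC noise. A more careful (though still crude) approach is to warm up first and take the best of several runs; here is a minimal sketch, using a hypothetical `sumSquares` workload as a stand-in for the methods above:

```java
public class WarmupBench {
    // toy workload standing in for oddNumList / oddNumHSet (hypothetical)
    static long sumSquares(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) s += (long) i * i;
        return s;
    }

    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // warm-up: let the JIT compile the hot path before measuring
        for (int i = 0; i < 10_000; i++) sumSquares(1_000);

        // measure several runs and keep the minimum, which is less noisy
        long best = Long.MAX_VALUE;
        for (int i = 0; i < 5; i++) {
            best = Math.min(best, timeNanos(() -> sumSquares(1_000_000)));
        }
        System.out.println("best run: " + best + " ns");
    }
}
```

For anything serious, a harness such as JMH handles warm-up, forking, and dead-code elimination properly.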

So, why is the implementation with ArrayList almost twice as fast as the one with HashSet? Thank you.

  • Obviously, it depends on your use case. Commented Oct 29, 2014 at 6:07
  • I don't think this is a duplicate? Commented Oct 29, 2014 at 6:10
  • Why is it a duplicate? I ask for a link, then. I searched; this question was not asked before. Commented Oct 29, 2014 at 6:11
  • @chrylis - I think this must be a duplicate of ArrayList vs HashSet instead of how to do Microbenchmarking? Commented Oct 29, 2014 at 6:15
  • @TheLostMind agreed. Commented Oct 29, 2014 at 6:18

1 Answer


ArrayList doesn't have any code that checks for duplicates, so add simply appends whatever you give it. A HashSet, on the other hand, is meant to hold only unique elements, so every insertion first checks whether the element is already present; that duplicate check is extra work an ArrayList never does.
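To make the difference concrete, here is a small sketch (the class name DupCheckDemo is mine) showing that ArrayList.add always appends, while HashSet.add checks for the element first and rejects duplicates:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DupCheckDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        Set<Integer> set = new HashSet<>();

        // ArrayList.add never checks for duplicates: it always appends
        list.add(11);
        list.add(11);

        // HashSet.add hashes the element and checks the backing table first;
        // the second add finds the element already there and returns false
        boolean first  = set.add(11);   // true  - element inserted
        boolean second = set.add(11);   // false - duplicate rejected

        System.out.println(list.size());          // 2
        System.out.println(set.size());           // 1
        System.out.println(first + " " + second); // true false
    }
}
```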


8 Comments

So, if I add my own implementation of HashCollection_name via HashMap, it will be faster, right?
@GeorgeRevkov That doesn't make any sense - a HashMap by definition can't have duplicate keys, because that's the way they're looked up. A specific hash-code is made for each item - what does it mean if you have two items with the same hash code? How do you find the second item?
@GeorgeRevkov - Actually, a HashSet uses a HashMap behind the scenes.. So, it will not be faster.. ArrayList uses an Array behind the scenes, so it will be faster than both HashMap as well as HashSet.
I am talking about the way HashSet is implemented. It uses the value added to the HashSet as a key for a HashMap and then adds a default value to the map. If I understand correctly, the most time-consuming part is checking whether it has dups or not, which is implemented in HashSet (am I right here?). If I implement a HashSet without checking for dups, I will save time, won't I?
@GeorgeRevkov - It will still be slower than an ArrayList :P
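As the comments note, HashSet really is a thin wrapper over HashMap: each element becomes a key mapped to a shared dummy value. A minimal sketch of that delegation (MiniHashSet is a simplified illustration, not the real JDK class):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of how java.util.HashSet delegates to HashMap;
// the real class implements the Set interface and much more.
public class MiniHashSet<E> {
    private static final Object PRESENT = new Object(); // shared dummy value
    private final Map<E, Object> map = new HashMap<>();

    // HashMap.put returns the previous value, or null if the key was absent,
    // so add() is true exactly when the element was not already present
    public boolean add(E e)      { return map.put(e, PRESENT) == null; }
    public boolean remove(E e)   { return map.remove(e) == PRESENT; }
    public boolean contains(E e) { return map.containsKey(e); }
    public int size()            { return map.size(); }

    public static void main(String[] args) {
        MiniHashSet<Integer> s = new MiniHashSet<>();
        System.out.println(s.add(3));  // true
        System.out.println(s.add(3));  // false - duplicate
        System.out.println(s.size());  // 1
    }
}
```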
