7

I understand HashSet based on HashMap, since they are pretty similar. It makes the code more flexible and minimizes implementation effort. However, one reference variable in the HashSet's Entry seem to be unnecessary for me if the class forbids null element, therefore the whole Entry makes no point. Despite this fact, an Entry takes 24 byte memory / element, whereas a single array with the set's elements would take only 4 byte / element if my figures are correct. (aside from array's header)

If my argument is correct, does the advantages really overweight this performance hit?

(if i am wrong, i would learn from it aswell)

8
  • 1
    A single array wouldn't be a HashSet. How would you have O(1) contains() with a simple array? Commented Feb 27, 2016 at 11:10
  • 2
    @JBNizet linear probing (or generally open addressing) works with just a single array. I'm also curious what the design decision was, but I'm not sure if we find the author here to report ;-) Commented Feb 27, 2016 at 11:15
  • 1
    @JBNizet You can easily implement several types of hash tables in an array , i.e. cuckoo, linear, etc... EDIT: linear doesn't have O(1) contains(), but cuckoo does Commented Feb 27, 2016 at 11:16
  • 1
    Some libraries implement their Map as an extension of their Set and this would be more efficient IMHO, however this decision was made at least 20 years ago, so while it made sense at the time, it might not be optimal now. Commented Feb 27, 2016 at 12:02
  • 1
    @PeterLawrey a bit less than 20 years ago, actually. 20 years ago Java 1.1 was not out yet. And HashSet was introduced in 1.2. Commented Feb 27, 2016 at 12:50

1 Answer 1

2

Though this question is primarily opinion-based, I'll summarize a few points on the topic:

  • HashSet appeared in Java 1.2 many years ago. It's hard to guess now the exact reasons for design decisions made at that times, but clearly Java wasn't used for high-loaded applications; performance played less role than simplicity.
  • You are right that HashSet is suboptimal in its memory consumption. The problem is known, the bug JDK-6624565 is registered, and discussions at core-libs-dev are held from time to time. But is this a blocker for many real world applications? Probably, no.
  • For those uncommon applications where HashSet memory usage is unacceptable, there are already good alternatives, like trove THashSet.
  • Note that open addressing algorithms has their disadvantages, e.g. significant performance degradation with load factors close to 1; element removal difficulties. See the related answer.
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.