3

I am wondering if we implement our own hashmap that doesn't use power-of-two length hash tables (initial capacity and whenever we re-size), then in that case can we just use the object's hashcode and mod the total size directly instead of use a hash function to hash the object's hashcode ?

for example

   public V put(K key, V value) {
         if (key == null)
             return putForNullKey(value);
         // int hash = hash(key.hashCode());     original way
         //can we just use the key's hashcode if our table length is not power-of-two ?
         int hash = key.hashCode();              
         int i = indexFor(hash, table.length);
         ...
         ...
     }
0

3 Answers 3

3

Presuming we're talking about OpenJDK 7, the additional hash is used to stimulate avalanching; it is a mixing function. It is used because the mapping function from a hash to a bucket, since were using a power of 2 for the capacity, is a mere bitwise & (since a % b is equivalent to a & (b - 1) iff b is a power of 2); this means that the lower bits are the only important ones, so by applying this mixing step it can help protect against poorer hashes.

 static int hash(int h) {
     // This function ensures that hashCodes that differ only by
     // constant multiples at each bit position have a bounded
     // number of collisions (approximately 8 at default load factor).
     h ^= (h >>> 20) ^ (h >>> 12);
     return h ^ (h >>> 7) ^ (h >>> 4);
 }

If you want to use sizes that aren't powers of 2, the above may not be needed.

Actually changing the mapping from hashes to buckets (which normally relies on the capacity being a power of 2) will require you to you to look at indexFor:

 static int indexFor(int h, int length) {
     return h & (length-1);
 }

You could use (h & 0x7fffffff) % length here.

Sign up to request clarification or add additional context in comments.

9 Comments

You need something like (h & Integer.MAX_VALUE) % length or Math.abs(h % length)
@PeterLawrey is correct; you need to ensure you deal with negative hashes correctly.
I am wondering in terms of resize the capacity. Can i just simply times the capacity to a prime number ? I know that might be uneconomical.
@user1389813 HashMap rehashes when the load factor is to surpass the threshold, yes.
@user1389813 no, not all; TreeMap is implemented using a self-balancing binary search tree, not a hash table, and as such does not use hashing. LinkedHashMap, HashMap, and Hashtable, however, do. If a hash table becomes too full, you can experience performance decline as operations approach linear time.
|
1

You can think of the mod function as a simple form of hash function. It maps a large range of data to a smaller space. Assuming the original hashcode is well designed, I see no reason why a mod cannot be used to transform the hashcode into the size of the table you are using.

If your original hashfunction is not well implemented, e.g. always returns an even number, you will create quite a lot of collisions using just a mod function as your hashfunction.

Comments

1

This is true, you can pick pseudo-prime numbers instead.

Note: indexFor needs to use % compensating for the sign instead of a simple & which can actually make the lookup slower.

indexFor = (h & Integer.MAX_VALUE) % length
// or
indexFor = Math.abs(h % length) 

3 Comments

I am wondering if we pick pseudo-prime number then do we need to reconstruct the table completely whenever we resize the it ? since the index from the indexFor is now different.
You can have a progressive rebuild, but this is slower (but does avoid one big hit) The simplest way to avoid this is to make the Map large enough that this won't happen.
In performance-wise. Which one is better, default hashing function in Java API (power-of-two length table) or this approach ?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.