java hashing function collision

Question

Attempting to write my own hash function in Java. I'm aware that this is the same one that java implements but wanted to test it out myself. I'm getting collisions when I input different values and am not sure why.

public static int hashCodeForString(String s) {   
int m = 1;
int myhash = 0;
    for (int i = 0; i < s.length(); i++, m++){
    myhash += s.charAt(i) * Math.pow(31,(s.length() - m));
    }
return myhash;
}

The Java String hashCode implementation does not use Math.pow but rather uses int math, and allows for int overflow to be part of the calculation. Your calculation doesn't, and that's a huge difference. — Hovercraft Full Of Eels
– Hovercraft Full Of Eels, Commented Jul 25, 2016 at 1:40
Could you give more details about int math and how I can implement it in my hashcode? — Earl Clark
– Earl Clark, Commented Jul 25, 2016 at 1:41
Nevermind, I can just use the hashCode() method that is already overridden in the String class instead of trying to create my own. — Earl Clark
– Earl Clark, Commented Jul 25, 2016 at 1:52
An int has 32 bits and allows ~4 billion values. Collisions are to be expected. Java's hashCode() is not expected or required to produce globally unique values. If you want to reduce the chance of collision to near zero you need at least 128 bits and a really good hash function. — Jim Garrison
– Jim Garrison, Commented Jul 25, 2016 at 1:56

Mike Robinson · Accepted Answer · 2016-07-25 02:45:33Z

Kindly remember just how a hash-table (in any language ...) actually works: it consists of a (usually, prime) number of "buckets." The purpose of the hash-function is simply to convert any incoming key-value into a bucket-number. (The worst-case scenario is always that 100% of the incoming keys wind-up in a single bucket, leaving you with "a linked list.") You simply strive to devise a hash-function that will "typically" produce a "widely scattered" distribution of values so that, when calculated modulo the (prime ...) number of buckets, "most of the time, most of the buckets" will be "more-or-less equally" filled. (But remember: you can never be sure.)

"Collisions" are entirely to be expected: in fact, "they happen all the time."

In my humble opinion, you're "over-thinking" the hash-function: I see no compelling reason to use Math.pow() at all. Expect that any value which you produce will be converted to a hash-bucket number by taking its absolute value modulo the number of buckets. The best way to see if you came up with a good one (for your data ...) is to observe the resulting distribution of bucket-sizes. (Is it "good enough" for your purposes yet?)

Collectives™ on Stack Overflow

java hashing function collision

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related