1

Attempting to write my own hash function in Java. I'm aware that this is the same one that java implements but wanted to test it out myself. I'm getting collisions when I input different values and am not sure why.

public static int hashCodeForString(String s) {   
int m = 1;
int myhash = 0;
    for (int i = 0; i < s.length(); i++, m++){
    myhash += s.charAt(i) * Math.pow(31,(s.length() - m));
    }
return myhash;
} 
5
  • Math.pow(...) returns a double. Does this compile? Commented Jul 25, 2016 at 1:35
  • 1
    The Java String hashCode implementation does not use Math.pow but rather uses int math, and allows for int overflow to be part of the calculation. Your calculation doesn't, and that's a huge difference. Commented Jul 25, 2016 at 1:40
  • Could you give more details about int math and how I can implement it in my hashcode? Commented Jul 25, 2016 at 1:41
  • 1
    Nevermind, I can just use the hashCode() method that is already overridden in the String class instead of trying to create my own. Commented Jul 25, 2016 at 1:52
  • An int has 32 bits and allows ~4 billion values. Collisions are to be expected. Java's hashCode() is not expected or required to produce globally unique values. If you want to reduce the chance of collision to near zero you need at least 128 bits and a really good hash function. Commented Jul 25, 2016 at 1:56

1 Answer 1

2

Kindly remember just how a hash-table (in any language ...) actually works:   it consists of a (usually, prime) number of "buckets." The purpose of the hash-function is simply to convert any incoming key-value into a bucket-number.   (The worst-case scenario is always that 100% of the incoming keys wind-up in a single bucket, leaving you with "a linked list.")   You simply strive to devise a hash-function that will "typically" produce a "widely scattered" distribution of values so that, when calculated modulo the (prime ...) number of buckets, "most of the time, most of the buckets" will be "more-or-less equally" filled. (But remember: you can never be sure.)

"Collisions" are entirely to be expected:   in fact, "they happen all the time."

In my humble opinion, you're "over-thinking" the hash-function:   I see no compelling reason to use Math.pow() at all. Expect that any value which you produce will be converted to a hash-bucket number by taking its absolute value modulo the number of buckets.   The best way to see if you came up with a good one (for your data ...) is to observe the resulting distribution of bucket-sizes.   (Is it "good enough" for your purposes yet?)

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.