
I have a task to implement the hash code of a string in Java using its definition. I wrote this code:

   public int hash(String str) {
        int hashValue = 0;
        int power;
        for (int i = 0; i < str.length(); i++) {
            power = (str.length() -1 - i);
            hashValue = hashValue + str.charAt(i) * (int) Math.pow(31, power);
        }
        return hashValue;
    }

I found that my method returns the same result as hashCode() only for strings shorter than 8 characters. Is that expected, or is my method inaccurate? I've seen suggestions that the hash code may be computed differently for strings of 8 or more characters.

  • If you want to re-implement the same hash algorithm as String.hashCode() (though, why??), look at the source code of String.hashCode(). It comes with the JDK, and any good IDE will show it to you. Or see here: Java 6, Java 8, Java 10. Commented May 9, 2018 at 20:45
  • Hint: you don't need to use Math.pow. You can raise a number to the power simply by multiplying it by 31 on each loop iteration. Commented May 9, 2018 at 20:45
  • i dont need to re-implement the same hash algorithm . i need to implement it using this formula: str[0] * 31^(n-1) + str[1] * 31^(n-2) + ... + str[n-1]. Commented May 9, 2018 at 20:51
  • "i dont need to re-implement the same hash algorithm " That is the same hash algorithm. Commented May 9, 2018 at 20:55

2 Answers

2

Look at the hashCode implementation in the JDK:

public static int hashCode(byte[] value) {
    int h = 0;
    int length = value.length >> 1;
    for (int i = 0; i < length; i++) {
        h = 31 * h + getChar(value, i);
    }
    return h;
}

Your method may happen to produce the same result as this one for some inputs, but that does not actually matter. It is just a hashing method.
Note that a hashing method does not need to be "accurate". It is a way of reducing an arbitrary object (here, a string) to an int, and you can use any method you want as long as equal strings always produce the same value.


2 Comments

What version of JDK is that?
This is JDK 9, but I believe this method does not change much between releases.
1

Your implementation of a hash code for a string is similar to Java's String class's hashCode implementation, but it's not exactly the same due to the subtle way that Java narrows the double returned by Math.pow to an int.

For the string "abcdefg", 7 characters long, your method and Java's method agree -- they both return -1206291356. For the string "abcdefgh", 8 characters long, your method and Java's method disagree -- yours returns 1858279332, while Java's method returns 1259673732.

First, let's cover the ways that they are similar. Here's Java 8's code from Grepcode for reference:

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

Java's implementation for String multiplies the running hash by 31 on every loop iteration before adding the next character. Effectively, by the end of the loop each character has been multiplied by its own power of 31.
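To make the equivalence concrete, here is a minimal sketch of that loop as a standalone method. The class name HornerHashSketch and the method name hornerHash are made up for illustration; only String.hashCode() and charAt() come from the JDK.

```java
final class HornerHashSketch {

    // Hypothetical helper: evaluates s[0]*31^(n-1) + ... + s[n-1]
    // by multiplying the running value by 31 on each iteration.
    static int hornerHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int arithmetic only; overflow wraps
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "a", "abcdefg", "abcdefgh"}) {
            // Prints whether the sketch agrees with the JDK for each sample
            System.out.println("\"" + s + "\" matches: " + (hornerHash(s) == s.hashCode()));
        }
    }
}
```

Because every intermediate value is an int, no conversion from double is involved at any point.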

Your implementation attempts to determine directly the power of 31 to multiply by the character value, by using Math.pow, which returns a double. Then you cast it back to an int, because that's what the hash code's type is.

Now, let's discuss the subtle difference.

Java's String hashCode implementation only multiplies and adds ints -- even if overflow occurs, it's int overflow, during which the lower 32 bits of information are preserved.
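As a small illustration of that wrap-around behaviour, the sketch below computes a power of 31 twice: once with int multiplication, which overflows, and once with long multiplication, which does not overflow for this exponent. The names IntOverflowSketch, powInt and powLong are hypothetical helpers, not JDK methods.

```java
final class IntOverflowSketch {

    // Repeated int multiplication; overflow silently keeps the low 32 bits.
    static int powInt(int base, int exp) {
        int result = 1;
        for (int i = 0; i < exp; i++) {
            result *= base;
        }
        return result;
    }

    // Same computation in long, wide enough to hold 31^7 exactly.
    static long powLong(int base, int exp) {
        long result = 1L;
        for (int i = 0; i < exp; i++) {
            result *= base;
        }
        return result;
    }

    public static void main(String[] args) {
        long exact = powLong(31, 7);
        int wrapped = powInt(31, 7);
        // Casting a long to int discards the high bits, so this compares
        // the wrapped int with the low 32 bits of the exact value.
        System.out.println(wrapped == (int) exact);
    }
}
```

The long-to-int cast here is an integer narrowing, which drops the high-order bits, so it is a convenient way to check what "the lower 32 bits" of the exact value are.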

For your implementation with Math.pow, the JLS, Section 5.1.3, covers the primitive narrowing conversion that occurs when you cast a double down to an int.

A narrowing conversion of a floating-point number to an integral type T takes two steps:

  1. In the first step, the floating-point number is converted either to a long, if T is long, or to an int, if T is byte, short, char, or int, as follows:

    • If the floating-point number is NaN (§4.2.3), the result of the first step of the conversion is an int or long 0.

    • Otherwise, if the floating-point number is not an infinity, the floating-point value is rounded to an integer value V, rounding toward zero using IEEE 754 round-toward-zero mode (§4.2.3). Then there are two cases:

      a. If T is long, and this integer value can be represented as a long, then the result of the first step is the long value V.

      b. Otherwise, if this integer value can be represented as an int, then the result of the first step is the int value V.

    • Otherwise, one of the following two cases must be true:

      a. The value must be too small (a negative value of large magnitude or negative infinity), and the result of the first step is the smallest representable value of type int or long.

      b. The value must be too large (a positive value of large magnitude or positive infinity), and the result of the first step is the largest representable value of type int or long.

(bold emphasis mine)
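The cases quoted above can be observed directly with a cast. This is a minimal sketch; NarrowingSketch and toInt are hypothetical names used only to wrap the (int) cast so each case can be checked separately.

```java
final class NarrowingSketch {

    // Hypothetical wrapper around the double-to-int narrowing cast.
    static int toInt(double d) {
        return (int) d;
    }

    public static void main(String[] args) {
        // NaN converts to 0
        System.out.println(toInt(Double.NaN) == 0);
        // In-range values are rounded toward zero
        System.out.println(toInt(-3.99) == -3);
        // Too large (including positive infinity): clamps to Integer.MAX_VALUE
        System.out.println(toInt(1e20) == Integer.MAX_VALUE);
        System.out.println(toInt(Double.POSITIVE_INFINITY) == Integer.MAX_VALUE);
        // Too small (including negative infinity): clamps to Integer.MIN_VALUE
        System.out.println(toInt(Double.NEGATIVE_INFINITY) == Integer.MIN_VALUE);
    }
}
```

Note that out-of-range values are clamped rather than wrapped, which is the opposite of what int overflow does.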

When you have a 7-character string, the largest power you calculate is 31^6, which is 887,503,681, still representable as an int. However, when you have an 8-character string, you calculate 31^7, which is 27,512,614,111, and it is too big to fit in an int -- the maximum value for an int is 2,147,483,647, about 2 billion. The narrowing conversion converts it to that maximum value. At this point, you are using a different value than what Java's String hashCode method is effectively using. The lower 32 bits of the true answer are not preserved in your method as they are in Java's String hashCode method. This is the subtle difference which changes your value when your strings are 8 characters or longer.
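The sketch below reproduces the mismatch and shows one possible way to keep the explicit "character times power of 31" form of the formula without leaving int arithmetic. The names PowerHashSketch, powHash and intPowerHash are hypothetical; powHash mirrors the method from the question, and intPowerHash is just one of several ways to fix it.

```java
final class PowerHashSketch {

    // Mirrors the question's method: the power goes through double and
    // is clamped to Integer.MAX_VALUE once it exceeds the int range.
    static int powHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            int power = s.length() - 1 - i;
            h = h + s.charAt(i) * (int) Math.pow(31, power);
        }
        return h;
    }

    // Hypothetical fix: walk from the last character to the first and
    // keep the current power of 31 in an int, so overflow wraps.
    static int intPowerHash(String s) {
        int h = 0;
        int power = 1; // 31^0 for the last character
        for (int i = s.length() - 1; i >= 0; i--) {
            h += s.charAt(i) * power;
            power *= 31;
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println((int) Math.pow(31, 7) == Integer.MAX_VALUE);
        for (String s : new String[] {"abcdefg", "abcdefgh"}) {
            System.out.println(s + ": powHash matches = " + (powHash(s) == s.hashCode())
                    + ", intPowerHash matches = " + (intPowerHash(s) == s.hashCode()));
        }
    }
}
```

The only behavioural change in intPowerHash is that the power is never converted from a double, so large powers wrap around instead of being clamped.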
