Java String to short hash code

Question

I want to "implement" a hash function from Strings to shorts, using the java standard hashCode() function of String object. I came up with the following simple implementation:

static short shortHashCode(String str)
{
   int strHashCode = str.hashCode();
   short shorterHashCode = (short) (strHashCode % Short.MAX_VALUE);
   return shorterHashCode;
}

Is my shortHashCode function a good hash function? Meaning is the chance of collisions small (chance that two different Strings will have the same hash code close to 1/Short.MAX_VALUE) ?
Is there a better way to implement hash function from Strings to shorts?

Yes! a hash function must return the same hash code for identical inputs — zvisofer
– zvisofer, Commented Aug 4, 2014 at 14:52
As far as chances are concerned, there are a possibility of 32768 different short values returned. The chances are completely depended on the input set. If you use a random generator, then it depends on what statistic model that generator is based off (does each value have an equal chance or do some values have higher chances over others, etc.). — But I'm Not A Wrapper Class
– But I'm Not A Wrapper Class, Commented Aug 4, 2014 at 14:55
This question appears to be off-topic because it is asking for a code review and opinion on working code. It is off-topic here as ( too broad and opinion based ) and should probably be on codereview.stackexchange.com — user177800
– user177800, Commented Aug 4, 2014 at 15:00
@JarrodRoberson, I think the question more relates to convert between two language-specific primitives while preserving the useful semantic properties of the original. That seems on-topic to me. — Mike Samuel
– Mike Samuel, Commented Aug 4, 2014 at 15:03

Mike Samuel · Accepted Answer · 2014-08-04 15:01:19Z

5

(short) (strHashCode % Short.MAX_VALUE);

is losing information unnecessarily.

 (short) (strHashCode % ((Short.MAX_VALUE + 1) << 1));

would not, but would be equivalent anyway to

 (short) strHashCode

since casting an integral type to a smaller integral type just truncates the most significant bits.

It also assumes that all bits have the same entropy, which may not be true. You could try and spread the entropy around:

 (short) (strHashCode ^ (strHashCode >>> 16))

which XORs the high 16 bits with the low 16 bits.

Meaning is the chance of collisions small (chance that two different Strings will have the same hash code close to 1/Short.MAX_VALUE) ?

java.lang.String.hashCode is not a cryptographically strong hash function, so it only has that property if an attacker can't control one or both inputs to force a collision.

If you expose it to strings from an untrusted source, you might see a much higher rate of hash collisions, possibly allowing an attacker to deny service.

Also, it is designed to tradeoff a small increase in collision rate for better performance, and cross-version stability. There are better string hashing functions out there.

edited Aug 4, 2014 at 15:01

answered Aug 4, 2014 at 14:55

Mike Samuel

121k30 gold badges230 silver badges255 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

David Soroko Over a year ago

Some examples showing that generating collisions for String.hascCode is pretty easy: david-soroko.blogspot.co.uk/2015/06/…

Collectives™ on Stack Overflow

Java String to short hash code

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related