I wish to override object's GetHashCode() method in all my classes. This method returns a Int32. All the cryptographic hash functions I know of return values that will not fit in a 32-bit integer. I want to avoid collisions as best as possible. Should I truncate a secure hash like SHA-whatever, or use a 32-bit hash? If using a 32-bit hash, what would be the best 32-bit hash to use?
4 Answers
Just a bit of information to all. The GetHashCode() across different .NET platforms differ. For example: "Hello".GetHashCode() in .NET 2.0 vs. "Hello".GetHashCode() in .NET 4.0 yield different results. Hence, why you can't serialize HashTables or Dictionaries out of the box using .NET.
Implementing your own hash algos provide consistancy across platforms. Just to let you know, you don't want to go less than Int32. My advice is to stick with Int64 (long). This way you have less collisions, which is the goal of hashing :) This is a library that I wrote years back. Each Hash algo have their pluses and minus (speed vs. least collision). This particular version uses Strings as an Input but you can modify it as you see fit:
static public class StringHash
{
//---------------------------------------------------------------------
static public Int64 RSHash(String str)
{
const Int32 b = 378551;
Int32 a = 63689;
Int64 hash = 0;
for (Int32 i = 0; i < str.Length; i++)
{
hash = hash * a + str[i];
a = a * b;
}
return hash;
}
//---------------------------------------------------------------------
static public Int64 JSHash(String str)
{
Int64 hash = 1315423911;
for (Int32 i = 0; i < str.Length; i++)
{
hash ^= ((hash << 5) + str[i] + (hash >> 2));
}
return hash;
}
//---------------------------------------------------------------------
static public Int64 ELFHash(String str)
{
Int64 hash = 0;
Int64 x = 0;
for (Int32 i = 0; i < str.Length; i++)
{
hash = (hash << 4) + str[i];
if ((x = hash & 0xF0000000L) != 0)
{
hash ^= (x >> 24);
}
hash &= ~x;
}
return hash;
}
//---------------------------------------------------------------------
static public Int64 BKDRHash(String str)
{
const Int64 seed = 131; // 31 131 1313 13131 131313 etc..
Int64 hash = 0;
for (Int32 i = 0; i < str.Length; i++)
{
hash = (hash * seed) + str[i];
}
return hash;
}
//---------------------------------------------------------------------
static public Int64 SDBMHash(String str)
{
Int64 hash = 0;
for (Int32 i = 0; i < str.Length; i++)
{
hash = str[i] + (hash << 6) + (hash << 16) - hash;
}
return hash;
}
//---------------------------------------------------------------------
static public Int64 DJBHash(String str)
{
Int64 hash = 5381;
for (Int32 i = 0; i < str.Length; i++)
{
hash = ((hash << 5) + hash) + str[i];
}
return hash;
}
//---------------------------------------------------------------------
static public Int64 DEKHash(String str)
{
Int64 hash = str.Length;
for (Int32 i = 0; i < str.Length; i++)
{
hash = ((hash << 5) ^ (hash >> 27)) ^ str[i];
}
return hash;
}
//---------------------------------------------------------------------
static public Int64 BPHash(String str)
{
Int64 hash = 0;
for (Int32 i = 0; i < str.Length; i++)
{
hash = hash << 7 ^ str[i];
}
return hash;
}
//---------------------------------------------------------------------
static public Int64 FNVHash(String str)
{
Int64 fnv_prime = 0x811C9DC5;
Int64 hash = 0;
for (Int32 i = 0; i < str.Length; i++)
{
hash *= fnv_prime;
hash ^= str[i];
}
return hash;
}
//---------------------------------------------------------------------
static public Int64 APHash(String str)
{
Int64 hash = 0xAAAAAAAA;
for (Int32 i = 0; i < str.Length; i++)
{
if ((i & 1) == 0)
{
hash ^= ((hash << 7) ^ str[i] * (hash >> 3));
}
else
{
hash ^= (~((hash << 11) + str[i] ^ (hash >> 5)));
}
}
return hash;
}
}
Comments
Eric Lippert has created a great blog entry on how to properly implement a GetHashCode() method. You need to remember that the purpose of GetHashCode() is to put objects into a hash table. Using it for this purpose means that you'll more then likely want to iterate through it or sort it some time in the future. If you are using crypto functions to do this, your iteration or sorting procedure will run very slow. Crypto functions are designed to secure data, not to uniquely identify them. Read through Eric Lippert's blog post. It will help you out
Comments
You could implement GetHashCode by truncating an SHA hash. But you probably shouldn't.
The purpose of GetHashCode is to allow objects to be inserted into hash tables. The purpose of hash tables is to optimize searches: On average, finding a key in a hash table requires only O(1) time, compared to O(log n) for a tree, or O(n) for an unsorted list.
You do want your GetHashCode method to minimize collisions in order to prevent your hash table lookups from degenerating to O(n) time. But you also want them to be fast, because the whole point of hash tables is optimization. If your hash code takes a long time to compute, you might as well have just stored your data in a List.
Cryptographic hashes are slow. They're typically designed that way in order to discourage brute force attacks. This makes them unsuitable for use with GetHashCode.
So, how should you implement GetHashCode? A simple, frequently-used approach is just to XOR together all the member variables that get used in your Equals function.
struct Complex
{
double real;
double imag;
public override int GetHashCode()
{
return real.GetHashCode() ^ imag.GetHashCode();
}
// ...
}
Another simple approach, good for array-like objects, is a polynomial hash function.
class MyClass
{
int[] data;
public override int GetHashCode()
{
int result = 0;
foreach (int n in data)
{
result = result * 41 + n;
}
return result;
}
// ...
}
If you class contains a large amount of data to hash, you might want to save the hash code in a member variable and precompute it during construction, so that GetHashCode() can then just use that variable.
2 Comments
real and image are the same, you're always going to get 0 as a result.GetHashCode(). new Point(x, x).GetHashCode() gives 0 for any value of xThe shorter the width of the hash value, the more likely you are to get collisions. Since Int32 stores a maximum of 4294967296 different values you'll need to consider whether this will hold a unique enough value for your purposes - which will depend if this is for security or identity checking.
I'm interested in why you want to override GetHashCode(), does the value have to fit into 32bits? If so why?
6 Comments
GetHashCode() and even more intriguing why are you overriding Equals(). I'd like to know your problem to formulate a better solution.GetHashCode() for that, no matter if 32 bits are enough for you or not.
GetHashCode? Unless you have a really good reason to do so and you understand all of the ramifications, don't.