2

I wish to override object's GetHashCode() method in all my classes. This method returns a Int32. All the cryptographic hash functions I know of return values that will not fit in a 32-bit integer. I want to avoid collisions as best as possible. Should I truncate a secure hash like SHA-whatever, or use a 32-bit hash? If using a 32-bit hash, what would be the best 32-bit hash to use?

9
  • 1
    Why do you want to override GetHashCode? Unless you have a really good reason to do so and you understand all of the ramifications, don't. Commented Apr 18, 2013 at 22:57
  • 9
    Hashing for security and hashing for quick identity checking are two very different problems. For an identity hash, you don't care whether it's easy to derive new instances that produce the same hash; all you care is that it's reasonably well distributed over typical samples you use and that it be quick to compute. Commented Apr 18, 2013 at 22:58
  • 3
    You do not want a cryptographic hash, you want a fast, well-distributed hash. Cryptographic hashes are often designed to be as slow as possible, and may not necessarily be well-distributed. As Daniel says, don't do this unless you fully understand what you're doing. Commented Apr 18, 2013 at 22:58
  • 3
    blogs.msdn.com/b/securitytools/archive/2009/08/27/… Commented Apr 18, 2013 at 22:58
  • 10
    Read and understand my article on this subject before you write any code: blogs.msdn.com/b/ericlippert/archive/2011/02/28/… Commented Apr 18, 2013 at 23:06

4 Answers 4

8

Just a bit of information to all. The GetHashCode() across different .NET platforms differ. For example: "Hello".GetHashCode() in .NET 2.0 vs. "Hello".GetHashCode() in .NET 4.0 yield different results. Hence, why you can't serialize HashTables or Dictionaries out of the box using .NET.

Implementing your own hash algos provide consistancy across platforms. Just to let you know, you don't want to go less than Int32. My advice is to stick with Int64 (long). This way you have less collisions, which is the goal of hashing :) This is a library that I wrote years back. Each Hash algo have their pluses and minus (speed vs. least collision). This particular version uses Strings as an Input but you can modify it as you see fit:

static public class StringHash
    {
        //---------------------------------------------------------------------
        static public Int64 RSHash(String str)
        {
            const Int32 b = 378551;
            Int32 a = 63689;
            Int64 hash = 0;

            for (Int32 i = 0; i < str.Length; i++)
            {
                hash = hash * a + str[i];
                a = a * b;
            }

            return hash;
        }
        //---------------------------------------------------------------------
        static public Int64 JSHash(String str)
        {
            Int64 hash = 1315423911;

            for (Int32 i = 0; i < str.Length; i++)
            {
                hash ^= ((hash << 5) + str[i] + (hash >> 2));
            }

            return hash;
        }
        //---------------------------------------------------------------------
        static public Int64 ELFHash(String str)
        {
            Int64 hash = 0;
            Int64 x = 0;

            for (Int32 i = 0; i < str.Length; i++)
            {
                hash = (hash << 4) + str[i];

                if ((x = hash & 0xF0000000L) != 0)
                {
                    hash ^= (x >> 24);
                }
                hash &= ~x;
            }

            return hash;
        }
        //---------------------------------------------------------------------
        static public Int64 BKDRHash(String str)
        {
            const Int64 seed = 131; // 31 131 1313 13131 131313 etc..
            Int64 hash = 0;

            for (Int32 i = 0; i < str.Length; i++)
            {
                hash = (hash * seed) + str[i];
            }

            return hash;
        }
        //---------------------------------------------------------------------
        static public Int64 SDBMHash(String str)
        {
            Int64 hash = 0;

            for (Int32 i = 0; i < str.Length; i++)
            {
                hash = str[i] + (hash << 6) + (hash << 16) - hash;
            }

            return hash;
        }
        //---------------------------------------------------------------------
        static public Int64 DJBHash(String str)
        {
            Int64 hash = 5381;

            for (Int32 i = 0; i < str.Length; i++)
            {
                hash = ((hash << 5) + hash) + str[i];
            }

            return hash;
        }
        //---------------------------------------------------------------------
        static public Int64 DEKHash(String str)
        {
            Int64 hash = str.Length;

            for (Int32 i = 0; i < str.Length; i++)
            {
                hash = ((hash << 5) ^ (hash >> 27)) ^ str[i];
            }

            return hash;
        }
        //---------------------------------------------------------------------
        static public Int64 BPHash(String str)
        {
            Int64 hash = 0;

            for (Int32 i = 0; i < str.Length; i++)
            {
                hash = hash << 7 ^ str[i];
            }

            return hash;
        }
        //---------------------------------------------------------------------
        static public Int64 FNVHash(String str)
        {
            Int64 fnv_prime = 0x811C9DC5;
            Int64 hash = 0;

            for (Int32 i = 0; i < str.Length; i++)
            {
                hash *= fnv_prime;
                hash ^= str[i];
            }

            return hash;
        }
        //---------------------------------------------------------------------
        static public Int64 APHash(String str)
        {
            Int64 hash = 0xAAAAAAAA;

            for (Int32 i = 0; i < str.Length; i++)
            {
                if ((i & 1) == 0)
                {
                    hash ^= ((hash << 7) ^ str[i] * (hash >> 3));
                }
                else
                {
                    hash ^= (~((hash << 11) + str[i] ^ (hash >> 5)));
                }
            }

            return hash;
        }
    }
Sign up to request clarification or add additional context in comments.

Comments

2

Eric Lippert has created a great blog entry on how to properly implement a GetHashCode() method. You need to remember that the purpose of GetHashCode() is to put objects into a hash table. Using it for this purpose means that you'll more then likely want to iterate through it or sort it some time in the future. If you are using crypto functions to do this, your iteration or sorting procedure will run very slow. Crypto functions are designed to secure data, not to uniquely identify them. Read through Eric Lippert's blog post. It will help you out

Comments

2

You could implement GetHashCode by truncating an SHA hash. But you probably shouldn't.

The purpose of GetHashCode is to allow objects to be inserted into hash tables. The purpose of hash tables is to optimize searches: On average, finding a key in a hash table requires only O(1) time, compared to O(log n) for a tree, or O(n) for an unsorted list.

You do want your GetHashCode method to minimize collisions in order to prevent your hash table lookups from degenerating to O(n) time. But you also want them to be fast, because the whole point of hash tables is optimization. If your hash code takes a long time to compute, you might as well have just stored your data in a List.

Cryptographic hashes are slow. They're typically designed that way in order to discourage brute force attacks. This makes them unsuitable for use with GetHashCode.

So, how should you implement GetHashCode? A simple, frequently-used approach is just to XOR together all the member variables that get used in your Equals function.

struct Complex
{
    double real;
    double imag;

    public override int GetHashCode()
    {
        return real.GetHashCode() ^ imag.GetHashCode();
    }

    // ...
}

Another simple approach, good for array-like objects, is a polynomial hash function.

class MyClass
{
    int[] data;

    public override int GetHashCode()
    {
        int result = 0;

        foreach (int n in data)
        {
            result = result * 41 + n;
        }

        return result;
    }

    // ...
}

If you class contains a large amount of data to hash, you might want to save the hash code in a member variable and precompute it during construction, so that GetHashCode() can then just use that variable.

2 Comments

The problem with XORing is that if real and image are the same, you're always going to get 0 as a result.
@svick Wow, this didn't occur to me when I was looking through the BCL implementations of GetHashCode(). new Point(x, x).GetHashCode() gives 0 for any value of x
0

The shorter the width of the hash value, the more likely you are to get collisions. Since Int32 stores a maximum of 4294967296 different values you'll need to consider whether this will hold a unique enough value for your purposes - which will depend if this is for security or identity checking.

I'm interested in why you want to override GetHashCode(), does the value have to fit into 32bits? If so why?

6 Comments

I override Equals(Object obj) in all of my classes, and I want to override GetHashCode() because it throws warning for all of them. If you override Equals(), the compiler wants you to override GetHashCode() as well, and I would like to do that correctly.
Why do you override GetHashCode() and even more intriguing why are you overriding Equals(). I'd like to know your problem to formulate a better solution.
I am override Equals() because I want to compare my objects according to my specifications, not the default implementation.
“which will depend if this is for security or identity checking.” You should never ever use GetHashCode() for that, no matter if 32 bits are enough for you or not.
@CharlesPerniciaroIII: You are correct that bit count alone make it difficult to use a crypto hash as GetHashCode. But more generally, crypto hashes and GetHashCode are trying to solve completely different problems; the characteristics of a good crypto hash are often the opposite of a good hash table balancer. Balancing a hash table and signing a document have almost nothing in common; that both operations involve computing a "hash" is unfortunate.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.