1

I have an array of int, I want to create a hash function for it, so that two integer arrays with different elements results in the same hash values for low possibility, what is the best way to do that?

The length of array could be up to 500, the integer number could be from 0 to 50.

Note that there is not exact duplicate of the question, as the nature of integer array (length and range of number) is different.

I use this before

public int GetHashCode(int[] data)

{
    if (data == null)
    return 0;

int result = 17;



foreach (var value in data)
{
    result += result * 23 + value;
}


return result;
}

but I discover it has many collision.

What I want to solve is to construct a dictionary<int[], string> so that when integer of the same values should results in different Hashcode.

14
  • what do you mean by "BEST"? no collision at all? small ram usage? LOW cpu usage? for low cpu/ram usage sum all array elements in a long long and hope for the best. remember a hash function will ALWAYS have some collision, so if it say array are equals, you will have to check by yourself Commented Feb 26, 2014 at 11:16
  • @lesto I know there will be collision, what i mean is reduce to the lowest Commented Feb 26, 2014 at 11:19
  • How do different length and number range make it different? Commented Feb 26, 2014 at 11:19
  • @william007 so that two integer arrays with different elements do not result in the same hash values - impossible, sum[n=0..50](500^n) is way over Int32.MaxValue Commented Feb 26, 2014 at 11:20
  • 1
    @william007: Why exactly is the linked question not a duplicate? Do you actually care about the chance of collisions? How many of these arrays are you going to hash? Have you thought about using a keyed container instead? What's the problem you are trying to solve? There are too many unanswered questions. You ask for a hash but do not demonstrate adequate knowledge on the subject, so I 'm not at all sure a hash is the correct solution in the first place. Commented Feb 26, 2014 at 11:25

2 Answers 2

3

two integer arrays with different elements do not result in the same hash values

This is not possible for arrays with more than one element. An array with N elements has 32*N bits of information, you cannot map it to the 32 bits of the hash code without losing some information, unless N=1.

For N>1 there will be a very large number of array pairs for which the hash code is the same, while the arrays are different. There are techniques that make it less likely that a pair of arrays chosen at random would have the same hash code, but it is not possible to eliminate collisions completely for the general case.

The length of array could be up to 500, the integer number could be from 0 to 50

You need approximately 2500 bits to represent an array like that; your hash value has only 32 bits, so you will have lots of hash collisions as well. You can do a perfect hash for arrays of zero to five elements with values 0..50 by packing the numbers in an int (use value 51 to represent "a missing value" so that you could pack arrays of different length). Once you need to add the sixths number to the mix, your hash would not longer be perfect.

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks, I knew this information, just that I am not sure how to construct a better hashcode.
@william007 Although you cannot make a perfect hash, it does not mean you cannot make a pretty good one. For example, the answer to the alleged duplicate (which isn't a duplicate, really) gives a reasonably good and fast way of hashing an array.
0

500 values form 0 to 50 means you can store the sum of all values multiplied by 50 and by position (starting from 0) also this can be reversed to extrapolate values

just check for size lenght and this has, and you should never find a collision

4 Comments

Values multiplied by position is far from collision-free: [0, 2, 1] and [0, 0, 2] give the same hash. Values multiplied by 50 to the Nth power is collision-free, but 50^500 is... larger than a long.
If you use a plain sum, all arrays with the same numbers in different order will produce the same hash code.
You are right, i've missed a piece of algorithm, but is still possible. Ill update my answer
fixed my response, but now i'm unsure this can be stored in a long or long long, have to do the math

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.