72

What does GetHashCode() calculate when invoked on the byte[] array? The 2 data arrays with equal content do not provide the same hash.

3
  • 16
    FYI: If you are on .NET4 ((IStructuralEquatable) myArray).GetHashCode(EqualityComparer<object>.Default) should give the same result for two arrays with same content. msdn.microsoft.com/en-us/library/…. Commented Aug 30, 2011 at 15:25
  • Related: why-does-.NET-not-implement-gethashcode-for-collections Commented Aug 9, 2014 at 11:32
  • 1
    @FuleSnabel Note that IStructuralEquatable requires use of the non-generic IEqualityComparer interface, which means that each byte in the array is going to be boxed during computation of the hashcode. Commented Nov 13, 2015 at 17:32

6 Answers 6

80

Arrays in .NET don't override Equals or GetHashCode, so the value you'll get is basically based on reference equality (i.e. the default implementation in Object) - for value equality you'll need to roll your own code (or find some from a third party). You may want to implement IEqualityComparer<byte[]> if you're trying to use byte arrays as keys in a dictionary etc.

EDIT: Here's a reusable array equality comparer which should be fine so long as the array element handles equality appropriately. Note that you mustn't mutate the array after using it as a key in a dictionary, otherwise you won't be able to find it again - even with the same reference.

using System;
using System.Collections.Generic;

public sealed class ArrayEqualityComparer<T> : IEqualityComparer<T[]>
{
    // You could make this a per-instance field with a constructor parameter
    private static readonly EqualityComparer<T> elementComparer
        = EqualityComparer<T>.Default;

    public bool Equals(T[] first, T[] second)
    {
        if (first == second)
        {
            return true;
        }
        if (first == null || second == null)
        {
            return false;
        }
        if (first.Length != second.Length)
        {
            return false;
        }
        for (int i = 0; i < first.Length; i++)
        {
            if (!elementComparer.Equals(first[i], second[i]))
            {
                return false;
            }
        }
        return true;
    }

    public int GetHashCode(T[] array)
    {
        unchecked
        {
            if (array == null)
            {
                return 0;
            }
            int hash = 17;
            foreach (T element in array)
            {
                hash = hash * 31 + elementComparer.GetHashCode(element);
            }
            return hash;
        }
    }
}

class Test
{
    static void Main()
    {
        byte[] x = { 1, 2, 3 };
        byte[] y = { 1, 2, 3 };
        byte[] z = { 4, 5, 6 };

        var comparer = new ArrayEqualityComparer<byte>();

        Console.WriteLine(comparer.GetHashCode(x));
        Console.WriteLine(comparer.GetHashCode(y));
        Console.WriteLine(comparer.GetHashCode(z));
        Console.WriteLine(comparer.Equals(x, y));
        Console.WriteLine(comparer.Equals(x, z));
    }
}
Sign up to request clarification or add additional context in comments.

6 Comments

@Chesnokov Yuriy: Okay, I've edited some code into my answer.
There seems to be some debate on whether GetHashCode should scan over the entire sequence. Interestingly, the internal implementation for Array.IStructuralEquatable.GetHashCode only considers the last eight items of an array, sacrificing hash uniqueness for speed.
I did something similar using Enumerable.SequenceEqual(). Is there a particular reason to hand-code the element comparison? (Admittedly it's probably a bit faster.)
@PeterA.Schneider: I don't think SequenceEqual is optimized to compare lengths first if the source implements appropriate interfaces.
@JonSkeet Since we have new primitives like Memory<T>, Span<T> or Sequence<T> can this code be optimised in any way? For example we do have SequenceEqual for ReadOnlySpan<T> now.
|
23

Like other non-primitive built-in types, it just returns something arbitrary. It definitely doesn't try to hash the contents of the array. See this answer.

Comments

16

Simple solution

public static int GetHashFromBytes(byte[] bytes)
{
    return new BigInteger(bytes).GetHashCode();
}

6 Comments

Seeing this solution made me smile. Clean, elegant. Digging deeper the hash implementation ends up calling github.com/microsoft/referencesource/blob/master/…
@XeorgeXeorge so?
@DaveJellison There is a (2^32) in 1 chance of collision, which is negalegible for most scenarios but is something that must be kept in mind whenever there's a hash code.
Agreed, but this is inherent with hashing as a rule. It's like going to the dictionary.com to complain about the definition of a word.
Note this method incurs a copy of the whole byte array, so may not be efficient. Also It's important to understand the purpose of GetHashCode() - it's not intended to produce a unique value but rather a well-distributed value for allocating buckets in a Dictionary or HashSet, which benefit from each bucket being roughly equal size. Both types use a combination of GetHashCode() and Equals() to determine whether a collision has really occurred.
|
13

byte[] inherits GetHashCode() from object, it doesn't override it. So what you get is basically object's implementation.

Comments

9

If you are using .NET 6 or at least .NET Core 2.1, you can write less code and achieve better performance with the System.HashCode struct.

Using the method HashCode.AddBytes() which is available from .NET 6:

public int GetHashCode(byte[] value)
{
    var hash = new HashCode();
    hash.AddBytes(value);
    return hash.ToHashCode();
}

Using the method HashCode.Add which is available from .NET Core 2.1:

public int GetHashCode(byte[] value) =>
    value.Aggregate(new HashCode(), (hash, i) => {
        hash.Add(i);
        return hash;
    }).ToHashCode();

Note that in the documentation of HashCode.AddBytes() it says:

This method does not guarantee that the result of adding a span of bytes will match the result of adding the same bytes individually.

In this sharplab demo both output the same result, but this might vary with .NET version or runtime environment.

Comments

0

If it's not the same instance, it will return different hashes. I'm guessing it is based on the memory address where it is stored somehow.

1 Comment

no, it is not the same instance, I presume in that case hashes would be equal

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.