Generate two different strings with the same hashcode

Question

I want to do some tests which require some strings with the same hash code, but not the same strings. I couldn't find any examples, so I decided to write a simple program to do it for me.

The code below generates two random strings over and over until they generate the same hash code.

    static Random r = new Random();
    static void Main(string[] args)
    {
        string str1, str2;
        do
        {
            str1 = GenerateString();
            str2 = GenerateString();
        } while (str1.GetHashCode() != str2.GetHashCode() && str1 != str2);

        Console.WriteLine("{0}\n{1}", str1, str2);
    }

    static string GenerateString()
    {
        string s = "";
        while (s.Length < 6)
        {
            s += (char)r.Next(char.MaxValue);
        }
        return s;
    }

This code seems to be working (theoretically), but it may take centuries to complete. So I was thinking of doing vice versa and generate two strings from one hash code.

I know it's not possible to retrieve a string from a hash code, but is it possible to generate possible strings from it?

I'm using Visual Studio 2015 Community Edition. Version: 14.0.23107.0D14REL.

.NET Framework: 4.6.00081.

If you want just for a test, you can create a class and override GetHashCode with a bad implementation that returns a fixed value, effectively producing collisions everywhere. — Alejandro
– Alejandro, Commented Aug 15, 2015 at 17:25
It takes just a few seconds, the Birthday Paradox makes it quick. "1241308".GetHashCode() == "699391".GetHashCode() in 32-bit code, "942".GetHashCode() == "9331582".GetHashCode() in 64-bit code. — Hans Passant
– Hans Passant, Commented Aug 15, 2015 at 17:34
Taking advantage of the birthday paradox is done by storing all of the strings you've generated so far. Each time you generate a new one you check it against all of the others. It increases the chances of finding a match enormously. If there are N hash codes then your method will have a 50% chance of finding a collision after checking N/2 strings. The birthday method will have a 50% chance after checking √N strings. For 32-bit hash codes where N=~4 billion, this is the difference between checking 2 billion strings or 65 thousand of them. — John Kugelman
– John Kugelman, Commented Aug 15, 2015 at 17:45
@ScottChamberlain: That's not true. The math only applies if the hash-values are approximately uniformly-random. If they are not, you can easily need many more hashes to find a collision (ex. hash(x) = x%INT_MAX, you'd iterate over INT_MAX values before finding a collision). — BlueRaja - Danny Pflughoeft
– BlueRaja - Danny Pflughoeft, Commented Aug 15, 2015 at 23:06
If you are using 64 bits, you can just use "\0Anything" and "\0Really, anything will work."- Why is String.GetHashCode() implemented differently in 32-bit and 64-bit versions of the CLR?. Of course, it is better to look for these strings for you current implementation - \0 will not always work. — Kobi
– Kobi, Commented Aug 16, 2015 at 5:16

Kobi · Accepted Answer · 2015-08-16 05:24:20Z

39

Finding two strings by repeatedly comparing random strings will take practically forever. Instead generate strings and store them in an a dictionary by hashcode. Then look up each. Match found pretty quickly.

MATCH FOUND!! xqzrbn and krumld hash code 80425224

void Main()
{

    var lookup = new Dictionary<int,string>();

    while(true) {
        var s = RandomString();        
        var h = s.GetHashCode();
        string s2;
        if (lookup.TryGetValue(h, out s2) && s2 != s) {
            Console.WriteLine("MATCH FOUND!! {0} and {1} hash code {2}",
                lookup[h],
                s,
                h);
            break;
        }
        lookup[h] = s;

        if (lookup.Count % 1000 == 0) {
            Console.WriteLine(lookup.Count);
        }
    }
}

static Random r = new Random();

// Define other methods and classes here
static string RandomString() {

    var s = ((char)r.Next((int)'a',((int)'z')+1)).ToString() +
            ((char)r.Next((int)'a',((int)'z')+1)).ToString() +
            ((char)r.Next((int)'a',((int)'z')+1)).ToString() +
            ((char)r.Next((int)'a',((int)'z')+1)).ToString() +
            ((char)r.Next((int)'a',((int)'z')+1)).ToString() +
            ((char)r.Next((int)'a',((int)'z')+1)).ToString();

    return s;
}

edited Aug 16, 2015 at 5:24

Kobi

139k41 gold badges259 silver badges302 bronze badges

answered Aug 15, 2015 at 17:42

Samuel Neff

75.3k18 gold badges144 silver badges186 bronze badges

Sign up to request clarification or add additional context in comments.

12 Comments

M.kazem Akhgary Over a year ago

Thanks for the answer. @ScottChamberlain and SamuelNeff. it seems the method generating the string also affects the chances! more characters reduces the chance to get same hashcode.

Behrooz Over a year ago

@M.kazemAkhgary That's not true, You feel that way because the alpha method uses a lot more CPU power dus taking longer to find a match. the number of items should be the same.

Scott Chamberlain Over a year ago

@Behrooz The only way a dictionary can "crawl" is if you store lots of values that have the exact same HashCode but Equals( returns false. Using a int it is impossible to have (x.GetHashCode() == y.GetHashCode()) && (x.Equals(y) == false)

Scott Chamberlain Over a year ago

@Behrooz See this old answer of mine where I go in to the process in more detail than a comment will allow.

Samuel Neff Over a year ago

@Behrooz note O(1) does not mean fast, it means constant time. It can be constantly slow and still be O(1). That's not to say a dictionary is slow, just saying in general. Performance of a dictionary depends on the number of buckets and distribution of keys across buckets (randomness of hashcode).

|

Peter Mortensen · Accepted Answer · 2015-09-23 18:04:40Z

31

Take advantage of the Birthday Paradox. Instead of only testing two strings directly, test all strings you have seen before.

using System;
using System.Collections.Generic;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            var words = new Dictionary<int, string>();

            int i = 0;
            string teststring;
            while (true)
            {
                i++;
                teststring = i.ToString();
                try
                {
                    words.Add(teststring.GetHashCode(), teststring);
                }
                catch (Exception)
                {
                    break;
                }
            }

            var collisionHash = teststring.GetHashCode();
            Console.WriteLine("\"{0}\" and \"{1}\" have the same hash code {2}", words[collisionHash], teststring, collisionHash);
            Console.ReadLine();
        }
    }
}

For my machine it produces the output

"699391" and "1241308" have the same hash code -1612916492

almost instantly.

Due to how strings are hashed in .NET you may not get the exact same output as me, but it should be just as fast.

edited Sep 23, 2015 at 18:04

Peter Mortensen

31.4k22 gold badges110 silver badges134 bronze badges

answered Aug 15, 2015 at 17:41

Scott Chamberlain

128k37 gold badges299 silver badges447 bronze badges

2 Comments

Jeremiah Mercier Over a year ago

When running this on .NET Core, the hash code changes every time the application runs, so any hash codes posted in this question will be different than your application. This behavior is described here. If you need two strings with the same hash code for testing, as I did, you will need to generate them every time your application runs. Fortunately this takes less than 100 ms. Or you can create a new class with a bad implementation of GetHashCode, as Alejandro suggested.

Scott Chamberlain Over a year ago

Yes, that is why I put "Due to how strings are hashed in .NET you may not get the exact same output as me, but it should be just as fast."

Collectives™ on Stack Overflow

Generate two different strings with the same hashcode

2 Answers 2

12 Comments

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

12 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related