17

An existing system written in Java uses the hashcode of a string as its routing strategy for load balancing.

Now, I cannot modify the system but need to generate strings that share the same hashcode to test the worst condition.

I provide those strings from commandline and hope the system will route all these strings into the same destination.

Is it possible to generate a large numbers of strings that share the same hashcode?

To make this question clear:

String[] getStringsInSameHashCode(int number){
    //return an array in length "number"
    //Every element of the array share the same hashcode. 
    //The element should be different from each other
}

Remarks: Any hashCode value is acceptable. There is no constraint on what the string is. But they should be different from each other.

EDIT: Override method of String class is not acceptable because I feed those string from command line.

Instrumentation is also not acceptable because that will make some impacts on the system.

4
  • using equals string is not an option ? Commented Oct 17, 2012 at 1:54
  • look at the String source code. Commented Oct 17, 2012 at 1:54
  • Do they need to be Strings with different values or just different String objects? Commented Oct 17, 2012 at 1:54
  • I know how java generate the hashcode for string, but have no ideas how to generate different string literal with same hashcode. I cannot override any method of string. @Code-Guru Commented Oct 17, 2012 at 2:02

6 Answers 6

37

see a test method, basically, so long as you match, a1*31+b1 = a2*31 +b2, which means (a1-a2)*31=b2-b1

public void testHash()
{
    System.out.println("A:" + ((int)'A'));
    System.out.println("B:" + ((int)'B'));
    System.out.println("a:" + ((int)'a'));

    System.out.println(hash("Aa".hashCode()));
    System.out.println(hash("BB".hashCode()));
    System.out.println(hash("Aa".hashCode()));
    System.out.println(hash("BB".hashCode()));


    System.out.println(hash("AaAa".hashCode()));
    System.out.println(hash("BBBB".hashCode()));
    System.out.println(hash("AaBB".hashCode()));
    System.out.println(hash("BBAa".hashCode()));

}

you will get

A:65
B:66
a:97
2260
2260
2260
2260
2019172
2019172
2019172
2019172

edit: someone said this is not straightforward enough. I added below part

    @Test
    public void testN() throws Exception {
        List<String> l = HashCUtil.generateN(3);
        for(int i = 0; i < l.size(); ++i){
            System.out.println(l.get(i) + "---" + l.get(i).hashCode());
        }
    }
AaAaAa---1952508096
AaAaBB---1952508096
AaBBAa---1952508096
AaBBBB---1952508096
BBAaAa---1952508096
BBAaBB---1952508096
BBBBAa---1952508096
BBBBBB---1952508096

below is the source code, it might be not efficient, but it work:

public class HashCUtil {

    private static String[] base = new String[] {"Aa", "BB"};

    public static List<String> generateN(int n)
    {
        if(n <= 0)
        {
            return null;
        }

        List<String> list = generateOne(null);
        for(int i = 1; i < n; ++i)
        {
            list = generateOne(list);
        }

        return list;
    }


    public static List<String> generateOne(List<String> strList)
    {   
        if((null == strList) || (0 == strList.size()))
        {
            strList = new ArrayList<String>();
            for(int i = 0; i < base.length; ++i)
            {
                strList.add(base[i]);
            }

            return strList;
        }

        List<String> result = new ArrayList<String>();

        for(int i = 0; i < base.length; ++i)
        {
            for(String str: strList)
            {   
                result.add(base[i]  + str);
            }
        }

        return result;      
    }
}

look at String.hashCode()

   public int hashCode() {
    int h = hash;
    if (h == 0) {
        int off = offset;
        char val[] = value;
        int len = count;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    }
Sign up to request clarification or add additional context in comments.

3 Comments

well, that's fine if this is SO's rule or culture to provide english only link... I just want to provide more to the author; while for the problem itself, i think i've explained enough using demo codes and some words here...
1) Yes it is. 2) The demo codes and words don't actually answer the question. The question is about how to generate collisions. An explanation of how/why the collisions occur is not relevant.
I think this is a very good answer, although the generated string is very long if N is very large.
9

I think find a equal-hash string from a long string is too hard, it's easy when find equal-hash string of an short string (2 or 3). Look at the equation below. (sorry I cant post image cause me new member)

Notice that, "FB" and "Ea" have the same hashcode, and any two strings like s1+"FB"+s2 and s1+"Ea"+s2 will have the same hashcode. So, the easy solution is finding any 2-char substring of existing string and replace with a 2-char substring with the same hashcode

Exmaple, we have the string "helloworld" get 2-char substring "he", hashcode("he") = 'h'*31 + 'e' = ('h'*31 + 31) + ('e' - 31) = ('h'+1)*31 + 'F' = 'i' + 'F' = hashcode("iF") so the desire string is "iFlloworld" we have increased 'h' by 1, we can increase by 2, or 3 etc (but will be wrong if it overflow the char value)

The below code run well with small level, it will wrong if the level is big, make the char value overflow, I will fix it later if you want (this code change 2 first chars, but I will edit code to 2 last chars because 2 first chars are calc with largest value)

    public static String samehash(String s, int level) {
    if (s.length() < 2)
        return s;
    String sub2 = s.substring(0, 2);
    char c0 = sub2.charAt(0);
    char c1 = sub2.charAt(1);
    c0 = (char) (c0 + level);
    c1 = (char) (c1 - 31 * level);
    String newsub2 = new String(new char[] { c0, c1 });
    String re =  newsub2 + s.substring(2);
    return re;
}

2 Comments

I just edit the question. We are towards the right direction I think. Thanks.
I think the best question is "write a reverse hashcode function"
4

I was wondering if there was a "universal" solution; e.g. some constant string XYZ, such that

    s.hashCode() == (s + XYZ).hashCode() 

for any string s. Finding such a string involves solving a fairly complicated equation ... which was beyond my rusty mathematical ability. But then it dawned on me that h == 31*h + ch is always true when h and ch are both zero!

Based on that insight, the following method should create a different String with the same hashcode as its argument:

    public String collider(String s) { 
        return "\0" + s;
    }

If NUL characters are problematic for you, prepending any string whose hashcode is zero would work too ... albeit that the colliding strings would be longer than if you used zero.

1 Comment

Let me have a try whether the \0 solution will work or not. Thanks.
3

Given String X, then String Y = "\u0096\0\0ɪ\0ˬ" + X will share same hashcode with X.

Explanation:

  1. String.hashcode() returns Integer, and every Integer X in java has property that X = X + 2 * (Integer.MAX_VALUE + 1). Here, Integer.MAX_VALUE = 2 ^ 31 - 1;

  2. So we only need to find String M, which has the property that M's hashcode % (2 * (Integer.MAX_VALUE + 1)) = 0;

  3. I find "\u0096\0\0ɪ\0ˬ" : \u0096 's ascii code is 150,\0 's ascii code is 0, ɪ's ascii code is 618, ˬ's ascii code is 748, so its hashcode is 150 * 31 ^ 5 + 618 * 31 ^ 2 + 748 = 2 ^ 32 = 0;

It is up to you which string you would like, and I pick this one.

1 Comment

I like your logic and proof, but prepending "\0" in this answer wins for simplicity.
1

You can instrument the java.lang.String class so its method hashCode() will always return the same number.

I suppose Javassist is the easiest way to do such an instrumentation.

In short:

  • obtain an instance of java.lang.instrument.Instrumentation by using a Java-agent (see package java.lang.instrument documentation for details)
  • redefine java.lang.String class by using Instrumentation.redefineClasses(ClassDefinition[]) method

The code will look like (roughly):

ClassPool classPool = new ClassPool(true);
CtClass stringClass = classPool.get("java.lang.String");
CtMethod hashCodeMethod = stringClass.getDeclaredMethod("hashCode", null);
hashCodeMethod.setBody("{return 0;}");
byte[] bytes = stringClass.toBytecode();
ClassDefinition[] classDefinitions = new ClassDefinition[] {new ClassDefinition(String.class, bytes);
instrumentation.redefineClasses(classDefinitions);// this instrumentation can be obtained via Java-agent

Also don't forget that agent manifest file must specify Can-Redefine-Classes: true to be able to use redefineClasses(ClassDefinition[]) method.

3 Comments

Thanks for your answer. Overriding the hashCode method is not acceptable since it will affect the system. The scenario is I need to test the system with those string literal. Modification on the system is definitely unacceptable.
@Jermaine Xu, this is not overriding, but instrumentation. However yes you do need an ability to relaunch JVM with the "existing system written in Java" and add an agent to JVM via command-line arguments. So if you can't do this, my suggestion is unusable. In this case the "hetaoblog"'s answer should fit your situation:)
Instrumentation is a good idea, but the objectives is testing, so I cannot modify redefine the hashCode method of String. Thanks for your instrumentation idea.
0
String s = "Some String"
for (int i = 0; i < SOME_VERY_BIG_NUMBER; ++i) {
    String copy = new String(s);

    // Do something with copy.
}

Will this work for you? It just creates a lot of copies of the same String literal that you can then use in your testing.

1 Comment

Sorry I didn't make it clear enough. Same string literal is not acceptable, because the string is the primary key in the database, I do need different string literals.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.