3

I do not know much about hashcodes. I found this code which prints the collisions.

Can you please tell me what are collisions and how to reduce it? Why should we use hashcodes?

public static int getHash(String str, int limit)
{
    int hashCode = Math.abs(str.hashCode()%(limit));
    return hashCode;
}

/**
 * @param args
 */
public static void main(String[] args)
{
    int hashLimit = 10000;
    int stringsLimit = 10000;
    String[] arr = new String[hashLimit];
    List<String> test = new ArrayList<String>();
    Random r = new Random(2);
    for ( int i = 0 ; i < stringsLimit ; i++ )
    {
        StringBuffer buf = new StringBuffer("");
        for ( int j = 0 ; j < 10 ; j++ )
        {
            char c = (char)(35+60*r.nextDouble());
            buf.append(c);
        }
        test.add(buf.toString());
        //System.out.println(buf.toString());
    }
    int collisions = 0;
    for ( String curStr : test )
    {
        int hashCode = getHash(curStr,hashLimit);
        if ( arr[hashCode] != null && !arr[hashCode].equals(curStr) )
        {
            System.out.println("collision of ["+arr[hashCode]+"] ("+arr[hashCode].hashCode()+" = "+hashCode+") with ["+curStr+"] ("+curStr.hashCode()+" = "+hashCode+")");
            collisions++;
        }
        else
        {
            arr[hashCode] = curStr;
        }
    }
    System.out.println("Collisions: "+collisions);
}
2
  • 1
    Regarding your 3 questions, the best answer for all 3 questions would be on wikipedia. Commented Mar 30, 2012 at 16:17
  • Take a look at en.wikipedia.org/wiki/Hash_table Commented Mar 30, 2012 at 18:28

3 Answers 3

23

Can you please tell me what are collisions and how to reduce it?

Collisions are when two non-equal objects have the same hash code. They're a fact of life - you need to deal with it.

Why should we use hashcodes?

Because they make it quick to look up values by key, basically. A hash table can use a hash code to very quickly get the set of possible key matches down to a very small set (often just one), at which point you need to check for actual key equality.

You should never assume that two hash codes being equal means the objects they were derived from are equal. Only the reverse is true: assuming a correct implementation, if two objects give different hash codes, then they are not equal.

Sign up to request clarification or add additional context in comments.

1 Comment

@HelloWorld: Well you're explicitly limiting the hash code to have at most 10,000 different values. That's a pretty poor starting point. It's not clear what you're trying to do, but if you're going to write your own kind of hash table, I'd do a bit more research on it first. You must be able to cope with collisions, and there are various different approaches for that.
3

To answer the other part of your question: To reduce the chance of collisions you should implement a hashing algorithm that provides an even distribution of hash codes over the set of possible inputs.

For example, supposing you implemented a naive hashCode() method for hashing MyString instances:

public class MyString {
  private final char[] arr;

  // Constructor and other methods.

  public int hashCode() {
    return arr.length == 0 ? 0 : (int) arr[0];
  }
}

In this example only the first character is used to create the hash code. Therefore, if you were to hash the strings: "apple", "anaconda", "anecdote" they would all produce the same hash value. A more efficient hash code would inspect all the letters in the character array to determine a hash code value, which would hopefully reduce the chance of a collision.

Comments

0

We have a "collision" if two different non-equal objects have the same hashcode. This may be a problem, for example when try to use both objects as keys in a Hashmap.

3 Comments

@HelloWorld improve the hashing function can reduce collisions, but usually the simplest thing to do is to use a larger array. i.e. a lower loading factor.
@HelloWorld, well, the code is artificially limiting it via the second argument to the getHash method. So you can reduce it by increasing (or eliminating) that limit.
@Kirk Woll thank you..I reduced the limit and the collisions reduced. But are there any other possible ways to do it?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.