2

Background information:

In my project I'm applying Reinforcement Learning (RL) to the Mario domain. For my state representation I chose to use a hashtable with custom objects as keys. My custom objects are immutable and have overwritten the .equals() and the .hashcode() (which were generated by the IntelliJ IDE).

This is the resulting .hashcode(), I've added the possible values in comments as extra information:

@Override
public int hashCode() {
    int result = (stuck ? 1 : 0);                // 2 possible values: 0, 1
    result = 31 * result + (facing ? 1 : 0);     // 2 possible values: 0, 1 
    result = 31 * result + marioMode;            // 3 possible values: 0, 1, 2
    result = 31 * result + (onGround ? 1 : 0);   // 2 possible values: 0, 1 
    result = 31 * result + (canJump ? 1 : 0);    // 2 possible values: 0, 1 
    result = 31 * result + (wallNear ? 1 : 0);   // 2 possible values: 0, 1 
    result = 31 * result + nearestEnemyX;        // 33 possible values: - 16 to 16
    result = 31 * result + nearestEnemyY;        // 33 possible values: - 16 to 16

    return result;
}

The Problem:

The problem here is that the result in the above code can exceed Integer.MAX_VALUE. I've read online this doesn't have to be a problem, but in my case it is. This is partly due to algorithm used which is Q-Learning (an RL method) and depends on the correct Q-values stored inside the hashtable. Basically I cannot have conflicts when retrieving values. When running my experiments I see that the results are not good at all and I'm 95% certain the problem lies with the retrieval of the Q-values from the hashtable. (If needed I can expand on why I'm certain about this, but this requires some extra information on the project which isn't relevant for the question.)

The Question:

Is there a way to avoid the integer overflow, maybe I'm overlooking something here? Or is there another way (perhaps another datastructure) to get reasonably fast the values given my custom-key?

Remark:

After reading some comments I do realise that my choice for using a HashTable wasn't maybe the best one as I want unique keys that do not cause collisions. If I still want to use the HashTable I will probably need a proper encoding.

12
  • 1
    If you have such a special use of hashes, why do you use Object's .hashCode() to begin with? Why not use a dedicated hash? You can for instance use Guava's HashFunctions Commented May 13, 2014 at 14:23
  • @fge That's Object's hashCode()? I thought the normal implementation of hashCode() was a native method? Commented May 13, 2014 at 14:24
  • @user3580294 if this class is immutable, then the code above will generate a non-changing hash code. Anyway, that's not the main point in the question. Commented May 13, 2014 at 14:25
  • 1
    @user3580294 it looks to me like the OP wants to generate unique keys for each object or something; the problem, of course, is that hashCode() makes no guarantees about that Commented May 13, 2014 at 14:28
  • 1
    Floris: HashMaps are already pretty darn fast, but the Java implementation does have some behind-the-scenes stuff like extra hashing that might slow things down. Also, it probably would have helped to say you needed perfect hashes instead of worrying about the range of numbers returned by hashCode(). If you're using the objects as keys into a HashMap the range of values returned by hashCode() doesn't matter so long as the method obeys the contract for hashCode() Commented May 13, 2014 at 14:37

4 Answers 4

8

You need a dedicated Key Field to guarantee uniqueness

.hashCode() isn't designed for what you are using it for

.hashCode() is designed to give good general results in bucketing algorithms, which can tolerate minor collisions. It is not designed to provide a unique key. The default algorithm is a trade off of time and space and minor collisions, it isn't supposed to guarantee uniqueness.

Perfect Hash

What you need to implement is a perfect hash or some other unique key based on the contents of the object. This is possible within the boundries of an int but I wouldn't use .hashCode() for this representation. I would use an explicit key field on the object.

Unique Hashing

One way to use use SHA1 hashing that is built into the standard library which has an extremely low chance of collisions for small data sets. You don't have a huge combinational explosion in the values you posts to SHA1 will work.

You should be able to calculate a way to generate a minimal perfect hash with the limited values that you are showing in your question.

A minimal perfect hash function is a perfect hash function that maps n keys to n consecutive integers—usually [0..n−1] or [1..n]. A more formal way of expressing this is: Let j and k be elements of some finite set K. F is a minimal perfect hash function iff F(j) =F(k) implies j=k (injectivity) and there exists an integer a such that the range of F is a..a+|K|−1. It has been proved that a general purpose minimal perfect hash scheme requires at least 1.44 bits/key.2 The best currently known minimal perfect hashing schemes use around 2.6 bits/key.[3]

A minimal perfect hash function F is order preserving if keys are given in some order a1, a2, ..., an and for any keys aj and ak, j

A minimal perfect hash function F is monotone if it preserves the lexicographical order of the keys. In this case, the function value is just the position of each key in the sorted ordering of all of the keys. If the keys to be hashed are themselves stored in a sorted array, it is possible to store a small number of additional bits per key in a data structure that can be used to compute hash values quickly.[6]

Solution

Note where it talks about a URL it can be any byte[] representation of any String that you calculate from your object.

I usually override the toString() method to make it generate something unique, and then feed that into the UUID.nameUUIDFromBytes() method.

Type 3 UUID can be just as useful as well UUID.nameUUIDFromBytes()

Version 3 UUIDs use a scheme deriving a UUID via MD5 from a URL, a fully qualified domain name, an object identifier, a distinguished name (DN as used in Lightweight Directory Access Protocol), or on names in unspecified namespaces. Version 3 UUIDs have the form xxxxxxxx-xxxx-3xxx-yxxx-xxxxxxxxxxxx where x is any hexadecimal digit and y is one of 8, 9, A, or B.

To determine the version 3 UUID of a given name, the UUID of the namespace (e.g., 6ba7b810-9dad-11d1-80b4-00c04fd430c8 for a domain) is transformed to a string of bytes corresponding to its hexadecimal digits, concatenated with the input name, hashed with MD5 yielding 128 bits. Six bits are replaced by fixed values, four of these bits indicate the version, 0011 for version 3. Finally, the fixed hash is transformed back into the hexadecimal form with hyphens separating the parts relevant in other UUID versions.

My preferred solution is Type 5 UUID ( SHA version of Type 3)

Version 5 UUIDs use a scheme with SHA-1 hashing; otherwise it is the same idea as in version 3. RFC 4122 states that version 5 is preferred over version 3 name based UUIDs, as MD5's security has been compromised. Note that the 160 bit SHA-1 hash is truncated to 128 bits to make the length work out. An erratum addresses the example in appendix B of RFC 4122.

Key objects should be immutable

That way you can calculate toString(), .hashCode() and generate a unique primary key inside the Constructor and set them once and not calculate them over and over.

Here is a straw man example of an idiomatic immutable object and calculating a unique key based on the contents of the object.

package com.stackoverflow;

import javax.annotation.Nonnull;
import java.util.Date;
import java.util.UUID;

public class Q23633894
{

    public static class Person
    {
        private final String firstName;
        private final String lastName;
        private final Date birthday;
        private final UUID key;
        private final String strRep;

        public Person(@Nonnull final String firstName, @Nonnull final String lastName, @Nonnull final Date birthday)
        {
            this.firstName = firstName;
            this.lastName = lastName;
            this.birthday = birthday;
            this.strRep = String.format("%s%s%d", firstName, lastName, birthday.getTime());
            this.key = UUID.nameUUIDFromBytes(this.strRep.getBytes());
        }

        @Nonnull
        public UUID getKey()
        {
            return this.key;
        }

        // Other getter/setters omitted for brevity

        @Override
        @Nonnull
        public String toString()
        {
            return this.strRep;
        }

        @Override
        public boolean equals(final Object o)
        {
            if (this == o) { return true; }
            if (o == null || getClass() != o.getClass()) { return false; }
            final Person person = (Person) o;
            return key.equals(person.key);
        }

        @Override
        public int hashCode()
        {
            return key.hashCode();
        }
    }
}
Sign up to request clarification or add additional context in comments.

8 Comments

There's no such a thing like "perfect hash", because a collision may occur anyway. By the way, you'll really need to hack the things to get the same hashsum
you can create a perfect hash for anything given enough bits to cover the set of possible values
@JarrodRoberson Are you even sure OP's looking for a perfect hash? He said he's using a hash table, which shouldn't care about the hash values that much unless he's rolling his own
@JarodRoberson that's truth in the mathematical sense, but hashCode() method returns an integer, which is bounded.
@JarrodRoberson I accepted your answer because of how informative it is, but in the the answer of Victor-Philipp Negoescu and Peter Lawrey also helped me.
|
6

For a unique representation of your object's state, you would need 19 bits in total. Thus, it is possible to represent it by a "perfect hash" integer value (which can have up to 32 bits):

@Override
public int hashCode() {
    int result = (stuck ? 1 : 0); // needs 1 bit (2 possible values)
    result += (facing ? 1 : 0) << 1; // needs 1 bit (2 possible values)
    result += marioMode << 2; // needs 2 bits (3 possible values)
    result += (onGround ? 1 : 0) << 4; // needs 1 bit (2 possible values)
    result += (canJump ? 1 : 0) << 5; // needs 1 bit (2 possible values)
    result += (wallNear ? 1 : 0) << 6; // needs 1 bit (2 possible values)
    result += (nearestEnemyX + 16) << 7; // needs 6 bits (33 possible values)
    result += (nearestEnemyY + 16) << 13; // needs 6 bits (33 possible values)
}

10 Comments

Might it be even better to calculate the hashCode() once and cache it because the object is immutable?
Of course, but that was not the question. :-) Ok, it was part of the question. But calculation of such a hash code is somewhat simple. You would need to store 2^19 integers (all possible hash codes) which would need at least 2^19 * 32 bit = 2 MB.
Yeah, true. Just thought it might help
@user3580294 to be honest the whole operation is going to take 10 ns or so on a recent cpu so it is unlikely that this is where the program is going to spend some time.
@assylias The math might be fast, true. Depends on how frequently the map is accessed.
|
2

Instead of using 31 as a your magic number, you need to use the number of possibilities (normalised to 0)

@Override
public int hashCode() {
    int result = (stuck ? 1 : 0);                // 2 possible values: 0, 1
    result = 2 * result + (facing ? 1 : 0);      // 2 possible values: 0, 1 
    result = 3 * result + marioMode;             // 3 possible values: 0, 1, 2
    result = 2 * result + (onGround ? 1 : 0);    // 2 possible values: 0, 1 
    result = 2 * result + (canJump ? 1 : 0);     // 2 possible values: 0, 1 
    result = 2 * result + (wallNear ? 1 : 0);    // 2 possible values: 0, 1 
    result = 33 * result + (16 + nearestEnemyX); // 33 possible values: - 16 to 16
    result = 33 * result + (16 + nearestEnemyY); // 33 possible values: - 16 to 16

    return result;
}

This will give you 104544 possible hashCodes() BTW you can reverse this process to get the original values from the code by using a series of / and %

1 Comment

Thanks for the top on my nearestEnemyX & nearestEnemyY by adding the 16 to make the numbers positive. This helped me as well, just as the accepted answer. Unfortunately I can only accept one.
-1

Try Guava's hashCode() method or JDK7's Objects.hash(). It's way better than writing your own. Don't repeat code yourself (and anyone else when you can use out of box solution):

3 Comments

java.util doesn't have a hashCode() method... And Object#hashCode() only works for reference equality, not object equality
@user3375746 What you propose won't solve the issue of getting negative integers.
@assylias I don't think that the negative integers are really the issue here (negative integers are valid hashCodes AFAIK) but the uniqueness which wouldn't be solved by this answer either.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.