2

I have a String array that contains a lot of words. I wish to get the index of a word contained in the array (-1 if it is not contained).

I have first made a loop to search through all elements in the array while incrementing a variable and when I find it, I return the variable's value.

However the array can be very very very big so searching through all elements is extremely slow. I have decided that before adding a new word in my string array, I would use hashCode() % arrayLength to get the index of where I should put it. Then, to get the index back, I would just reuse hashCode() % arrayLength to instantly know at what index it is.

The problem is that sometimes there are "clashes", and two elements can have the same index in the array.

Anyone has an idea how to deal with it? Or any other alternatives to get the index of an element faster?

4
  • 4
    Are you trying to reimplement a HashMap? Commented Dec 17, 2018 at 14:50
  • Instead of placing your objects in a array. Why not place them in a HashMap<Integer, Object> where the key is the hash code. Commented Dec 17, 2018 at 14:54
  • @locus2k that won't deal with the problem of hash collisions. Commented Dec 17, 2018 at 15:09
  • @DodgyCodeException He isn't getting hash collision. He is getting the same index in the array of another item. He is using modulous operations to calculate the index. Even with different hashes, using modulous it is rather easy to come up with the same index. Commented Dec 17, 2018 at 16:50

2 Answers 2

3

You are trying to implement Open Addressing using an array. Unless this is a homework exercise, Java standard library already has classes to solve the search and collision problem.

You probably want to use a HashSet to check if the String exists. Behind the scene it's using HashMap which implements Separate Chaining to resolve conflicts.

String[] words = { "a" };
Set<String> set = new HashSet<>(Arrays.asList(words));
return set.contains("My Word") ? 1 : -1;
Sign up to request clarification or add additional context in comments.

Comments

0

The technique you are referring to is one of the implementations of hash tables in general. It is called Linear Probing which is a form of a general technique called Open Addressing. If you have calculated the index of the word based on the hashCode() % array.lengthand find a conflict (non-empty element or not the element you are looking for); then you have three ways to perform the conflict resolution:

Linear Search

This is done by incrementing the position and check if it is empty or has the element you are looking for. That is, your second position will be (hashCode(input) + 2) % array.length and then (hashCode(input) + 3) % array.length and so on. The problem with this approach is that your insertion or lookup performance will degrade to linear O(n) if the array is close to completely populated.

Quadratic Search

This is just an optimization to the above technique by jumping qudratically if you find a clash. So, your second index will be (hashCode(input) + 2*2) % array.length and then (hashCode(input) + 3*3) % array.length and so on which helps getting to the right location faster.

Double Hashing

This is even a more efficient approach to handle resolution by introducing another hashing function hashCode2() which you use in conjunction to the first one. In that case, your next search index will be (hashCode(input) + 2*hashCode2(input)) % array.length and then (hashCode(input) + 3*hashCode2(input)) % array.length and so on.

The more randomly distributed your jumps are, the better performance it gets over large hash tables

Hope this helps.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.