Hash function for strings not working on some strings?

Question

Basically my program reads a text file with the following format:

3
chairs
tables
refrigerators

The number on the first line indicates the number of items in the file to read.

Here's my hash function:

int hash(string& item, int n) {
    int hashVal = 0;
    int len = item.length();

    for(int i = 0; i < len; i++)
      hashVal = hashVal*37 + item[i];

    hashVal %= n;   

    if(hashVal < 0) hashVal += n;

    return hashVal;
}

when my program read the text file above, it was successful. But when I tried another one:

5
sabel
ziyarah
moustache
math
pedobear

The program would freeze. Not a segmentation fault or anything but it would just stop.

Any ideas?

Edit:

int n, tableSize;
myFile >> n;

tableSize = generateTableSize(n); 

string item, hashTable[tableSize];

for(int i = 0; i < tableSize; i++)
    hashTable[i] = "--";

while(myFile >> item && n!=0) {
    int index = hash(item,tableSize);

    if(hashTable[index] == "--")
        hashTable[index] = item;

    else {
        int newIndex = rehash(item,tableSize);
        while(hashTable[newIndex] != "--") {
            newIndex = rehash(item,tableSize);
        }
        hashTable[newIndex] = item;
    }
    n--;
}

int rehash(string item, int n)  {
    return hash(item,n+1);
}

There is nothing in this code that could "freeze", so there must be something else going on. — Sergey Kalinichenko
– Sergey Kalinichenko, Commented Apr 21, 2015 at 12:22
It's hard to tell you what is wrong with your read from file if you don't include how you read from the file. — NathanOliver
– NathanOliver, Commented Apr 21, 2015 at 12:22
probably because you never had to rehash in the first example. — NathanOliver
– NathanOliver, Commented Apr 21, 2015 at 12:33

Tommy Andersen · Accepted Answer · 2015-04-21 13:14:46Z

4

The code freezes because it ends in an endless loop:

int index = hash(item,tableSize);

if(hashTable[index] == "--")
    hashTable[index] = item;
else {
    int newIndex = rehash(item,tableSize);
    while(hashTable[newIndex] != "--") {
        newIndex = rehash(item,tableSize);
    }
    hashTable[newIndex] = item;
}

You continuously recalculate the index, but do not change the input parameters, so the output stays the same, and therefore it is being recalculated again.

In the code above newIndex is calculated, based on the same inputs as index was calculated from using a different calculaton function though, so most likely it will have a different value than the first time, however the new index is also occupied. So we recalculate the newIndex again this time using the same function as before, with the exact same input, which gives the exact same output again. You look up the same index in the hash table, which is still the same value as the last time you did so, so you recalculate again, once again with the same input parameters, giving the same output, which you look up in the hashtable once again, etc.

The reason why you didn't see this with the first 3 lines, is that you did not have a collision (or at least only a single collisison, meaning the newIndex calculated from the rehash function was useful the first time).

The solution is not to increment the table size (since incrementing the table size, will at best lower the chance of collision which in it self can be good, but won't solve your problem entirely), but to either alter the inputs to your functions, so you get a different output, or change the hashtable structure.

I always found Sedgewick's book on algorithms in C++ useful, there is a chapter on hashing it.

Sadly I don't have my copy of Algorithms in C++ at hand, so I cannot tell you how Sedgewick solved it, but I would suggest for the simple educational purpose of solving your problem, starting by simply incrementing the index by 1 until you find a free slot in the hash table.

edited Apr 21, 2015 at 13:14

answered Apr 21, 2015 at 12:32

Tommy Andersen

7,3101 gold badge35 silver badges55 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Jenny Calisay Over a year ago

if i increment my newIndex within the while loop, wouldn't that make my rehash function useless?

Tommy Andersen Over a year ago

@JennyCalisay it would, so if you need a rehash function to work, you need some way of changing the input, to prevent the output of continuously giving you the same output. The reason one would want to use a rehash is to keep the O(1) performance provided by the hashing method, and not getting worst case O(n) which would otherwise be the case. But in most cases when a rehash function is used, it is only used once, and if it still does not give a valid index, the index is often simply incremented until a free space has been found.

Tommy Andersen Over a year ago

Obviously there are a lot of different approaches to take though :)

Jenny Calisay Over a year ago

i'm just curious, is there any way to "increment" the string item on my rehash function instead of my int n ?

Collectives™ on Stack Overflow

Hash function for strings not working on some strings?

1 Answer 1

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related