1

Basically my program reads a text file with the following format:

3
chairs
tables
refrigerators

The number on the first line indicates the number of items in the file to read.

Here's my hash function:

int hash(string& item, int n) {
    int hashVal = 0;
    int len = item.length();

    for(int i = 0; i < len; i++)
      hashVal = hashVal*37 + item[i];

    hashVal %= n;   

    if(hashVal < 0) hashVal += n;

    return hashVal;
}

when my program read the text file above, it was successful. But when I tried another one:

5
sabel
ziyarah
moustache
math
pedobear

The program would freeze. Not a segmentation fault or anything but it would just stop.

Any ideas?

Edit:

int n, tableSize;
myFile >> n;

tableSize = generateTableSize(n); 

string item, hashTable[tableSize];

for(int i = 0; i < tableSize; i++)
    hashTable[i] = "--";

while(myFile >> item && n!=0) {
    int index = hash(item,tableSize);

    if(hashTable[index] == "--")
        hashTable[index] = item;

    else {
        int newIndex = rehash(item,tableSize);
        while(hashTable[newIndex] != "--") {
            newIndex = rehash(item,tableSize);
        }
        hashTable[newIndex] = item;
    }
    n--;
}

int rehash(string item, int n)  {
    return hash(item,n+1);
}
8
  • 2
    Please include the code that calls hash(). Commented Apr 21, 2015 at 12:20
  • 4
    Please provide a MCVE. Commented Apr 21, 2015 at 12:21
  • There is nothing in this code that could "freeze", so there must be something else going on. Commented Apr 21, 2015 at 12:22
  • 1
    It's hard to tell you what is wrong with your read from file if you don't include how you read from the file. Commented Apr 21, 2015 at 12:22
  • 1
    probably because you never had to rehash in the first example. Commented Apr 21, 2015 at 12:33

1 Answer 1

4

The code freezes because it ends in an endless loop:

int index = hash(item,tableSize);

if(hashTable[index] == "--")
    hashTable[index] = item;
else {
    int newIndex = rehash(item,tableSize);
    while(hashTable[newIndex] != "--") {
        newIndex = rehash(item,tableSize);
    }
    hashTable[newIndex] = item;
}

You continuously recalculate the index, but do not change the input parameters, so the output stays the same, and therefore it is being recalculated again.

In the code above newIndex is calculated, based on the same inputs as index was calculated from using a different calculaton function though, so most likely it will have a different value than the first time, however the new index is also occupied. So we recalculate the newIndex again this time using the same function as before, with the exact same input, which gives the exact same output again. You look up the same index in the hash table, which is still the same value as the last time you did so, so you recalculate again, once again with the same input parameters, giving the same output, which you look up in the hashtable once again, etc.

The reason why you didn't see this with the first 3 lines, is that you did not have a collision (or at least only a single collisison, meaning the newIndex calculated from the rehash function was useful the first time).

The solution is not to increment the table size (since incrementing the table size, will at best lower the chance of collision which in it self can be good, but won't solve your problem entirely), but to either alter the inputs to your functions, so you get a different output, or change the hashtable structure.

I always found Sedgewick's book on algorithms in C++ useful, there is a chapter on hashing it.

Sadly I don't have my copy of Algorithms in C++ at hand, so I cannot tell you how Sedgewick solved it, but I would suggest for the simple educational purpose of solving your problem, starting by simply incrementing the index by 1 until you find a free slot in the hash table.

Sign up to request clarification or add additional context in comments.

4 Comments

if i increment my newIndex within the while loop, wouldn't that make my rehash function useless?
@JennyCalisay it would, so if you need a rehash function to work, you need some way of changing the input, to prevent the output of continuously giving you the same output. The reason one would want to use a rehash is to keep the O(1) performance provided by the hashing method, and not getting worst case O(n) which would otherwise be the case. But in most cases when a rehash function is used, it is only used once, and if it still does not give a valid index, the index is often simply incremented until a free space has been found.
Obviously there are a lot of different approaches to take though :)
i'm just curious, is there any way to "increment" the string item on my rehash function instead of my int n ?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.