I've been studying data structures in python and I created a simple implementation of a dictionary which is below. While this implementation is in the end useless (could just use hash() instead of creating a hash function etc.), I'm interested in the nitty gritty of how it all fits together.
This implementation chooses a starting size of 11. self.capacity keeps track of the number of free slots remaining. When a (key, value) pair is added, it decreases by one, and once it hits 0, it triggers a new slot to be created every time one is needed.
My question is this: the hashvalues that are being computed from the hash function depend on len(self.slots), but this value is constantly increasing when I add more space to my dictionary. Instead of using len(self.slots) to calculate the hashfunction, I tried just using the initial size (11), but as soon as the dictionary attempts to add the 12th (key,value) pair, the program seems to get stuck. This seems to suggest that the hash function needs to be based on the size of the table, and that in order to keep adding elements I need to be able to increase the size of the table. This leads me to the following questions.
- Do dictionaries have to be initialized to a fixed length and remain that length? If not, what is the preferred way of incrementing the length as elements are added?
- If the length of the dictionary can change, should the hash function be constructed without using the dictionary size? How could I ensure that values would still reduce to those in the range of table slots without reducing via modulo table size?
Any explanations, interesting insights or useful tidbits would be greatly appreciated.
#
class HashTable:
def __init__(self):
self.size = 11
self.capacity = self.size
self.slots = [None] * self.size
self.data = [None] * self.size
def hashfunction(self, key, size):
return key%size
def rehash(self, oldhash, size):
return (oldhash+1)%size
def put(self, key, value):
hashvalue = self.hashfunction(key,len(self.slots))
if self.capacity < 1:
self.slots += [None]
self.data += [None]
self.capacity += 1
if self.slots[hashvalue] == None:
self.slots[hashvalue] = key
self.data[hashvalue] = value
self.capacity -= 1
else:
if self.slots[hashvalue] == key:
self.data[hashvalue] = data
else:
rehashed = self.rehash(hashvalue, len(self.slots))
while self.slots[rehashed] != None and self.slots[rehashed] != key:
rehashed = self.rehash(rehashed, len(self.slots))
if self.slots[rehashed] == None:
self.slots[rehashed] = key
self.data[rehashed] = value
self.capacity -= 1
else:
self.data[rehashed] = value
def get(self, key):
startslot = self.hashfunction(key, len(self.slots))
data = None
found = False
stop = False
position = startslot
while self.slots[position] != None and not found and not stop:
if self.slots[position] == key:
data = self.data[key]
found = True
else:
position = self.rehash(position, len(self.slots))
if position == startslot:
stop = True
return data
def __delitem__(self, key):
hashvalue = self.hashfunction(key, len(self.slots))
if self.slots[hashvalue] == key:
self.slots[hashvalue] = None
self.data[hashvalue] = None
else:
rehashed = self.hashfunction(hashvalue, len(self.slots))
while self.slots[rehashed] != key:
rehashed = self.hashfunction(rehashed, len(self.slots))
if self.slots[rehashed] == key:
self.slots[rehashed] == None
self.data[rehashed] == None
def __contains__(self, key):
return key in self.slots
def __getitem__(self, key):
return self.get(key)
def __setitem__(self, key, value):
self.put(key, value)