I am using a regular Python 3 dictionary as a hashmap where both the keys and values are positive integers. The following code shows that a dict with roughly 6.3 million keys requires 320 MB of memory (as reported by sys.getsizeof).
import numpy as np
from sys import getsizeof
N = 10*1000*1000
a = np.random.randint(0, N, N)  # random keys (contain duplicates)
b = np.random.randint(0, N, N)  # random values
d = dict(zip(a, b))             # duplicate keys collapse, so len(d) < N
print('Number of elements:', len(d), 'Memory size (MB):', round(getsizeof(d)/2**20, 3))
print('Element memory size (B):', getsizeof(next(iter(d.values()))))
# Number of elements: 6323010 Memory size (MB): 320.0
# Element memory size (B): 32
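Note that getsizeof(d) counts only the dict's internal table, not the np.int64 key and value objects it references (32 bytes each, per the output above), so the real footprint is higher. A rough way to include those per-entry objects:

entry_bytes = sum(getsizeof(k) + getsizeof(v) for k, v in d.items())
print('Table (MB):', round(getsizeof(d)/2**20, 1),
      'Entry objects (MB):', round(entry_bytes/2**20, 1))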
How can I create a more memory-efficient hashmap, ideally with O(1) lookup? The hashmap can be immutable; it only needs to be read after it is built.
In my use case, the hashmap can grow to around 2 billion entries, and a Python dict of that size would need an estimated 64 GB of memory. Even if that still fits in RAM, other processes on the machine also need memory, so I would like something considerably smaller.
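For context, here is a minimal sketch of one direction I have considered (not what I am asking for, since lookups are O(log n) rather than O(1)): because the map can be immutable, the keys and values could live in two parallel int64 NumPy arrays sorted by key, at roughly 16 bytes per entry, with lookups via binary search. The IntIntMap class below is only an illustration and assumes the keys passed in are unique (the dict above already de-duplicates the random keys).

import numpy as np

class IntIntMap:
    """Immutable int -> int map: two parallel int64 arrays sorted by key.

    Memory is about 16 bytes per entry; lookups use binary search
    (np.searchsorted), so they are O(log n), not O(1).
    """
    def __init__(self, keys, values):
        keys = np.asarray(keys, dtype=np.int64)
        values = np.asarray(values, dtype=np.int64)
        order = np.argsort(keys)      # sort once at build time
        self._keys = keys[order]
        self._values = values[order]

    def __getitem__(self, key):
        i = np.searchsorted(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return int(self._values[i])
        raise KeyError(key)

# Build from the dict above (keys are unique at this point):
keys = np.fromiter(d.keys(), dtype=np.int64, count=len(d))
vals = np.fromiter(d.values(), dtype=np.int64, count=len(d))
m = IntIntMap(keys, vals)
assert m[keys[0]] == int(d[keys[0]])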