I am trying to use Berkeley DB to store a frequency table (i.e. hashtable with string keys and integer values). The table will be written, updated, and read from Python; so I am currently experimenting with bsddb3. This looks like it will do most of what I want, except it looks like it only supports string values?
If I understand correctly, Berkeley DB supports any kind of binary key and value. Is there a way to efficiently pass raw long integers in/out of Berkeley DB using bsddb3? I know I can convert the values to/from strings, and this is probably what I will end up doing, but is there a more efficient way? I.e. by storing 'raw' integers?
Background: I am currently working with a large (potentially tens, if not hundreds, of millions of keys) frequency table. This is currently implemented using a Python dictionary, but I abort the script when it starts to swap into virtual memory. Yes I looked at Redis, but this stores the entire database in memory. So I'm about to try Berkeley DB. I should be able to improve the creation efficiency by using short-term in-memory caching. I.e. create an in-memory Python dictionary, and then periodically add this to the master Berkeley DB frequency table.