How to write (long) integer values to Berkeley DB using bsddb3?

Question

I am trying to use Berkeley DB to store a frequency table (i.e. hashtable with string keys and integer values). The table will be written, updated, and read from Python; so I am currently experimenting with bsddb3. This looks like it will do most of what I want, except it looks like it only supports string values?

If I understand correctly, Berkeley DB supports any kind of binary key and value. Is there a way to efficiently pass raw long integers in/out of Berkeley DB using bsddb3? I know I can convert the values to/from strings, and this is probably what I will end up doing, but is there a more efficient way? I.e. by storing 'raw' integers?

Background: I am currently working with a large (potentially tens, if not hundreds, of millions of keys) frequency table. This is currently implemented using a Python dictionary, but I abort the script when it starts to swap into virtual memory. Yes I looked at Redis, but this stores the entire database in memory. So I'm about to try Berkeley DB. I should be able to improve the creation efficiency by using short-term in-memory caching. I.e. create an in-memory Python dictionary, and then periodically add this to the master Berkeley DB frequency table.
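The caching idea described above could look roughly like the sketch below. It is only an illustration under assumptions: `flush_counts`, `count_tokens` and `max_cached_keys` are hypothetical names, a plain dict stands in for the Berkeley DB handle, and counts are stored as decimal strings (the string conversion mentioned in the question).

```python
from collections import Counter

def flush_counts(cache, store):
    """Add every count in `cache` to `store`, then empty the cache.

    `store` is assumed to be any mapping from bytes keys to bytes values
    that supports .get() and item assignment (a dict is used here).
    """
    for key, delta in cache.items():
        k = key.encode("utf-8")
        current = int(store.get(k, b"0"))  # a missing key counts as 0
        store[k] = str(current + delta).encode("ascii")
    cache.clear()

def count_tokens(tokens, store, max_cached_keys=100000):
    """Count tokens in memory, flushing whenever the cache grows too large."""
    cache = Counter()
    for token in tokens:
        cache[token] += 1
        if len(cache) >= max_cached_keys:
            flush_counts(cache, store)
    flush_counts(cache, store)  # write out whatever is left

store = {}  # stand-in for the on-disk table
count_tokens(["a", "b", "a", "c", "a"], store, max_cached_keys=2)
print(store)
```

Because each flush does a read-modify-write per key, a larger cache means fewer database round trips for frequently repeated keys.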

Matt Anderson · Accepted Answer · 2012-05-02 14:28:49Z

1

Do you need to read the data back from a language other than Python? If not, you can pickle the Python long integers when you write them and unpickle them when you read them back in. The shelve module will probably work for you and does this automatically. Even if it doesn't, you can pickle and unpickle the values manually.

>>> import cPickle as pickle
>>> pickle.dumps(19999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999, pickle.HIGHEST_PROTOCOL)
'\x80\x02\x8a(\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x7fT\x97\x05p\x0b\x18J#\x9aA\xa5.{8=O,f\xfa\x81|\xa1\xef\xaa\xfd\xa2e\x02.'
>>> pickle.loads('\x80\x02\x8a(\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x7fT\x97\x05p\x0b\x18J#\x9aA\xa5.{8=O,f\xfa\x81|\xa1\xef\xaa\xfd\xa2e\x02.')
19999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999L
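The session above is Python 2 (`cPickle`, and the trailing `L` on the result). A minimal Python 3 sketch of the same round trip follows; `encode_count` and `decode_count` are hypothetical helper names, not part of pickle or bsddb3.

```python
import pickle

def encode_count(n):
    """Serialize an int of any size to bytes suitable for use as a DB value."""
    return pickle.dumps(n, protocol=pickle.HIGHEST_PROTOCOL)

def decode_count(blob):
    """Inverse of encode_count."""
    return pickle.loads(blob)

big = 2**100 + 7
blob = encode_count(big)
assert decode_count(blob) == big

# Compare the pickled size with the plain decimal-string size.
print(len(blob), len(str(big)))
```

Only unpickle data you wrote yourself, since `pickle.loads` can execute arbitrary code when given untrusted input.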


2 Comments

winwaed Over a year ago

Probably don't need another language. Although I do have a C# use-case in mind, it would be an import utility (ie. one off import rather than random access), so it could work with a Python-produced text dump(s). I assume pickle is going to be a little bit more efficient than string formatting?

winwaed Over a year ago

Looks to be running okay, and I've had the script running overnight. Looks like speed might be an issue - perhaps I need a bigger in-memory cache.

amirouche · Accepted Answer · 2015-08-21 20:09:34Z

0

Use Python's struct module to convert an integer to bytes in Python 3 (or to a string in Python 2). Depending on your data, you might choose a different packing format; for an unsigned long long (uint64_t):

struct.pack('>Q', my_integer)

This returns the big-endian byte representation of my_integer, which sorts in the same order as the integers themselves under the lexicographical byte comparison that bsddb applies to keys. You can come up with a smarter packing function (have a look at wiredtiger.intpacking) to save space.
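A short round-trip sketch of the fixed-width big-endian packing, assuming the values fit in an unsigned 64-bit integer; `pack_u64` and `unpack_u64` are hypothetical helper names.

```python
import struct

def pack_u64(n):
    """Pack a non-negative int below 2**64 into 8 big-endian bytes."""
    return struct.pack(">Q", n)

def unpack_u64(raw):
    """Inverse of pack_u64; struct.unpack returns a 1-tuple."""
    return struct.unpack(">Q", raw)[0]

values = [3, 70000, 255, 256, 2**40]
packed = [pack_u64(v) for v in values]

# Round trip preserves every value.
assert [unpack_u64(p) for p in packed] == values

# Byte-wise (lexicographical) order matches numeric order.
assert sorted(packed) == [pack_u64(v) for v in sorted(values)]
```

Note that `struct.pack(">Q", n)` raises `struct.error` for negative values or values of 2**64 and above, so counts outside that range would need a different encoding.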

You don't need a Python cache; use DBEnv.set_cache_max and DBEnv.set_cachesize.

