I have a large number of identical dictionaries (identically structured: same keys, different values), which leads to two different memory problems:

  • Dictionaries are over-allocated as they grow (the table is resized geometrically), so each dictionary could be using up to twice the memory it needs.

  • Every dictionary stores its own keys, so the same labels are recorded again in each dictionary, which takes a significant amount of memory.

What is a good way to share the labels (so that they are not stored in each object) and reduce the memory used?

  • Let's start with the obvious question: why do you have a "large number of identical dictionaries"? If they're identical, why do you need more than one? Commented Jul 27, 2015 at 21:53
  • Sorry, when I say identical dictionaries, I'm referring to the structure, not the contents. They all have the same keys, but different values. I'll update the post. Commented Jul 27, 2015 at 21:54
  • In this case, build only one dictionary where each item is a list. Commented Jul 27, 2015 at 21:56
  • All the same, why do you need several different dictionaries as opposed to a dict where each key points to a list of all the values? To cut to the chase, I suspect an XY issue and I want to know what X led you to thinking about this Y. Commented Jul 27, 2015 at 21:57
  • @CasimiretHippolyte: That would be less than ideal. The removal of an element from the middle of the list (which would happen frequently) would cause problems. Commented Jul 27, 2015 at 21:57

1 Answer

One possible solution to the problem is based on the recordclass library:

pip install recordclass

>>> import sys
>>> from recordclass import make_dataclass

For a given set of labels, you create a class:

>>> DataCls = make_dataclass('DataCls', 'first second third')
>>> data = DataCls(first="red", second="green", third="blue")
>>> print(data)
DataCls(first='red', second='green', third='blue')
>>> print('Memory size:', sys.getsizeof(data), 'bytes')
Memory size: 40 bytes

It is fast and uses minimal memory, which makes it suitable for creating millions of instances.

The downside: it is a C extension and not part of the standard library, but it is available on PyPI.
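For comparison, here is a minimal standard-library sketch of the same idea: the labels are stored once on the class instead of in every object. It uses `__slots__`, which is a different technique from recordclass, and the class name `Record` and its field names are illustrative assumptions.

```python
import sys


class Record:
    """Fixed-field record; field names live on the class, not on each instance."""

    # __slots__ removes the per-instance __dict__, so each object
    # stores only references to its three values.
    __slots__ = ("first", "second", "third")

    def __init__(self, first, second, third):
        self.first = first
        self.second = second
        self.third = third


as_dict = {"first": "red", "second": "green", "third": "blue"}
as_record = Record("red", "green", "blue")

# Exact sizes depend on the CPython version and platform,
# so they are printed here rather than assumed.
print("dict:  ", sys.getsizeof(as_dict), "bytes")
print("record:", sys.getsizeof(as_record), "bytes")
```

An object defined this way cannot gain attributes outside the declared slots, which fits the "same keys, different values" constraint from the question.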

Addition: Starting with recordclass 0.15, there is a fast_new option for faster instance creation:

>>> DataCls = make_dataclass('DataCls', 'first second third', fast_new=True)

If you don't need keyword arguments, instance creation becomes about twice as fast. Starting with 0.22 this is the default behavior, and the option fast_new=True can be omitted.

P.S.: I am the author of the recordclass library.
