Memory-efficient data structures in Python

Question

I have a large number of identical dictionaries (identically structured: same keys, different values), which leads to two different memory problems:

Dictionaries grow geometrically when they are resized, so each dictionary could be using up to twice the memory it needs.
Dictionaries need to record their labels, so each dictionary stores its own set of keys, which takes a significant amount of memory.
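
As a rough illustration of the two points above, here is a minimal sketch that compares the container size reported by sys.getsizeof for a small dictionary and for a tuple holding the same values. The key names and values are made up for the example, and the exact byte counts depend on the Python implementation and version, so none are shown here.

```python
import sys

# Hypothetical labels and records, used only for illustration.
keys = ("first", "second", "third")
records = [dict(zip(keys, (i, i + 1, i + 2))) for i in range(3)]

# Size of the dict container itself (the values are not included).
dict_size = sys.getsizeof(records[0])

# Size of a plain tuple holding the same three values.
tuple_size = sys.getsizeof(tuple(records[0].values()))

# The key string objects are shared between the dicts in this example,
# but every dict still keeps its own hash table that refers to them.
keys_shared = all(a is b for a, b in zip(records[0], records[1]))

print("dict container: ", dict_size, "bytes")
print("tuple container:", tuple_size, "bytes")
print("key objects shared:", keys_shared)
```

Note that sys.getsizeof only measures the container, so this says nothing about the memory taken by the values themselves.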

What is a good way to share the labels (so that each label is not stored in every object) and reduce the memory used?

Let's start with the obvious question: why do you have a "large number of identical dictionaries"? If they're identical, why do you need more than one?
– Two-Bit Alchemist, Commented Jul 27, 2015 at 21:53
Sorry, when I say identical dictionaries, I'm referring to the structure, not the contents. They all have the same keys, but different values. I'll update the post.
– Andrew Spott, Commented Jul 27, 2015 at 21:54
In this case, build only one dictionary where each item is a list.
– Casimir et Hippolyte, Commented Jul 27, 2015 at 21:56
All the same, why do you need several different dictionaries as opposed to a dict where each key points to a list of all the values? To cut to the chase, I suspect an XY issue and I want to know what X led you to thinking about this Y.
– Two-Bit Alchemist, Commented Jul 27, 2015 at 21:57
@CasimiretHippolyte: That would be less than ideal. The removal of an element from the middle of the list (which would happen frequently) would cause problems.
– Andrew Spott, Commented Jul 27, 2015 at 21:57

intellimath · Accepted Answer · 2023-12-23 12:09:59Z

One possible solution to this problem is based on the recordclass library:

pip install recordclass

>>> import sys
>>> from recordclass import make_dataclass

For a given set of labels, you create a class:

>>> DataCls = make_dataclass('DataCls', 'first second third')
>>> data = DataCls(first="red", second="green", third="blue")
>>> print(data)
DataCls(first='red', second='green', third='blue')
>>> print('Memory size:', sys.getsizeof(data), 'bytes')
Memory size: 40 bytes

It is fast and uses minimal memory, which makes it suitable for creating millions of instances.

The downside: it is a C extension and is not in the standard library, but it is available on PyPI.
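
If a third-party C extension is not an option, a standard-library technique aimed at the same problem is a class with __slots__. This is a different approach from recordclass, not something taken from the library, and the sketch below only assumes the same three hypothetical labels used above. The labels are stored once on the class and instances carry no per-instance dictionary. Exact sizes vary between Python versions, so none are shown.

```python
import sys


class DataCls:
    # The labels live on the class; instances get fixed slots
    # instead of a per-instance __dict__.
    __slots__ = ("first", "second", "third")

    def __init__(self, first, second, third):
        self.first = first
        self.second = second
        self.third = third


data = DataCls("red", "green", "blue")
as_dict = {"first": "red", "second": "green", "third": "blue"}

slots_size = sys.getsizeof(data)     # container size of the slotted instance
dict_size = sys.getsizeof(as_dict)   # container size of the equivalent dict
has_instance_dict = hasattr(data, "__dict__")

print("slotted instance:", slots_size, "bytes")
print("plain dict:      ", dict_size, "bytes")
print("has __dict__:    ", has_instance_dict)
```

Unlike a dictionary, a slotted instance cannot gain new attributes at runtime, which fits this case because all records share the same fixed set of keys.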

Addition: Starting with recordclass version 0.15, there is a fast_new option for faster instance creation:

>>> DataCls = make_dataclass('DataCls', 'first second third', fast_new=True)

If you don't need keyword arguments, instance creation becomes about twice as fast. Starting with version 0.22, this is the default behavior and the fast_new=True option can be omitted.

P.S.: I am the author of the recordclass library.
