The program I've written stores a large amount of data in dictionaries. Specifically, I'm creating 1588 instances of a class, each of which contains 15 dictionaries with 1500 float-to-float mappings. This has been using up the 2 GB of memory on my laptop pretty quickly (I start writing to swap at about the 1000th instance of the class).

My question is, which of the following is using up my memory?

  • Some 34 million pairs of floats?
  • The overhead of 22,500 dictionaries?
  • The overhead of 1588 class instances?

To me it seems like the memory hog should be the huge number of floating-point numbers that I'm holding in memory. However, if what I've read so far is correct, each of my floating-point numbers takes up 16 bytes. Since I have 34 million pairs, this should be about 108 million bytes, which should be just over a gigabyte.
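
For reference, here's the same sum in the interpreter (my own back-of-envelope arithmetic, so it may be exactly where I'm going wrong):

>>> 34e6 * 2 * 16    # pairs * two floats per pair * 16 bytes per float
1088000000.0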

Is there something I'm not taking into consideration here?

1 Answer

The floats do take up 16 bytes apiece, and a dict with 1500 entries about 100k:

>>> import sys
>>> sys.getsizeof(1.0)
16
>>> d = dict.fromkeys((float(i) for i in range(1500)), 2.0)
>>> sys.getsizeof(d)
98444

so the 22,500 dicts take over 2 GB all by themselves, and the 68 million floats another GB or so. Not sure how you compute 68 million times 16 as equaling only 100M -- you may have dropped a zero somewhere.
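
Spelled out, that back-of-envelope sum looks like this (a rough sketch using the sizes measured above; exact figures vary by platform and Python version):

dict_bytes  = 22500 * 98444      # ~2.2e9 bytes -- over 2 GB of dict overhead alone
float_bytes = 68 * 10**6 * 16    # ~1.1e9 bytes -- about 1 GB for the float objects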

The class itself takes up a negligible amount, and 1500 instances thereof (net of the objects they refer to, of course, just as getsizeof gives us such net amounts for the dicts) cost not much more than a smallish dict each, so that's hardly the problem. I.e., for an empty class Sic:

>>> class Sic(object): pass
...
>>> sys.getsizeof(Sic)
452
>>> sys.getsizeof(Sic())
32
>>> sys.getsizeof(Sic().__dict__)
524

452 bytes for the class, and (524 + 32) * 1550 ≈ 862K for all the instances; as you can see, that's not the worry when you have gigabytes apiece in the dicts and floats.
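
The same arithmetic for the instances, for comparison (again a sketch; the per-object sizes above came from one particular build and will differ on yours):

per_instance   = 524 + 32                    # each instance's __dict__ plus the instance struct
instance_bytes = 452 + per_instance * 1550   # ~862K total -- negligible next to the dicts and floats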

Comments

If I plug in range(1000) or range(1250) instead of range(1500) in your code above, I get a size of 24712 bytes. This seems to hold until range(1365), at which point there is a discrete jump in the size. Is this true to how Python allocates memory? If so, why is this the case?
I thought a Python float was implemented using a C double, which would generally be 64 bits (8 bytes). And the double type takes 8 bytes in the struct module. I can't argue with the result of getsizeof, though...
@Scott, every Python object has some overhead (reference counter, pointer to the type) in addition to its "payload", and all allocations are rounded up to a multiple of 16 bytes to play nicely with the system malloc. To store floats compactly you can use an array.array of them (but that's list-like, not dict-like) -- see the sketch after these comments.
@Wilduck, dicts are hash tables, so they must always be overallocated to get good performance. As containers grow, the overallocation also grows (to amortize realloc overhead and keep insertion amortized O(1)).
Hmmm -- per the sources at svn.python.org/view/python/trunk/Objects/… and the notes at svn.python.org/view/python/trunk/Objects/… the increase should be *2, not *4, but I can reproduce the *4 measurement (not sure what's going on). If you can stay within the threshold you can definitely at least halve the memory footprint (maybe quarter it, but I can't assert that confidently without some long C-level debugging session to confirm what's happening).
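
The resize behavior discussed in these comments is easy to see directly (a small sketch; the exact thresholds and byte counts depend on the Python version and platform):

import sys

d = {}
last = sys.getsizeof(d)
for i in range(2000):
    d[float(i)] = 2.0
    size = sys.getsizeof(d)
    if size != last:    # getsizeof changes only when the hash table is resized
        print(len(d), size)
        last = size

Between thresholds the size holds steady, then jumps at each resize -- matching the jump at range(1365) noticed above.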
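
And a minimal sketch of the array.array alternative mentioned above -- packed 8-byte C doubles instead of 16-byte float objects hung off a hash table:

import sys
from array import array

keys = array('d', (float(i) for i in range(1500)))   # 8 bytes per entry, stored contiguously
vals = array('d', [2.0] * 1500)
print(sys.getsizeof(keys) + sys.getsizeof(vals))     # a few tens of KB, vs ~100K for the dict alone

The trade-off is that arrays are sequences, not mappings: lookups would need something like the bisect module over sorted keys.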