The program I've written stores a large amount of data in dictionaries. Specifically, I'm creating 1588 instances of a class, each of which contains 15 dictionaries with 1500 float-to-float mappings. This has been using up the 2 GB of memory on my laptop pretty quickly (I start writing to swap at about the 1000th instance of the class).

My question is, which of the following is using up my memory?

  • Some 34 million pairs of floats?
  • The overhead of 22,500 dictionaries?
  • The overhead of 1588 class instances?

To me it seems like the memory hog should be the huge number of floating-point numbers that I'm holding in memory. However, if what I've read so far is correct, each of my floating-point numbers takes up 16 bytes. Since I have 34 million pairs, this should be about 108 million bytes, which should be just over a gigabyte.
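
For reference, here's the same sum in the interpreter (my own back-of-envelope arithmetic, so it may be exactly where I'm going wrong):

>>> 34e6 * 2 * 16    # pairs * two floats per pair * 16 bytes per float
1088000000.0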

Is there something I'm not taking into consideration here?

1 Answer

The floats do take up 16 bytes apiece, and a dict with 1500 entries about 100k:

>>> import sys
>>> sys.getsizeof(1.0)
16
>>> d = dict.fromkeys((float(i) for i in range(1500)), 2.0)
>>> sys.getsizeof(d)
98444

so the 22,500 dicts take over 2 GB all by themselves, and the 68 million floats another GB or so. Not sure how you compute 68 million times 16 as equaling only 100M -- you may have dropped a zero somewhere.
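
Spelled out, that back-of-envelope sum looks like this (a rough sketch using the sizes measured above; exact figures vary by platform and Python version):

dict_bytes  = 22500 * 98444      # ~2.2e9 bytes -- over 2 GB of dict overhead alone
float_bytes = 68 * 10**6 * 16    # ~1.1e9 bytes -- about 1 GB for the float objects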

The class itself takes up a negligible amount, and 1500 instances thereof (net of the objects they refer to, of course, just as getsizeof gives us such net amounts for the dicts) cost not much more than a smallish dict each, so that's hardly the problem. I.e., for an empty class Sic:

>>> class Sic(object): pass
...
>>> sys.getsizeof(Sic)
452
>>> sys.getsizeof(Sic())
32
>>> sys.getsizeof(Sic().__dict__)
524

452 bytes for the class, and (524 + 32) * 1550 ≈ 862K for all the instances; as you can see, that's not the worry when you have gigabytes apiece in the dicts and floats.
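
The same arithmetic for the instances, for comparison (again a sketch; the per-object sizes above came from one particular build and will differ on yours):

per_instance   = 524 + 32                    # each instance's __dict__ plus the instance struct
instance_bytes = 452 + per_instance * 1550   # ~862K total -- negligible next to the dicts and floats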

Comments

If I plug in range(1000) or range(1250) instead of range(1500) in your code above, I get a size of 24712 bytes. This seems to hold until range(1365), at which point there is a discrete jump in the size. Is this true to how Python allocates memory? If so, why is this the case?
I thought a Python float was implemented using a C double, which would generally be 64 bits (8 bytes). And the double type takes 8 bytes in the struct module. I can't argue with the result of getsizeof, though...
@Scott, every Python object has some overhead (reference counter, pointer to the type) in addition to its "payload", and all allocations are rounded up to a multiple of 16 bytes to play nicely with the system malloc. To store floats compactly you can use an array.array of them (but that's list-like, not dict-like) -- see the sketch after these comments.
@Wilduck, dicts are hash tables, so they must always be overallocated to get good performance. As containers grow, the overallocation also grows (to amortize realloc overhead and keep insertion amortized O(1)).
Hmmm -- per the sources at svn.python.org/view/python/trunk/Objects/… and the notes at svn.python.org/view/python/trunk/Objects/… the increase should be *2, not *4, but I can reproduce the *4 measurement (not sure what's going on). If you can stay within the threshold you can definitely at least halve the memory footprint (maybe quarter it, but I can't assert that confidently without some long C-level debugging session to confirm what's happening).
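
The resize behavior discussed in these comments is easy to see directly (a small sketch; the exact thresholds and byte counts depend on the Python version and platform):

import sys

d = {}
last = sys.getsizeof(d)
for i in range(2000):
    d[float(i)] = 2.0
    size = sys.getsizeof(d)
    if size != last:    # getsizeof changes only when the hash table is resized
        print(len(d), size)
        last = size

Between thresholds the size holds steady, then jumps at each resize -- matching the jump at range(1365) noticed above.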
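
And a minimal sketch of the array.array alternative mentioned above -- packed 8-byte C doubles instead of 16-byte float objects hung off a hash table:

import sys
from array import array

keys = array('d', (float(i) for i in range(1500)))   # 8 bytes per entry, stored contiguously
vals = array('d', [2.0] * 1500)
print(sys.getsizeof(keys) + sys.getsizeof(vals))     # a few tens of KB, vs ~100K for the dict alone

The trade-off is that arrays are sequences, not mappings: lookups would need something like the bisect module over sorted keys.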