
I am slightly confused when I use the getsizeof method from the sys module on dictionaries. Below I have created a simple dictionary of two strings. The two strings' sizes are clearly larger than that of the dictionary, so the dictionary size presumably reflects only the dictionary overhead, i.e., it doesn't take the actual data into account. What is the best way to figure out the memory usage of the whole dictionary (keys, values, and dictionary overhead)?

>>> from sys import getsizeof
>>> first = 'abc'*1000
>>> second = 'def'*1000
>>> my_dictionary = {'first': first, 'second': second}
>>> getsizeof(first)
3021
>>> getsizeof(second)
3021
>>> getsizeof(my_dictionary)
140

4 Answers


From the Python docs:

See recursive sizeof recipe for an example of using getsizeof() recursively to find the size of containers and all their contents.

So getsizeof() only counts the container's overhead, but you can use the recipe linked above to calculate the total size of containers like dicts.
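The recipe itself is no longer hosted at the original link, so here is a minimal sketch of the recursive idea (names and structure are my own, not taken from the recipe): sum getsizeof() over the container and everything it holds, tracking already-seen object ids so shared values are counted only once.

```python
from sys import getsizeof

def total_size(obj, seen=None):
    """Recursively sum getsizeof() over a container and its contents."""
    if seen is None:
        seen = set()          # ids of objects already counted
    obj_id = id(obj)
    if obj_id in seen:        # don't double-count shared objects
        return 0
    seen.add(obj_id)
    size = getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size

first = 'abc' * 1000
second = 'def' * 1000
my_dictionary = {'first': first, 'second': second}
print(total_size(my_dictionary))  # overhead + keys + values
```

For a flat dict of strings this gives the same result as summing getsizeof() over the dict, its keys, and its values by hand; the recursion only matters once values are themselves containers.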


3 Comments

It is not working for a simple dict like d = dict(a="4", b="4", c="4", d="4") — it skips the values corresponding to b, c, and d.
Sorry, I guess your code is correct; in the above case Python is reusing the object "4".
The link in the answer is dead, but a revised version of the recipe can be found here: code.activestate.com/recipes/…

The recursive getsizeof approach gets the actual size, but if you have multiple layers of dictionaries and only want a rough estimate, json comes in handy.

>>> from sys import getsizeof
>>> import json
>>> first = 'abc'*1000
>>> second = 'def'*1000
>>> my_dictionary = {'first': first, 'second': second}
>>> getsizeof(first)
3049
>>> getsizeof(second)
3049
>>> getsizeof(my_dictionary)
288
>>> getsizeof(json.dumps(my_dictionary))
6076
>>> size = getsizeof(my_dictionary)
>>> size += sum(map(getsizeof, my_dictionary.values())) + sum(map(getsizeof, my_dictionary.keys()))
>>> size
6495

1 Comment

Definitely points for creativity, but it needs everything to be serializable, it is slower, and as you say it's an approximation...

Well, dictionaries don't store the actual strings inside them; they work a bit like C/C++ pointers, so you only pay a constant overhead in the dictionary for every element.

The total size is

from sys import getsizeof

size = getsizeof(d)
size += sum(map(getsizeof, d.values())) + sum(map(getsizeof, d.keys()))

1 Comment

To be pedantic, if any of the values is a container (rather than a scalar) it needs to drill down that container as well.

Method: serialise the dictionary into a string, then get the size of the string.

I suggest using dumps from the pickle or json library. It serialises the dictionary into a string, and then you can get the size of that string, like this:

getsizeof(pickle.dumps(my_dictionary))

or

getsizeof(json.dumps(my_dictionary))

If there are ndarrays in the dictionary, use pickle, because json can't process ndarrays.

Here is your modified example:

from sys import getsizeof
import json
import pickle

first = 'abc'*1000
second = 'def'*1000
my_dictionary = {'first': first, 'second': second}

print('first:', getsizeof(first))
print('second',getsizeof(second))
print('dict_:', getsizeof(my_dictionary))

print('size of json dumps my_dictionary: ', getsizeof(json.dumps(my_dictionary)))
print('size of pickle dumps my_dictionary: ', getsizeof(pickle.dumps(my_dictionary)))

results:

first: 3049
second 3049
dict_: 232
size of json dumps my_dictionary:  6076
size of pickle dumps my_dictionary:  6078
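To illustrate the serialisability caveat without pulling in NumPy, here is a quick sketch of my own (bytes stand in for any non-JSON-serialisable value, such as an ndarray): json.dumps() raises a TypeError, while pickle.dumps() handles any picklable object.

```python
from sys import getsizeof
import json
import pickle

data = {'payload': b'\x00\x01\x02' * 1000}

# json.dumps() cannot serialise bytes (or ndarrays) and raises TypeError
try:
    json.dumps(data)
except TypeError as err:
    print('json failed:', err)

# pickle.dumps() works on any picklable object
print('pickle size:', getsizeof(pickle.dumps(data)))
```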

1 Comment

You could even just use sys.getsizeof(str(my_dictionary)), which gives the same result as json.dumps
