4

There is this problem I cannot get my head around to solve. So in order to create a dictionary, we have two ways: (1) a function call dict() and (2) literal syntax {}.

When I check the memory size of creation of two empty dictionaries, sys.getsizeof() return different memory size:

import sys

my_dict1 = {}
sys.getsizeof(my_dict1)
# 64

For syntax {}, by running this example you can see that the basic occupation of a Python dictionary is 64 bytes (8 buckets, each is 8 bytes, so 8 x 8 = 64). That makes sense to me. Since the {} syntax is handled in byte-code it can use this optimization mentioned here.

and

import sys

my_dict2 = dict()
sys.getsizeof(my_dict2)
# 232

But for dict(), it is 232 bytes. I think this huge difference is due to the different implementations. dict() is handled like a regular class constructor and Python uses the generic memory allocator, which does not follow an easily predictable pattern like the free list (again, this answer is great).

However, what I do not understand starts here, even if I start my empty dictionary using {} and begin to add elements to this empty dictionary, memory size jumps to 232 from 64 in the first iteration.

import sys

my_dict = {}
for i in range(20):
    my_dict[i] = 100
    print(f"elements = {i+1} size = {sys.getsizeof(my_dict)}")
    
# elements = 1 size = 232
# elements = 2 size = 232
# elements = 3 size = 232
# elements = 4 size = 232
# elements = 5 size = 232
# elements = 6 size = 360
# elements = 7 size = 360
# elements = 8 size = 360
# elements = 9 size = 360
# elements = 10 size = 360
# elements = 11 size = 640
# elements = 12 size = 640
# elements = 13 size = 640
# elements = 14 size = 640
# elements = 15 size = 640
# elements = 16 size = 640
# elements = 17 size = 640
# elements = 18 size = 640
# elements = 19 size = 640
# elements = 20 size = 640

In order to make Python hash table fast and reduce collisions, the interpreter keeps resizing the dictionary when it becomes full for two-third. That I know.

What I do not understand, why in the VERY FIRST insertion, the memory size goes from 64 to 232, even though empty dictionary started by {} is 64 bytes. Because when I create an empty dictionary using {}, I use special opcodes for the containers, instead of performing function calls.

EDIT: Python version I am using is 3.8.12.

19
  • 2
    I can't reproduce this in my Python 3.10.6 REPL. (Sorry for the formatting; this is a comment so I can't do much better.) >>> sys.getsizeof({}): 64; >>> sys.getsizeof(dict()): 64 Commented Sep 7, 2022 at 7:48
  • 1
    3.9.13 also shows this behavior, actually. So there we go, it's an implementation detail that has changed between 3.9 and 3.10... Commented Sep 7, 2022 at 7:52
  • 1
    Somebody call Raymond Hettinger for explanation. That's really an implementation detail. Commented Sep 7, 2022 at 7:53
  • 3
    As far as I'm concerned, the answer is because implementation details and how much does it really matter to userland code…? How deep do you want to dig here…? Commented Sep 7, 2022 at 7:56
  • 2
    As a stab into the dark, I'd guess the parser might allocate some minimal struct at parse time when it encounters {} in the source, whereas dict() is a runtime call which initialises the object differently, going through a more complete initialisation phase. Something like that. Maybe it's a purposeful optimisation, maybe it just happens to fall out of the implementation details. Commented Sep 7, 2022 at 8:31

1 Answer 1

4

On Python 3.8.12, the version you're on, dict() allocates a new "keys object" (the part of the dict that stores the hash table, the keys, and in non-"split" dicts, the values), while {} uses a common "keys object" shared by many empty dicts. The shared keys object isn't counted towards the size of the {} dict.

Note that as soon as you add entries to the {} dict, it will need to allocate a non-shared keys object for its own use, so using {} won't actually save any memory in most cases.

You can see the code for dict() by looking at dict_new and dict_init, the C functions corresponding to dict.__new__ and dict.__init__.

You can see the code for {} in the bytecode evaluation loop, under the BUILD_MAP opcode, and follow that down to _PyDict_NewPresized.

You can see the code that sys.getsizeof delegates to for a dict in dict_sizeof, the C implementation of dict.__sizeof__. The code that ignores the size of the shared keys object for {} is if (mp->ma_keys->dk_refcnt == 1), in _PyDict_SizeOf.

Sign up to request clarification or add additional context in comments.

4 Comments

Okay, now I see why {} and dict() giving different memory size. I am sorry but I still cannot see why initializing an empty object with {} and adding elements in a loop, when I insert the first element, it jumps from 64 to 232. Should not it still be 64 bytes until 5th element added?
For reference, d = {} calls BUILD_MAP with oparg = 0.
@ARAT: The dict can't share a keys object with other empty dicts if it's not empty any more.
(Well, technically, there are cases where empty and non-empty dicts can share keys objects, but those would be a few very restricted cases with instance __dict__s. The empty keys object used for {} can't be shared by nonempty dicts.)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.