I'm trying to understand how reference counting works in Python. I created a variable a and assigned it the value 10, so a points to the memory location where the int object 10 is stored. When I get the reference counts of a and of 10, I get two different values. If a points to the same object as 10, why do they have different reference counts?

>>> import sys
>>> sys.getrefcount(10)
12
>>> a = 10
>>> sys.getrefcount(10)
13
>>> sys.getrefcount(a)
11

1 Answer

When you call sys.getrefcount(10) directly, the statement itself adds references to 10. One of them is held for the literal 10 at the call site, and at least one more comes from a less obvious source (see the update further down).
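As a rough illustration of how the reported count tracks live references, here is a minimal sketch. It assumes CPython and uses a fresh object() rather than 10, because absolute counts for small integers vary between interpreter versions; the function name is made up for this example. The sys.getrefcount documentation also notes that the returned value generally includes the temporary reference held by the call's own argument, which is why the sketch compares two readings instead of relying on an absolute number.

```python
import sys


def refcount_delta_after_extra_reference():
    """Return how much getrefcount() grows when one more reference is added."""
    obj = object()                  # fresh object, referenced only by `obj`
    before = sys.getrefcount(obj)   # reading taken with a single named reference
    holder = [obj]                  # the list now holds one additional reference
    after = sys.getrefcount(obj)    # same kind of call, so any call overhead cancels out
    del holder                      # drop the extra reference again
    return after - before


print(refcount_delta_after_extra_reference())
```

Comparing two readings taken the same way sidesteps the question of exactly how many references the call itself contributes.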

More detailed answer: When you run a statement in the interactive prompt, that statement is compiled into bytecode, which is then exec'd by the interpreter. The bytecode is stored in a code object, which you can inspect by compiling a statement yourself with the compile() builtin:

>>> a = 10
>>> c = compile('sys.getrefcount(10)', '<stdin>', 'single')
>>> c
<code object <module> at 0x7f4def343270, file "<stdin>", line 1>

We can use the dis module to inspect the compiled bytecode:

>>> import dis
>>> dis.dis(c)
  1           0 LOAD_NAME                0 (sys)
              2 LOAD_ATTR                1 (getrefcount)
              4 LOAD_CONST               0 (10)
              6 CALL_FUNCTION            1
              8 PRINT_EXPR
             10 LOAD_CONST               1 (None)
             12 RETURN_VALUE

Just before CALL_FUNCTION you can see the instruction LOAD_CONST 0 (10). But how does the interpreter know that 10 is the constant to load? The actual bytecode instruction is LOAD_CONST(0), where 0 is an index into a table of constants stored in the code object:

>>> c.co_consts
(10, None)

So the code object's co_consts tuple is where one of the new references to 10 lives (temporarily, for as long as the code object exists).
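The constant table can also be checked programmatically. This is only a sketch: the helper name constants_of and the "<demo>" filename are made up for illustration, and it uses the literal 123456789 instead of 10 on the assumption that newer CPython releases may handle small integers specially.

```python
def constants_of(source):
    """Compile one interactive-style statement and return its constant table."""
    code = compile(source, "<demo>", "single")  # compile only, nothing is executed
    return code.co_consts


# The literal from the source text ends up in the code object's co_consts tuple.
consts = constants_of("sys.getrefcount(123456789)")
print(consts)
print(123456789 in consts)
```

As long as the code object is alive, its co_consts tuple keeps the literal's object alive too.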

Whereas if we do:

>>> c2 = compile('sys.getrefcount(a)', '<stdin>', 'single')
>>> dis.dis(c2)
  1           0 LOAD_NAME                0 (sys)
              2 LOAD_ATTR                1 (getrefcount)
              4 LOAD_NAME                2 (a)
              6 CALL_FUNCTION            1
              8 PRINT_EXPR
             10 LOAD_CONST               0 (None)
             12 RETURN_VALUE

Instead of LOAD_CONST there's just a LOAD_NAME, which looks up whatever a happens to point to at run time. The object 10 itself is not referenced anywhere in the code object.
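The two disassemblies can be compared side by side with dis.get_instructions. Again this is a sketch under assumptions: load_instructions is a hypothetical helper, the literal 123456789 stands in for 10 (newer CPython versions may emit different instructions for small integers), and exact opcode names differ between Python versions, so the code inspects each instruction's resolved argument rather than relying on one opcode spelling.

```python
import dis


def load_instructions(source):
    """Return (opname, argval) pairs for the LOAD_* instructions of a statement."""
    code = compile(source, "<demo>", "single")  # compiled but never executed
    return [
        (ins.opname, ins.argval)
        for ins in dis.get_instructions(code)
        if ins.opname.startswith("LOAD_")
    ]


with_literal = load_instructions("sys.getrefcount(123456789)")
with_name = load_instructions("sys.getrefcount(a)")

print(with_literal)  # one instruction carries the literal itself as its argument
print(with_name)     # here the argument is only the name 'a', not the object
```

In the second listing the integer never appears as an instruction argument, which matches the point above: the code object for the name-based call holds no reference to the object.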

Update: The source of the second reference is pretty obscure, but it comes from the AST parser which uses an Arena structure for efficient memory management of AST nodes and the like. The arena also maintains a list (as in an actual Python list) of Python objects parsed in the AST, in the case of numbers that happens here: https://github.com/python/cpython/blob/fee96422e6f0056561cf74fef2012cc066c9db86/Python/ast.c#L2144 (where PyArena_AddPyObject adds the object to said list). IIUC this list exists just to ensure that literals parsed from the AST have at least one reference held somewhere.

In the actual C code for compiling and running interactive statements the arena isn't freed until after the compiled statement has been executed, at which point that second extra reference goes away.

8 Comments

So you mean that the actual reference count of 10 is 11, and the call adds +1 (because of the function call) and +1 (for some other reason). But then when I call getrefcount(a), it should do 11 + 1 (function call) and return 12, right? Why is it returning 11?
Because in the case of a, the compiled bytecode just loads it from a global variable. I'll follow up with a more detailed answer to clarify this.
The extra reference you're missing is the one loaded onto the stack by LOAD_CONST.
Wait, no, that's not it - that reference exists, but the sys.getrefcount(a) call also loads such a reference, so it's not a difference between the two calls.
IIRC the tokenizer and compiler create some extra references, which live long enough to be picked up by the sys.getrefcount call if you directly run the statement interactively, but not if you run it most other ways you could run it. I never managed to track down where all those references were coming from, and the code has been changed a number of times. You'll see some interesting data structures if you look through the output for gc.get_referrers(10).
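For anyone who wants to try the gc.get_referrers suggestion, here is a small sketch of how that function reports the containers holding an object. The Payload class and find_holders function are hypothetical stand-ins; the sketch uses a custom object instead of 10 so the output stays short, and note that gc.get_referrers only reports containers tracked by the garbage collector.

```python
import gc


class Payload:
    """Hypothetical stand-in for the object whose referrers we inspect."""


def find_holders():
    """Check whether a list and a dict holding the object show up as referrers."""
    payload = Payload()
    holder_list = [payload]            # first container referencing the object
    holder_dict = {"key": payload}     # second container referencing the object
    referrers = gc.get_referrers(payload)
    list_found = any(r is holder_list for r in referrers)
    dict_found = any(r is holder_dict for r in referrers)
    return list_found, dict_found


print(find_holders())
```

Running gc.get_referrers(10) in an interactive session lists many more objects, since a small integer is shared by a lot of unrelated code.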