How exactly do Python functions return/yield objects?

Question

I've been trying to track the memory usage of my Python code, and I've been wondering whether Python functions return by reference or by copying from the function's scope to the enclosing/global scope.

e.g.

def f1():
    a = range(10000) # or np.arange(10000)
    return a

def f2():
    a = range(10000)
    for i in range(10):
        yield a[1000*i : 1000*(i+1)]

If I call b = f1(), and a is created within the function, is b assigned a reference to a's object? Or is the object that a points to copied and then referenced by b, with a and a's object deleted when the function call ends?

Likewise, if I perform the following,

for a_slice in f2():
    b = a_slice

is the yielded object also "created only once," or is it "copied over to global space"? And do NumPy arrays behave the same way as Python lists?

This might help if you're used to C++: rg03.wordpress.com/2007/04/21/… — user2357112
– user2357112, Commented Jan 9, 2014 at 3:17
First, note that nothing is yielded when you assign b = f2(); the call only returns an iterator, an object whose next() method does the actual iteration and runs your code up to each yield... — avenet
– avenet, Commented Jan 9, 2014 at 3:17
@avenet: Oh, sorry! let me change the code to update. I wrote the example a bit too quickly... — richizy
– richizy, Commented Jan 9, 2014 at 3:22

aIKid · Accepted Answer · 2014-01-09 03:36:13Z

3

For the first question, you can do a simple test:

>>> def f1():
    a = range(10) # or np.arange(10000)
    return a, id(a)

>>> b, id_b = f1()
>>> id(b) == id_b
True

So, b points to exactly the same object that a points to. It's not copied.

We can do the same thing for the second case:

>>> def f2():
    a = range(100)
    for i in range(10):
        tmp = a[i:i**2]
        yield tmp, id(tmp)

>>> b = f2()
>>> for tmp, id_tmp in b:
        print(id_tmp == id(tmp))


True
True
True
True
True
True
True
True
True
True

Seems like it works exactly the same way as in the first case. Well, this is Python. Everything we're doing here is about references :)

Hope this helps!

edited Jan 9, 2014 at 3:36

answered Jan 9, 2014 at 3:19


steveha · Accepted Answer · 2014-01-09 03:36:23Z

If you execute the statement b = f1(), you are binding the name b to the list instance that was created inside f1() by the call to range().

Python is all references. The list object is created and a reference to the object is returned from the function, and then the reference is bound to the name b.
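A minimal sketch of that binding, using hypothetical names (make_list and registry are not from the question). The function stashes a second reference to the list it builds, so the caller can check that it received the very same object rather than a copy:

```python
registry = []  # hypothetical helper: holds a second reference to the object built in the function

def make_list():
    a = list(range(10000))   # list() so this behaves the same on Python 2 and Python 3
    registry.append(a)       # remember the object that the local name `a` refers to
    return a                 # hands back a reference; nothing is copied

b = make_list()

# `is` compares object identity, so this is True only if no copy was made.
same_object = b is registry[0]

# A mutation made through one reference is visible through the other.
b.append(-1)
visible_through_other = registry[0][-1] == -1
```

The local name a disappears when the function returns, but the object survives because b (and here registry) still refers to it.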

Your function f2() makes a generator that will first create a list instance with 10000 integers, and will bind the local variable name a to a reference to that list instance. Then, as you pull values from the generator, each slicing operation in the loop will create a new list instance that will be yielded up. Once the loop completes and the last list instance has been yielded, the generator will be cleaned up, and at that time the list a will no longer be referenced and will be garbage collected. (In CPython the garbage collection is based on reference counting and will reclaim the list promptly in this case. For other implementations of Python such as Jython or PyPy, the timing of garbage collection is much less predictable.)
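The behaviour described above can be checked with a small sketch. It assumes a slightly modified f2() that also yields the source list a (not part of the question's code) so the caller can inspect it:

```python
def f2():
    a = list(range(10000))    # not built until the generator is first advanced
    for i in range(10):
        # Also yield `a` itself (an addition for this sketch) so the caller can compare.
        yield a, a[1000 * i : 1000 * (i + 1)]

gen = f2()                    # creates the generator; the body has not run yet
source, first = next(gen)     # runs the body up to the first yield

first_len = len(first)        # the slice holds 1000 elements
first[0] = -1                 # rebinding a slot in the slice...
source_unchanged = source[0] == 0   # ...leaves the original list alone

remaining = sum(1 for _ in gen)     # consume the other slices
```

Each list slice is a new list (a shallow copy: new container, same element objects), which is why the assignment to first[0] does not show up in source.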

I'm not a NumPy expert, but my understanding is that "views" (including slices) of array instances take up very little memory, because they don't make a copy of the original data. If you change f2() to build a NumPy array with numpy.arange() and then yield up slices of it, I predict your program will use less memory. The current implementation of f2() creates and destroys 10 list instances, the slices of the list a, each holding its own 1000 references; the NumPy array slices should avoid that copying.

I just tested the above:

import numpy as np
a = np.arange(100)
b = a[0:3]
b[0] = 99
assert a[0] == b[0]

In this example, b is a "view" into the array a. Slicing doesn't allocate a new data buffer, as shown by assigning to b[0]: the value at a[0] changes as well, because b is just another view of the same underlying data.
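For a more direct check than mutating and comparing, here is a short sketch using the base attribute and np.shares_memory() (available in reasonably recent NumPy releases). The variable names are only illustrative:

```python
import numpy as np

a = np.arange(100)
b = a[0:3]                        # basic slicing returns a view

is_view = b.base is a             # a view keeps a reference to the array that owns the data
shares = np.shares_memory(a, b)   # True when the two arrays overlap in memory

c = a[0:3].copy()                 # an explicit copy gets its own data buffer
c[0] = 99
a_untouched = a[0] == 0           # writing to the copy does not affect `a`
copy_shares = np.shares_memory(a, c)
```

The view b is still a new (small) array object, so id(b) differs from id(a); only the element data is shared.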

(Anyone who is a NumPy expert, please point out if I have made any mistakes here. Thank you.)
