
I'm stumbling over a weird effect when initializing a Python class. Not sure if I'm overlooking something obvious or not.

First things first, I'm aware that apparently lists passed to classes are passed by reference while integers are passed by value as shown in this example:

class Test:
  def __init__(self,x,y):
    self.X = x
    self.Y = y
    self.X += 1
    self.Y.append(1)

x = 0
y = []
Test(x,y)
Test(x,y)
Test(x,y)
print x, y

Yielding the result:

0 [1, 1, 1]

So far so good. Now look at this example:

class DataSheet:
  MISSINGKEYS = {u'Item': ["Missing"]}

  def __init__(self,stuff,dataSheet):
    self.dataSheet = dataSheet
    if self.dataSheet.has_key(u'Item'):
      self.dataSheet[u'Item'].append(stuff[u'Item'])
    else:
      self.dataSheet[u'Item'] = self.MISSINGKEYS[u'Item']

Calling it like this:

stuff = {u'Item':['Test']}
ds = {}
DataSheet(stuff,ds)
print ds
DataSheet(stuff,ds)
print ds
DataSheet(stuff,ds)
print ds

yields:

{u'Item': ['Missing']}
{u'Item': ['Missing', ['Test']]}
{u'Item': ['Missing', ['Test'], ['Test']]}

Now let's print MISSINGKEYS instead:

stuff = {u'Item':['Test']}
ds = {}
DataSheet(stuff,ds)
print DataSheet.MISSINGKEYS
DataSheet(stuff,ds)
print DataSheet.MISSINGKEYS
DataSheet(stuff,ds)
print DataSheet.MISSINGKEYS

This yields:

{u'Item': ['Missing']}
{u'Item': ['Missing', ['Test']]}
{u'Item': ['Missing', ['Test'], ['Test']]}

The exact same output. Why?

MISSINGKEYS is a class variable but at no point is it deliberately altered.

In the first call, __init__ takes the else branch and executes this line:

self.dataSheet[u'Item'] = self.MISSINGKEYS[u'Item']

That line apparently starts it all. Obviously I only want self.dataSheet[u'Item'] to take the value of self.MISSINGKEYS[u'Item'], not to become a reference to it or something like that.

In the following two calls the line

self.dataSheet[u'Item'].append(stuff[u'Item'])

is executed instead, and the append acts on self.dataSheet[u'Item'] AND on self.MISSINGKEYS[u'Item'], which it should not.

This leads to the assumption that after the first call both variables now reference the same object.

However, although they are equal, they are not the same object:

ds == DataSheet.MISSINGKEYS
Out[170]: True
ds is DataSheet.MISSINGKEYS
Out[171]: False

Can someone explain to me what is going on here and how I can avoid it?

EDIT: I tried this:

ds[u'Item'] is DataSheet.MISSINGKEYS[u'Item'] 
Out[172]: True

So okay, this one entry in both dictionaries references the same object. How can I just assign the value instead?
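The findings above can be reproduced in isolation. The following is a minimal sketch in Python 3 syntax, so there are no u'' prefixes. missing_keys and ds are illustrative stand-ins for DataSheet.MISSINGKEYS and the caller's dictionary, not names from the original code. It mirrors the else branch with a plain assignment. The == operator compares contents and is compares identity, and only the inner list ends up shared.

```python
# Illustrative stand-ins for DataSheet.MISSINGKEYS and the caller's ds dict
missing_keys = {'Item': ['Missing']}
ds = {}

# What the else branch does: store the *same* list object in ds
ds['Item'] = missing_keys['Item']

print(ds == missing_keys)                  # True: equal contents
print(ds is missing_keys)                  # False: two distinct dict objects
print(ds['Item'] is missing_keys['Item'])  # True: one shared list object

# Appending through either dict mutates that single shared list
ds['Item'].append(['Test'])
print(missing_keys)                        # {'Item': ['Missing', ['Test']]}
```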

  • self.dataSheet[u'Item'] = self.MISSINGKEYS[u'Item'] creates a reference, so when you change it you change it everywhere. You would need to create a copy: self.dataSheet[u'Item'] = list(self.MISSINGKEYS[u'Item']) Commented Sep 23, 2016 at 10:31
  • Yeah. This is something about Python I don't understand: 99.9% of the time you never need to consciously create a copy of anything, and everything works as if you were using call-by-value. Then suddenly you stumble over something like this. Commented Sep 23, 2016 at 10:32
  • With a mutable object you are creating a reference to something that can be changed. Only if the object is immutable will a new object be created when a change is made. With i = 12; b = i; i += 12, b will still be 12, because ints are immutable. With a mutable structure the change is done in place, so no new object is created. Basically, if you want to use a mutable value/object and don't just want a reference, you need to copy, or maybe deepcopy, depending on the object. Commented Sep 23, 2016 at 10:34
  • Thanks, I think I get it now. Commented Sep 23, 2016 at 10:38
  • No worries. Also be aware that even if an object itself is immutable, if it contains mutable objects then those objects can still be changed, and the immutable object will still be the same object, i.e. t = (1,[2]); t2 = t; t[1].append(2). That is where deepcopy comes in. Commented Sep 23, 2016 at 10:42
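The points made in these comments can be checked directly. Here is a short sketch in Python 3 syntax. All variable names are illustrative, and the tuple example binds a second name with t2 = t. The last part, comparing a shallow and a deep copy of a dict shaped like MISSINGKEYS, is an added illustration of why deepcopy matters for nested data.

```python
import copy

# Immutable int: += builds a new object and rebinds only i
i = 12
b = i
i += 12
print(i, b)                # 24 12

# Mutable list: append changes the one object both names refer to
a = [1]
alias = a
a.append(2)
print(alias)               # [1, 2]

# An immutable tuple can still hold a mutable list
t = (1, [2])
t2 = t
t[1].append(2)
print(t2)                  # (1, [2, 2]): same tuple, its inner list changed

# A shallow copy shares nested objects; deepcopy duplicates them
nested = {'Item': ['Missing']}
shallow = copy.copy(nested)
deep = copy.deepcopy(nested)
nested['Item'].append('x')
print(shallow['Item'])     # ['Missing', 'x']
print(deep['Item'])        # ['Missing']
```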

2 Answers


Here:

 else:
  self.dataSheet[u'Item'] = self.MISSINGKEYS[u'Item']

You are setting dataSheet[u'Item'] to the list that is the value of MISSINGKEYS[u'Item']: the very same list, not a copy. Try

 else:
  self.dataSheet[u'Item'] = list(self.MISSINGKEYS[u'Item']) 

To make a copy.
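For reference, here is the question's class with that one-line fix applied. It is a sketch ported to Python 3 syntax: 'Item' in ... replaces has_key, and the u'' prefixes are dropped. The loop that calls it three times is only for illustration.

```python
class DataSheet:
    # Class-level default shared by every instance; it must never be mutated
    MISSINGKEYS = {'Item': ['Missing']}

    def __init__(self, stuff, dataSheet):
        self.dataSheet = dataSheet
        if 'Item' in self.dataSheet:
            self.dataSheet['Item'].append(stuff['Item'])
        else:
            # list(...) builds a new list, so later appends cannot reach MISSINGKEYS
            self.dataSheet['Item'] = list(self.MISSINGKEYS['Item'])


stuff = {'Item': ['Test']}
ds = {}
for _ in range(3):
    DataSheet(stuff, ds)

print(ds)                     # {'Item': ['Missing', ['Test'], ['Test']]}
print(DataSheet.MISSINGKEYS)  # {'Item': ['Missing']}
```

list() produces a shallow copy, and self.MISSINGKEYS[u'Item'][:] would do the same. That is enough here because the default list holds only a string. If the default ever held mutable items, copy.deepcopy would be the safer choice.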


1 Comment

Thanks, that helps.

Thinking about what happens in Python function calls in terms of "pass by reference" vs "pass by value" is not generally useful; some people like to use the term "pass by object". Remember, everything in Python is an object, so even when you pass an integer to a function you're actually passing (in C terminology) a pointer to that integer object.
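One way to see "pass by object" in action is to compare an argument's identity inside and outside a function. This is a minimal sketch with a hypothetical helper, id_inside(). id() returns an object's identity, which stays fixed for the object's lifetime. Matching values therefore mean the function received the very same object rather than a copy, for the int as well as for the list.

```python
def id_inside(param):
    # Hypothetical helper: report the identity of the object param is bound to
    return id(param)


n = 10 ** 6
items = []
int_same = id_inside(n) == id(n)            # the int was not copied
list_same = id_inside(items) == id(items)   # nor was the list
print(int_same, list_same)                  # True True
```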

In your first code block you do:

self.X += 1

This doesn't modify the current integer object bound to self.X. It creates a new integer object with the appropriate value and binds that object to the self.X name.
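This rebinding can be observed with is. Here is a sketch at module level, in Python 3 syntax, where X stands in for self.X:

```python
x = 0
X = x            # X and x now name the same int object
print(X is x)    # True

X += 1           # ints are immutable: a new int (1) is created and bound to X
print(X, x)      # 1 0
print(X is x)    # False: x still names the original 0
```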

Whereas, with

self.Y.append(1)

you are mutating the current list object that's bound to self.Y, which happens to be the list object that was passed to Test.__init__ as its y parameter. This is the same y list object in the calling code, so when you modify self.Y you are changing that y list object in the calling code. OTOH, if you did an assignment like

self.Y = ['new stuff']

then the name self.Y would be bound to the new list, and the old list (which is still bound to y in the calling code) would be unaffected.
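The difference between mutating and rebinding can be shown side by side. This sketch, in Python 3 syntax, uses a variant of the question's class renamed Demo, with a hypothetical replace flag that picks which of the two statements runs. Neither the name nor the flag comes from the question.

```python
class Demo:
    def __init__(self, y, replace=False):
        self.Y = y
        if replace:
            # Rebinding: self.Y now names a brand-new list; the caller's y is untouched
            self.Y = ['new stuff']
        else:
            # Mutation: appends to the same list object the caller passed in
            self.Y.append(1)


y = []
Demo(y)
print(y)           # [1]

d = Demo(y, replace=True)
print(y, d.Y)      # [1] ['new stuff']
```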

You may find this article helpful: Facts and myths about Python names and values, which was written by SO veteran Ned Batchelder.
