
The following code was supposed to clarify how Python class variables behave,
but it raises more questions than it answers.

The class Bodyguard has the variable protect, which is a list that by default contains the king.
The classes AnnoyingBodyguard and Bureaucrat change it.

Guards that protect specific people shall be called specific.   (bg_prime, bg_foreign, ...)
The others shall be called generic.   (bg1, bg2, bg3)

For specific guards the changes affect only those initialized after the change.
For generic guards the changes affect all of them, no matter when they were initialized.
Why the before/after difference for specific guards? Why the specific/generic difference?

These differences are somewhat surprising, but I find the following even stranger.
Given two lists a and b, one might think that these operations will always have the same result:
reassign:    a = a + b
add-assign:  a += b
append:      for x in b: a.append(x)

Why do they cause completely different results when used in Bodyguard.__init__?

Only the results using reassign make any sense.
They can be seen below and in reassign_good.py.
The results for add-assign and append are quite useless, and I do not show them here.
But they can be seen in addassign_bad.py and append_bad.py.

class Bodyguard:
    protect = ['the king']

    def __init__(self, *args):
        if args:
            self.protect = self.protect + list(args)


##################################################################################


bg1 = Bodyguard()
bg_prime = Bodyguard('the prime minister')
bg_foobar = Bodyguard('the secretary of foo', 'the secretary of bar')


assert bg1.protect == ['the king']
assert bg_prime.protect == ['the king', 'the prime minister']
assert bg_foobar.protect == [
    'the king', 'the secretary of foo', 'the secretary of bar'
]


##################################################################################


class AnnoyingBodyguard(Bodyguard):
    Bodyguard.protect = ['his majesty the king']


bg2 = Bodyguard()
bg_foreign = Bodyguard('the foreign minister')


# The king's title was updated for all generic guards.
assert bg1.protect == bg2.protect == ['his majesty the king']

# And for specific guards initialized after AnnoyingBodyguard was defined.
assert bg_foreign.protect == ['his majesty the king', 'the foreign minister']

# But not for specific guards initialized before AnnoyingBodyguard was defined.
assert bg_prime.protect == ['the king', 'the prime minister']
assert bg_foobar.protect == [
    'the king', 'the secretary of foo', 'the secretary of bar'
]

##################################################################################


class Bureaucrat:
    def __init__(self, name):
        Bodyguard.protect.append(name)


malfoy = Bureaucrat('Malfoy')
bg3 = Bodyguard()
bg_paper = Bodyguard('the secretary of paperwork')


# Malfoy was added for all generic guards.
assert bg1.protect == bg2.protect == bg3.protect == [
    'his majesty the king', 'Malfoy'
]

# And for specific guards initialized after Malfoy:
assert bg_paper.protect == [
    'his majesty the king', 'Malfoy', 'the secretary of paperwork'
]

# But not for specific guards initialized before Malfoy:
assert bg_prime.protect == ['the king', 'the prime minister']
assert bg_foreign.protect == [
    'his majesty the king', 'the foreign minister'
]

Edit: Based on the comments and answers, I added the script reassign_better.py,
where the differences between generic and specific guards are removed.

The main class should look like this:

class Bodyguard:
    protect = ['the king']

    def __init__(self, *args):
        self.protect = self.protect[:]  # force reassign also for generic guards
        if args:
            self.protect = self.protect + list(args)
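
As a rough check of what this change does (a minimal sketch, not the actual reassign_better.py; the variable names are made up for illustration), generic guards created before an in-place change to Bodyguard.protect should now keep their original list, just like specific guards:

```python
class Bodyguard:
    protect = ['the king']

    def __init__(self, *args):
        self.protect = self.protect[:]  # every instance gets its own copy
        if args:
            self.protect = self.protect + list(args)


# Hypothetical names, chosen only for this sketch.
early_generic = Bodyguard()
early_specific = Bodyguard('the prime minister')

# The same kind of in-place change that Bureaucrat.__init__ makes.
Bodyguard.protect.append('Malfoy')

late_generic = Bodyguard()

# Guards created before the change keep their own lists;
# only guards created afterwards pick up the new class-level default.
early_generic_list = early_generic.protect
early_specific_list = early_specific.protect
late_generic_list = late_generic.protect
```

With the copy in place, only late_generic should see 'Malfoy'.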
  • a = a + b produces a new list object. a += b and a.append(b) do an in-place modification to the existing object. If other names are bound to the same object as the original a, that link is broken after a = a + b. Commented Aug 2, 2023 at 20:07
  • "one might think that these operations will always have the same result" - locally, yes, but a = a + b creates a new list and a.append(x) changes the existing list in-place. Other parts of the code which reference the existing list will see a difference. Commented Aug 2, 2023 at 20:08
  • The thing to remember is that class attributes are shared by all instances of the class. So if you want a protect list that's specific to just one instance, you need to assign to self.protect to shadow Bodyguard.protect. Commented Aug 2, 2023 at 20:11

3 Answers

The difference is in how names are resolved on an object instance and how operators such as + and += are implemented. In

self.protect = self.protect + list(args)

Python evaluates the right-hand side and assigns the result to the left-hand side. First, self.protect is resolved. The instance self doesn't have an attribute "protect", so by Python's attribute lookup rules its class is checked next. The class-level Bodyguard.protect is found (it's ['the king']) and that value is used.

The + operation on a list creates a new list, combining both sides. There are several rules for the + operator, but most commonly it calls the __add__ method on the left operand and takes the return value as the result of the operation. All classes are free to decide what __add__ means to them. Lists think it should be a new list with the contents of both sides.

Now you have an anonymous list and the assignment self.protect = <that anonymous list>. That's an assignment to the instance object. Interestingly, the next time you use self.protect, this list is found on the instance and there is no reason to fall back to Bodyguard.protect. That's the point of the code. It's a way to provide a default list.
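
One way to see this fallback directly (a small sketch using the question's class; the variable names are only illustrative) is to look at each instance's own attribute dictionary with vars():

```python
class Bodyguard:
    protect = ['the king']

    def __init__(self, *args):
        if args:
            self.protect = self.protect + list(args)


generic = Bodyguard()
specific = Bodyguard('the prime minister')

# The generic guard never assigned self.protect, so it has no instance
# attribute and every lookup falls back to the class.
generic_has_own = 'protect' in vars(generic)
specific_has_own = 'protect' in vars(specific)

# "is" compares object identity, not contents.
generic_shares_class_list = generic.protect is Bodyguard.protect
specific_shares_class_list = specific.protect is Bodyguard.protect
```

Here generic_has_own is False and generic_shares_class_list is True, while the specific guard has its own separate list.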

Augmented addition is a bit different. Let's say you wrote

self.protect += list(args) 

instead. Python resolves self.protect the same way: it's not on the instance object, so you get the list in Bodyguard.protect. Instead of __add__, Python calls __iadd__, and once again the result is used for assignment. In this case, list decided that __iadd__ should extend the list in place and return that same list. When Python assigns that list to self.protect, it's the updated list from Bodyguard.protect, which now has two references and the extra values.
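
A short sketch of that effect follows. It is not the asker's addassign_bad.py, which isn't shown here; the class name AddAssignBodyguard and the variable names are invented for this illustration:

```python
class AddAssignBodyguard:  # hypothetical variant of the question's class
    protect = ['the king']

    def __init__(self, *args):
        if args:
            # list.__iadd__ extends the class-level list in place, then the
            # same list object is also bound as an instance attribute.
            self.protect += list(args)


first = AddAssignBodyguard('the prime minister')
second = AddAssignBodyguard()

class_list = AddAssignBodyguard.protect
first_is_class_list = first.protect is AddAssignBodyguard.protect
second_list = second.protect
```

After this runs, the class-level list itself contains 'the prime minister', so the later generic guard protects the prime minister too.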

Note:

AnnoyingBodyguard should define its own protect instead of overwriting its parent class's:

class AnnoyingBodyguard(Bodyguard):
    protect = ['his majesty the king']

Now subclasses and instances of AnnoyingBodyguard get the more annoying protect list, but Bodyguard retains its original list. I mean, annoying is one thing, but changing protect on your parent class is downright sociopathic.
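
As a quick illustration (a sketch that reuses the question's __init__; the instance names are made up), the subclass attribute shadows the parent's without touching it:

```python
class Bodyguard:
    protect = ['the king']

    def __init__(self, *args):
        if args:
            self.protect = self.protect + list(args)


class AnnoyingBodyguard(Bodyguard):
    protect = ['his majesty the king']  # shadows, does not overwrite


plain = Bodyguard('the prime minister')
annoying = AnnoyingBodyguard('the foreign minister')

plain_list = plain.protect
annoying_list = annoying.protect
parent_default = Bodyguard.protect
```

self.protect inside __init__ is looked up on the instance's own class first, so each class supplies its own default.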


Perhaps examples will clarify this. This is, in my view, the KEY point to understanding Python behind the scenes.

a = [1,2,3]
b = a
c = a

At this point, our program has exactly ONE list object. There happen to be three names bound to that one list. Modifying any of them modifies the one list, and will be visible everywhere:

b.append(4)
print(c)

Prints [1, 2, 3, 4]. However, if we do:

b = b + [5]
print(a)
print(b)

That creates a BRAND NEW list object and binds it to the name b. a and c are still bound to the original, so that prints

[1, 2, 3, 4]
[1, 2, 3, 4, 5]

The way I like to think about this is that there are two different "spaces" in Python: there is an object space, filled with thousands of anonymous objects that do not have a name, and there is a namespace, which contains names that are bound to objects. It's important to recognize this. Names do not have values. They are merely bound to objects. And this includes EVERY name: variables, functions, classes, modules, etc.
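
To illustrate that this applies to every kind of name, here is a tiny sketch (greet and alias are made-up names) where a function object ends up with two names bound to it:

```python
def greet():
    return 'hello'


# "def" created a function object and bound the name greet to it.
# Binding another name to the same object copies nothing.
alias = greet
same_object = alias is greet

# Rebinding greet to a different object leaves alias untouched.
greet = None
still_callable = alias()
```

The function object lives on as long as some name (here alias) is still bound to it.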

Note that this confusion does not actually require separate names. Take, for example, the very common error:

a = [[0] * 10] * 10

Many would think this creates 10 different lists. That's not so. This creates exactly TWO lists: one that contains 10 zeros, and one that contains 10 references to that list. So if you do:

a[5][5] = 7

that change is seen in all ten elements of a.
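
For comparison, a sketch with a smaller 3x3 grid (the sizes and names are chosen only to keep it short) shows the shared-row effect next to the usual list-comprehension alternative, which builds a fresh inner list per row:

```python
# One inner list, referenced three times.
shared = [[0] * 3] * 3
shared[1][1] = 7

# A new inner list is created on each iteration.
independent = [[0] * 3 for _ in range(3)]
independent[1][1] = 7

rows_are_same_object = shared[0] is shared[1]
rows_are_distinct = independent[0] is not independent[1]
```

In shared, the 7 shows up in every row; in independent, only in the middle one.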

Comments

  • What makes this confusing is that types can overload what a += b does. By default it means a = a + b, which creates a new object and reassigns the variable. But list overloads this to mean a.extend(b), which modifies the object in place.
  • That's an excellent point, and unfortunately that is not "discoverable". You have to KNOW this for the types you are working on.
  • The comments and this answer surely make clear why only reassign works (as opposed to add-assign or append). I am not sure if it also answers the generic vs. specific part of the question.
  • @Watchduck Because when you alter the generic (a.k.a. static) member Bodyguard.protect, it affects the initialization of all Bodyguard instances from that point on that reference that member when creating their own specific (a.k.a. instance) protect lists. Instances before that alteration have already created their own instance lists and are therefore unaffected by anything that changes the static member.
  • It's the same principle. Bodyguard.__init__ creates a new list and assigns it to self.protect. At that point, you lose the connection to Bodyguard.protect. The class variables are just a default, or a template.
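
The point about += depending on the type can be sketched by comparing a list with a tuple, which is immutable and does not define an in-place add (all names here are illustrative):

```python
nums_list = [1, 2]
alias_list = nums_list
nums_list += [3]          # list.__iadd__: extends the existing object

nums_tuple = (1, 2)
alias_tuple = nums_tuple
nums_tuple += (3,)        # no __iadd__: falls back to nums_tuple + (3,)

list_still_shared = nums_list is alias_list
tuple_still_shared = nums_tuple is alias_tuple
```

The list alias sees the new element, while the tuple alias still refers to the original two-element tuple.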

This behavior comes down to mutability in Python. The class variable protect is a list, and lists are mutable. An instance that has not been assigned its own protect does not get a copy: looking up self.protect returns the very same list object that is stored on the class.

The examples of add-assign and append will modify the list in place, thus the change will be seen throughout all other instances of the class. Reassign will create a new list (and a new memory location).
