I'm sure the solution may be obvious, but this problem is driving me
mad. The following is my code:
class a(object):
mastervar = []
def __init__(self):
print 'called a'
class b(a):
def __init__(self):
print 'called b'
self.mapvar()
def mapvar(self):
self.mastervar.append(['b'])
class c(b):
def __init__(self):
print 'called c'
self.mapvar()
def mapvar(self):
super(c, self).mapvar()
self.mastervar.append(['c'])
if __name__ == '__main__':
a1 = a()
b1 = b()
c1 = c()
d1 = c() # Call C again
print a1.mastervar
print b1.mastervar
print c1.mastervar
print d1.mastervar
What I don't understand is why mastervar gets modified by each _seperate
instance_ of classes that happen to extend the base class 'a'.
Shouldn't mastervar be contained within the scope of the inheriting
classes? Why is it being treated like a global variable and being
modified by the other instances?
A variable declared in a class definition is shared by all instances of
the class. Python uses, essentially, a search path for names that goes
from most specific to least specific scope. In your example, a lookup in
c1 for mastervar will search for mastervar in:
c1.__dict__
c1.__class__.dict (ie b.__dict__)
a.__dict__
Note that "class variables" are not _copied_, they're just looked up in
the class object if not found in the instance. Remember, the class
declaration is only executed _once_ - then __init__ is executed for each
new instance. If you want a per-instance copy of a variable, you need to
generate a new copy in the __init__ method of the instance (or the
parent class and call the parent's __init__ with super() ).
If you actually assigned c1.mastervar, rather than modifying the dict,
you would get the per-instance dictionary you expected.
I strongly recommend a read of the book "Learning Python" if you want to
really _understand_ this stuff (and they do a much better job explaining
it than I ever could in my overcomplicated and incoherent way).
I think the Python behaviour is counter-intuitive to C++ and Java
programmers, but makes good sense in the context of the way Python
classes, inheritance, and namespaces work. It's also extremely
consistent with the way namespaces and inheritance work in the rest of
Python - there's no special magic for objects and classes. A good rule
might be "If you want Java-style instance variables, create instances
variables in __init__ not in the class declaration."
This might help explain things - ore might just confuse even more:
.... ina = ["in A"]
........ inb = ["in B"]
........ ina = ["in C"]
.... inc = ["in C"]
....
# examine the class dictionaries ....
A.__dict__.keys() ['ina', ...]
B.__dict__.keys() ['inb', ...]
C.__dict__.keys() ['ina', 'inc', ...]
# Now look up some variables to demonstrate the namespace
.... # search in class inheritance
....
A.ina ['in A']
B.ina ['in A']
C.ina # remember, we redefined this in C ['in C']
B.inb ['in B']
C.inc ['in C']
# This should help explain things ....
B.ina is A.ina # True, because B.ina just looks up A.ina True
C.ina is A.ina # False, because C.ina is found first False
# Now modify B.ina. Because asking for B.ina searches B, then A,
.... # for ina, we'll actually end up modifying A.ina
....
B.ina.append("blah")
A.ina ['in A', 'blah']
B.ina ['in A', 'blah']
# but if we do the same to C.ina, which is redefined in C,
.... # a.ina won't be modified
....
C.ina.append("change")
A.ina ['in A', 'blah']
C.ina ['in C', 'change']
# Now we're going to assign to B.ina, rebinding the name,
.... # instead of just modifying the existing mutable object.
....
B.ina = "fred"
B.ina "fred"
B.__dict__.keys() ['inb', 'ina']
# Note that there's now a new value for ina in B's dictionary. It
.... # is found by the search BEFORE looking for A.ina, and used. A.ina
.... # is not modified.
....['in A', 'blah']
What you can see happening here is the combination of a couple of
principles:
- Name lookups in classes happen as a search through the class
dictionary then all its parent classes' dictionaries for a name.
- Name assignments to a class are made directly in its dictionary.
- A modification of a mutable value is a lookup of a name followed
by the modification of an object, not an assignment to a name.
The principle is pretty similar for instances and classes - a variable
defined in a class is a single object shared between all instances of a
class, and if mutable can be modified by all of them. Example:
.... cvar = []
.... def __init__(self, name):
.... self.cvar.append(name)
.... print "I am ", name
....
a = D("fred") I am fred
b = D("jones") I am jones
D.cvar ['fred', 'jones']
a.cvar is D.cvar
True
If, on the other hand, I assign to that name - rather than modifying a
mutable object:
.... name = "fred"
.... def __init__(self, name):
.... self.name = name
....
a = E("smith")
b = E("anna")
E.name 'fred'
a.name 'smith'
b.name 'anna'
E.name is a.name False
a.__dict__.keys() ['name']
b.__dict__.keys() ['name']
E.__dict__.keys()
E.__dict__.keys()
['name', ...]
a new entry is inserted into each instances dictionary. The class
variable is still there, and unmodified, but the name search finds the
copy each instance has in its dict before the class one.
'smith'
See?
The handling of names on instances and subclasses is a lot like the way
names are handled in scopes in functions. Lookups scan upward from most
specific to least specific scope, ending at the global module scope,
while assignments bind a name into the local scope. Again, modifying a
mutable global works the same way as modifying a mutable member of a
class object.
I know it sounds darn complicated, but once you "get it" about namespace
searches, the difference between modifying an object and binding an
name, etc it's all very clean and nice and easy. I love the way
inheritance and name lookups work in Python - it's all 100% consistent.
The best way to learn it is play with it, though reading Learning Python
is strongly recommended IMHO.
You can use this behaviour to your _strong_ benefit, with
instance-caching classes (requires overriding __new__), classes that
automatically register and keep track of instances, and instances that
can all inherit a change made to the class object even after they're
instantiated. I've found instance caching, in particular, to be just
magic when performing lots of data processing.