Constructing trees in Python

Maxim Mercury

Hi,
I am very new to Python. I am trying to construct an XML DOM tree
using the built-in HTMLParser class (from its event callbacks). I am
maintaining the children as a list of elements, and whenever the SAX-style
parser encounters a tag I push it onto a local stack. My basic logic is
below.

**************
def handle_starttag(self, tag, attrs):
    curElement = HTMLElement(tag.lower(), attrs)          # <------ (1)

    if self.elementRoot is None:
        self.elementRoot = curElement
    else:
        self.elementStack[-1].childs.append(curElement)   # <------ (2)

    self.elementStack.append(curElement)

**************

Here is the definition of HTMLElement:

class HTMLElement:
    tag = None
    attrs = {}
    data = ''
    childs = []

The issue is that although a new element is created and pushed onto the
stack (1), whenever I append a child to the element on top of the stack (2),
all the other elements in the stack are affected as well. I assume the same
reference is being used, but is there a way to overcome this?
 
Peter Otten

Maxim said:
Here is the definition of HTMLElement:

class HTMLElement:
    tag = None
    attrs = {}
    data = ''
    childs = []

The issue is that although a new element is created and pushed onto the
stack (1), whenever I append a child to the element on top of the stack (2),
all the other elements in the stack are affected as well. I assume the same
reference is being used, but is there a way to overcome this?

The definition

class A:
    some_list = []

defines a class attribute shared by all instances of A. To turn some_list
into an instance attribute, move the definition into the initializer:

class A:
    def __init__(self):
        self.some_list = []
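Applied to the HTMLElement class from the question, the same change might look like the sketch below. Only handle_starttag follows the logic posted in the question; the TreeBuilder name, the handle_endtag and handle_data methods, and the sample markup are assumptions added for illustration. It targets Python 3, where the parser lives in html.parser (in Python 2 the module is called HTMLParser), and it assumes well-formed input with no void tags such as <br>.

```python
from html.parser import HTMLParser


class HTMLElement:
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)   # a fresh dict for every element
        self.data = ''
        self.childs = []           # a fresh list for every element


class TreeBuilder(HTMLParser):     # hypothetical name, not from the question
    def __init__(self):
        HTMLParser.__init__(self)
        self.elementRoot = None
        self.elementStack = []

    def handle_starttag(self, tag, attrs):
        curElement = HTMLElement(tag.lower(), attrs)
        if self.elementRoot is None:
            self.elementRoot = curElement
        elif self.elementStack:
            self.elementStack[-1].childs.append(curElement)
        self.elementStack.append(curElement)

    def handle_endtag(self, tag):
        # Assumption: every closing tag matches the element on top of the stack.
        if self.elementStack:
            self.elementStack.pop()

    def handle_data(self, data):
        if self.elementStack:
            self.elementStack[-1].data += data


builder = TreeBuilder()
builder.feed('<html><body><p>one</p><p>two</p></body></html>')

root = builder.elementRoot
body = root.childs[0]
print(root.tag, [c.tag for c in root.childs])      # html ['body']
print([(p.tag, p.data) for p in body.childs])      # [('p', 'one'), ('p', 'two')]
```

Because each element now owns its own childs list, appending to the element on top of the stack no longer shows up on every other element.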

Note that this holds for every attribute, but you usually see it only for
mutables like lists or dicts because in

class A:
    x = "yadda"
    y = []

a = A()
print(a.x)  # yadda
a.x = 42
print(a.x)  # 42
del a.x
print(a.x)  # can you guess what happens?

the class attribute is shadowed by the instance attribute whereas

a.y.append(42)

modifies the class attribute in place.

# two more to check that you've understood the mechanism:
a.y += ["ham"] # ?
a.y = ["spam"] # ?
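For readers who want to check their answers, the following sketch runs the three statements in order and inspects both the instance and the class after each one. The second instance b is an addition for illustration, used only to show what other instances see.

```python
class A:
    x = "yadda"
    y = []


a = A()
b = A()   # extra instance, added here to observe the shared state

a.y.append(42)           # in-place: mutates the one list stored on the class
print(A.y, b.y)          # [42] [42]

a.y += ["ham"]           # extends the class list in place, then binds the
                         # result to an instance attribute on a
print(A.y)               # [42, 'ham']
print('y' in vars(a))    # True
print(a.y is A.y)        # True: both names refer to the same list

a.y = ["spam"]           # plain rebinding: only the instance attribute changes
print(a.y)               # ['spam']
print(A.y, b.y)          # [42, 'ham'] [42, 'ham']
```

The augmented assignment is the subtle case: it modifies the shared list and also creates an instance attribute pointing at that same list.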

Peter
 
Maxim Mercury

Thanks a lot, Peter, that worked as I needed. Where can I find some
good documentation that explains this behavior?
 
Aaron Sterling

Maxim said:
Thanks a lot, Peter, that worked as I needed. Where can I find some
good documentation that explains this behavior?

The reason for this behavior is the way Python stores attributes.

Both a class and an instance of a class have a __dict__ attribute,
a dictionary that stores attributes as name/value pairs.

Consider the following class.

class A(object):
    a = 1

    def __init__(self, b):
        self.b = b


Inspecting A.__dict__, one will see that it contains the entry 'a': 1
(alongside entries such as '__init__') with no reference to b.

Instantiating with a = A(2), one will see that a.__dict__ is {'b': 2}
with no reference to a.

If one accesses a.b, then Python will first look in a.__dict__ and
find 'b'. It will then return that value for a.b.

If one instead accesses a.a, then Python will first look in a.__dict__
and not find an entry for 'a'. It will then look in type(a).__dict__
(which is A.__dict__), find 'a' and return it for a.a.
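The lookup order described above can be observed directly. This is a small sketch using the same class; the variable name inst is chosen here only to avoid confusion with the attribute named a.

```python
class A(object):
    a = 1

    def __init__(self, b):
        self.b = b


inst = A(2)

print('a' in A.__dict__)    # True: a lives on the class
print('b' in A.__dict__)    # False: b is not a class attribute
print(vars(inst))           # {'b': 2}: the instance dict has no 'a'

print(inst.b)               # 2, found in inst.__dict__
print(inst.a)               # 1, not on the instance, so found in A.__dict__

# The class dict also holds entries such as '__init__' and '__module__';
# filtering out the double-underscore names leaves only 'a'.
print(sorted(k for k in A.__dict__ if not k.startswith('__')))   # ['a']
```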

One can in fact use this behavior to shadow class attributes. If the
__init__ function is changed to

def __init__(self, a, b):
    self.a = a
    self.b = b

then all instances of A will have their own instance attribute named a
with whatever value is passed to __init__. They will still have a
class level attribute named a with value 1 but Python will never see
it because it will find an entry for a in some_instance.__dict__. If
one executes

del some_instance.a

then, on that one instance, visibility of the class-level a will be
restored. In fact, one can always get the class-level attribute as
type(some_instance).__dict__['a'], but that's a little awkward.
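A short sketch of the shadowing and of what del does, using the modified __init__ from above. The values 10 and 20 are arbitrary. It also shows that plain A.a reaches the class-level value without going through __dict__.

```python
class A(object):
    a = 1

    def __init__(self, a, b):
        self.a = a
        self.b = b


some_instance = A(10, 20)

print(some_instance.a)                       # 10: the instance attribute wins
print(type(some_instance).__dict__['a'])     # 1: the class attribute still exists
print(A.a)                                   # 1: a less awkward spelling

del some_instance.a                          # removes only the instance entry
print('a' in vars(some_instance))            # False
print(some_instance.a)                       # 1: the class attribute is visible again
```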

The reason that this matters with mutable attributes and (often) not
with immutable attributes is that a statement of the form

some_instance.some_mutable_attribute.append(foo)

will reference the same class-level attribute regardless of the
instance it's called with. There's no assignment going on here. An
existing binding is being looked up, and the resulting value (a list
in this case) is having a method called on it. No new bindings are
being created. A statement of the form

some_instance.some_mutable_attribute = some_new_list

will not affect the class-level attribute at all but will simply
shadow it in the same manner as described above. Once a name is bound
to an immutable value, the only way to change the value that it points
to is to rebind it. This means that any 'change' to a class-level
immutable value (accessed through attribute lookup on an instance)
will simply shadow it on the instance upon which it is accessed.
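The contrast between the two cases can be seen side by side in the sketch below. The Counter class and its count and log attributes are hypothetical names made up for this illustration.

```python
class Counter(object):    # hypothetical class, for illustration only
    count = 0             # immutable class-level value
    log = []              # mutable class-level value


first = Counter()
second = Counter()

# "Changing" the immutable value is really a rebinding: Python reads
# Counter.count (0), computes 1, and binds it as first.count.
first.count += 1
print(first.count, second.count, Counter.count)   # 1 0 0
print(vars(first))                                # {'count': 1}

# Calling a method on the mutable value creates no binding at all,
# so every instance keeps seeing the same class-level list.
first.log.append('x')
print(second.log, Counter.log)                    # ['x'] ['x']
print('log' in vars(first))                       # False
```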

HTH
 
