Constructing trees in Python

Maxim Mercury · Nov 20, 2010

Hi,
I am very new to Python. I am trying to construct an XML DOM tree
using the built-in HTMLParser class (on data event callbacks). I am
maintaining the children as a list of elements, and whenever the SAX
parser encounters a tag I push it onto a local stack. My basic logic is
below.

**************
def handle_starttag(self, tag, attrs):
    curElement = HTMLElement(tag.lower(), attrs)  # <------ (1)

    if self.elementRoot is None:
        self.elementRoot = curElement
    else:
        self.elementStack[-1].childs.append(curElement)  # <------ (2)

    self.elementStack.append(curElement)

**************

Here is the definition of HTMLElement:

class HTMLElement:
    tag = None
    attrs = {}
    data = ''
    childs = []

The issue is that although new elements are pushed onto the stack (1),
whenever I append a child to the element on top of the stack, all the
other elements in the stack are affected. I assume the same reference
is used, but is there a way to overcome this?

Peter Otten · Nov 20, 2010

Maxim said:
Here is the definition of HTMLElement:

class HTMLElement:
    tag = None
    attrs = {}
    data = ''
    childs = []

The issue is that although new elements are pushed onto the stack (1),
whenever I append a child to the element on top of the stack, all the
other elements in the stack are affected. I assume the same reference
is used, but is there a way to overcome this?

The definition

class A:
    some_list = []

defines a class attribute shared by all instances of A. To turn some_list
into an instance attribute, move the definition into the initializer:

class A:
    def __init__(self):
        self.some_list = []
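
Applied to the class from the question, the same change might look like the sketch below. It assumes Python 3, where HTMLParser lives in html.parser. The TreeBuilder name, the handle_endtag and handle_data methods, and the sample markup are illustrative additions, not part of the original post, and the stack handling assumes well-formed input in which every start tag has a matching end tag.

```python
from html.parser import HTMLParser


class HTMLElement:
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)  # a fresh dict for every element
        self.data = ''
        self.childs = []          # a fresh list for every element


class TreeBuilder(HTMLParser):    # hypothetical name, not from the thread
    def __init__(self):
        super().__init__()
        self.elementRoot = None
        self.elementStack = []

    def handle_starttag(self, tag, attrs):
        curElement = HTMLElement(tag.lower(), attrs)
        if self.elementRoot is None:
            self.elementRoot = curElement
        else:
            self.elementStack[-1].childs.append(curElement)
        self.elementStack.append(curElement)

    def handle_endtag(self, tag):
        # Assumption: input is well formed, so the top of the stack
        # is the element that is being closed.
        if self.elementStack:
            self.elementStack.pop()

    def handle_data(self, data):
        if self.elementStack:
            self.elementStack[-1].data += data


builder = TreeBuilder()
builder.feed('<html><body><p>one</p><p>two</p></body></html>')

root = builder.elementRoot
body = root.childs[0]
print(root.tag, [c.tag for c in root.childs])  # html ['body']
print(body.tag, [c.tag for c in body.childs])  # body ['p', 'p']
```

Because childs is created inside __init__, appending to one element's list no longer shows up on every other element.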

Note that this holds for every attribute, but you usually see it only for
mutables like lists or dicts because in

class A:
    x = "yadda"
    y = []

a = A()
print(a.x)  # yadda
a.x = 42
print(a.x)  # 42
del a.x
print(a.x)  # can you guess what happens?

the class attribute is shadowed by the instance attribute whereas

a.y.append(42)

modifies the class attribute in place.
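
A short sketch of that difference, with a second instance b added (an illustrative name, not from the post) so that the sharing becomes visible:

```python
class A:
    x = "yadda"
    y = []


a = A()
b = A()

a.x = 42           # creates an instance attribute on a only
print(a.x, b.x)    # 42 yadda
del a.x            # removes the instance attribute; the class attribute remains
print(a.x)         # yadda

a.y.append(42)     # no assignment: mutates the one list stored on the class
print(b.y, A.y)    # [42] [42]
print(a.y is b.y)  # True
```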

# two more to check that you've understood the mechanism:
a.y += ["ham"] # ?
a.y = ["spam"] # ?
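
For readers who want to check their answers to the two lines above, here is a small sketch. The second instance b is an extra added for illustration, and the comments describe standard list and attribute semantics rather than anything specific to this thread.

```python
class A:
    y = []


a = A()
b = A()

# += first extends the shared class-level list in place, then assigns the
# result back to a.y, which creates an instance attribute bound to that
# same list object.
a.y += ["ham"]
print(A.y, b.y)    # ['ham'] ['ham']
print(a.__dict__)  # {'y': ['ham']}
print(a.y is A.y)  # True

# Plain assignment only rebinds the instance attribute.
a.y = ["spam"]
print(a.y)         # ['spam']
print(A.y, b.y)    # ['ham'] ['ham']
```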

Peter

Maxim Mercury · Nov 20, 2010


Thanks a lot, Peter, that worked as I needed. Where can I find some
good documentation that explains such behavior?

Aaron Sterling · Nov 21, 2010

Maxim said:
Thanks a lot, Peter, that worked as I needed. Where can I find some
good documentation that explains such behavior?

The reason for this behavior is the way Python stores attributes.

Both a class and an instance of a class have a __dict__ attribute,
a dictionary that stores attributes as name/value pairs.

Consider the following class.

class A(object):
    a = 1

    def __init__(self, b):
        self.b = b

Inspecting A.__dict__, one will see that it contains 'a': 1 (next to
__init__ and a few entries Python adds itself) with no reference to b.

Instantiating with a = A(2), one will see that a.__dict__ is {'b': 2}
with no reference to a.

If one accesses a.b, then Python will first look in a.__dict__ and
find 'b'. It will then return that value for a.b.

If one instead accesses a.a, then Python will first look in a.__dict__
and not find an entry for 'a'. It will then look in type(a).__dict__,
which is A.__dict__, find 'a' and return it for a.a.
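
The lookup order described above can be checked interactively. In this sketch the instance is called inst rather than a (a naming choice made here, not in the post) so that it is not confused with the attribute a.

```python
class A(object):
    a = 1

    def __init__(self, b):
        self.b = b


inst = A(2)

# The class dictionary holds 'a' but not 'b'.
print('a' in A.__dict__, 'b' in A.__dict__)  # True False

# The instance dictionary holds only what __init__ assigned.
print(inst.__dict__)                         # {'b': 2}

print(inst.b)  # 2, found in inst.__dict__
print(inst.a)  # 1, not in inst.__dict__, so found in A.__dict__
```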

One can in fact use this behavior to shadow class attributes. If the
__init__ function is changed to

    def __init__(self, a, b):
        self.a = a
        self.b = b

then all instances of A will have their own instance attribute named a
with whatever value is passed to __init__. The class will still have a
class level attribute named a with value 1, but lookup on an instance
will never reach it because Python finds an entry for a in
some_instance.__dict__ first. If one executes

del some_instance.a

then on that one instance, visibility of the class level a will be
restored. In fact, one can always get the class level attribute as
type(some_instance).__dict__['a'], but that's a little awkward.
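
A minimal sketch of the shadowing and its removal, using made-up values 10 and 20 for the constructor arguments:

```python
class A(object):
    a = 1

    def __init__(self, a, b):
        self.a = a
        self.b = b


some_instance = A(10, 20)
print(some_instance.a)                    # 10, the instance attribute shadows the class one
print(type(some_instance).__dict__['a'])  # 1, the class attribute is still there

del some_instance.a                       # drop the instance attribute
print(some_instance.a)                    # 1, lookup falls through to the class again
print(some_instance.__dict__)             # {'b': 20}
```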

The reason that this matters with mutable attributes and (often) not
with immutable attributes is that a statement of the form

some_instance.some_mutable_attribute.append(foo)

will reference the same class level attribute regardless of the
instance it's called with. There's no assignment going on here. An
existing binding is being looked up, and the resulting value (a list
in this case) is having a method called on it. No new bindings are
being created. A statement of the form

some_instance.some_mutable_attribute = some_new_list

will not affect the class level attribute at all but will simply
shadow it in the same manner as described above. Once a name is bound
to an immutable value, the only way to change the value that it points
to is to rebind it. This means that any 'change' to a class level
immutable value (accessed through attribute lookup on an instance)
will simply shadow it on the instance upon which it is accessed.
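
As an illustration of that last point, consider an integer class attribute. The Counter class and its count attribute are hypothetical names chosen for this sketch.

```python
class Counter(object):  # hypothetical example class
    count = 0


first = Counter()
second = Counter()

# Reads Counter.count (0), computes 1, and binds the result on the
# instance. The class attribute itself is never modified.
first.count += 1

print(first.count, second.count, Counter.count)  # 1 0 0
print(first.__dict__, second.__dict__)           # {'count': 1} {}
```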

HTH
