xml.dom.minidom weirdness: bug?


JYA

Hi.

I was writing an xmltv parser using python when I faced some weirdness
that I couldn't explain.

What I'm doing is reading an XML file, creating another DOM object, and
copying elements from one to the other.

At no time do I ever modify the original dom object, yet it gets modified.

Unless I missed something, it sounds like a bug to me.

the xml file is simply:
<?xml version="1.0" encoding="utf-8"?>
<tv><channel id="id1"><display-name lang="en">full
name</display-name></channel></tv>

which I store under the name test.xmltv

Here is the code, I've removed everything that isn't applicable to my
description. can't make it any simpler I'm afraid:

from xml.dom.minidom import Document
import xml.dom.minidom


def adjusttimezone(docxml, timezone):
    doc = Document()

    # Create the <tv> base element
    tv_xml = doc.createElement("tv")
    doc.appendChild(tv_xml)

    # Create the channel list
    channellist = docxml.getElementsByTagName('channel')

    for x in channellist:
        # Copy the original attributes
        elem = doc.createElement("channel")
        for y in x.attributes.keys():
            name = x.attributes[y].name
            value = x.attributes[y].value
            elem.setAttribute(name, value)
        for y in x.getElementsByTagName('display-name'):
            elem.appendChild(y)
        tv_xml.appendChild(elem)

    return doc

if __name__ == '__main__':
    handle = open('test.xmltv', 'r')
    docxml = xml.dom.minidom.parse(handle)
    print 'step1'
    print docxml.toprettyxml(indent=" ", encoding="utf-8")
    doc = adjusttimezone(docxml, 1000)
    print 'step2'
    print docxml.toprettyxml(indent=" ", encoding="utf-8")

At "step 1" I display the content of the dom object; quite
naturally it shows:
<?xml version="1.0" encoding="utf-8"?>
<tv>
 <channel id="id1">
  <display-name lang="en">
   full name
  </display-name>
 </channel>
</tv>

After a call to adjusttimezone, "step 2" however will show:
<?xml version="1.0" encoding="utf-8"?>
<tv>
 <channel id="id1"/>
</tv>

That's it !

You'll note that at no time do I modify the content of docxml, yet it
gets modified.

The weirdness disappears if I import copy and change the line
channellist = docxml.getElementsByTagName('channel')
to
channellist = copy.deepcopy(docxml.getElementsByTagName('channel'))

However, my understanding is that it shouldn't be necessary.

Any thoughts on this weirdness ?

Thanks
Jean-Yves
 

Gabriel Genellina

What I'm doing is reading an XML file, creating another DOM object, and
copying elements from one to the other.

At no time do I ever modify the original dom object, yet it gets
modified.

    for y in x.getElementsByTagName('display-name'):
        elem.appendChild(y)
    tv_xml.appendChild(elem)

You'll note that at no time do I modify the content of docxml, yet it
gets modified.

The weirdness disappears if I change the line
channellist = docxml.getElementsByTagName('channel')
to
channellist = copy.deepcopy(docxml.getElementsByTagName('channel'))

However, my understanding is that it shouldn't be necessary.

I think that any element can have only a single parent. If you take an
element from one document and insert it into another document, it gets
removed from the first.
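
The move can be seen in isolation with a small sketch. It assumes the same XML as test.xmltv, parsed from a string instead of a file; the variable names are made up for illustration. In minidom, appendChild() first detaches the node from its current parent, so the source tree loses it:

```python
from xml.dom.minidom import Document, parseString

# Same content as test.xmltv, inlined so the sketch is self-contained.
SOURCE = ('<tv><channel id="id1">'
          '<display-name lang="en">full name</display-name>'
          '</channel></tv>')

src = parseString(SOURCE)
channel = src.getElementsByTagName("channel")[0]
name = channel.getElementsByTagName("display-name")[0]

# A second, independent document with its own <channel> element.
dst = Document()
new_channel = dst.createElement("channel")
dst.appendChild(new_channel)

parent_before = name.parentNode   # the <channel> element in src
new_channel.appendChild(name)     # moves the node; it is not copied
parent_after = name.parentNode    # now the <channel> element in dst

# The source <channel> no longer has a <display-name> child.
remaining = len(channel.getElementsByTagName("display-name"))
print(src.toxml())
```

After the appendChild() call the source document serializes with an empty channel element, which matches the "step 2" output in the original post.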
 

Marc Christiansen

JYA said:
for y in x.getElementsByTagName('display-name'):
    elem.appendChild(y)

Like Gabriel wrote, nodes can only have one parent. Use
elem.appendChild(y.cloneNode(True))
instead. Or y.cloneNode(False), if you want a shallow copy (i.e. without
any of the children, e.g. text content).
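
A minimal sketch of the copy loop with that change applied, assuming the same XML as test.xmltv parsed from a string. The function name copy_channels is made up for this example, and the unused timezone argument is dropped:

```python
from xml.dom.minidom import Document, parseString

# Same content as test.xmltv, inlined so the sketch is self-contained.
SOURCE = ('<tv><channel id="id1">'
          '<display-name lang="en">full name</display-name>'
          '</channel></tv>')


def copy_channels(src):
    """Build a new document holding copies of the <channel> elements of src."""
    dst = Document()
    tv = dst.createElement("tv")
    dst.appendChild(tv)
    for channel in src.getElementsByTagName("channel"):
        elem = dst.createElement("channel")
        # Copy the attributes as (name, value) pairs.
        for attr_name, attr_value in channel.attributes.items():
            elem.setAttribute(attr_name, attr_value)
        # Append a deep clone, so the original node keeps its parent.
        for name in channel.getElementsByTagName("display-name"):
            elem.appendChild(name.cloneNode(True))
        tv.appendChild(elem)
    return dst


src = parseString(SOURCE)
before = src.toxml()
dst = copy_channels(src)
after = src.toxml()   # unchanged: nothing was moved out of src

# A shallow clone keeps the element and its attributes but drops the text.
shallow = src.getElementsByTagName("display-name")[0].cloneNode(False)
print(dst.toxml())
print(shallow.toxml())
```

The DOM also defines Document.importNode(node, deep), which minidom implements, for copying a node into a different document. Here dst.importNode(name, True) could stand in for name.cloneNode(True); whether the difference matters depends on whether later code relies on ownerDocument.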

Marc
 
