Hi.
I was writing an XMLTV parser in Python when I ran into some weirdness
that I couldn't explain.
What I'm doing is reading an XML file, creating another DOM object and
copying the elements from one to the other.
At no point do I ever modify the original DOM object, yet it gets modified.
Unless I missed something, it sounds like a bug to me.
The XML file is simply:
<?xml version="1.0" encoding="utf-8"?>
<tv><channel id="id1"><display-name lang="en">full name</display-name></channel></tv>
which I store under the name test.xmltv.
Here is the code. I've removed everything that isn't relevant to my
description, and I can't make it any simpler, I'm afraid:
import xml.dom.minidom
from xml.dom.minidom import Document

def adjusttimezone(docxml, timezone):
    doc = Document()
    # Create the <tv> base element
    tv_xml = doc.createElement("tv")
    doc.appendChild(tv_xml)
    # Create the channel list
    channellist = docxml.getElementsByTagName('channel')
    for x in channellist:
        # Copy the original attributes
        elem = doc.createElement("channel")
        for y in x.attributes.keys():
            name = x.attributes[y].name
            value = x.attributes[y].value
            elem.setAttribute(name, value)
        for y in x.getElementsByTagName('display-name'):
            elem.appendChild(y)
        tv_xml.appendChild(elem)
    return doc

if __name__ == '__main__':
    with open('test.xmltv', 'rb') as handle:
        docxml = xml.dom.minidom.parse(handle)
    print('step1')
    print(docxml.toprettyxml(indent=" ", encoding="utf-8").decode("utf-8"))
    doc = adjusttimezone(docxml, 1000)
    print('step2')
    print(docxml.toprettyxml(indent=" ", encoding="utf-8").decode("utf-8"))
Now at "step 1" I display the content of the DOM object, and quite
naturally it shows:
<?xml version="1.0" encoding="utf-8"?>
<tv>
 <channel id="id1">
  <display-name lang="en">
   full name
  </display-name>
 </channel>
</tv>
After a call to adjusttimezone, however, "step 2" shows:
<?xml version="1.0" encoding="utf-8"?>
<tv>
 <channel id="id1"/>
</tv>
That's it!
You'll note that at no point do I modify the content of docxml, yet it
gets modified.
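One likely explanation, worth confirming against the DOM spec: appendChild() on a node that already has a parent first removes it from that parent, even when the new parent lives in a different document. The cut-down check below (the names src, dst, root and the inline XML are just for illustration; parseString is used so no file is needed) moves a single display-name element and then looks at the source tree.

```python
from xml.dom.minidom import Document, parseString

# Same structure as test.xmltv, parsed from a string instead of a file
src = parseString(
    '<tv><channel id="id1"><display-name lang="en">full name</display-name></channel></tv>'
)
channel = src.getElementsByTagName("channel")[0]
name = channel.getElementsByTagName("display-name")[0]

# A second, independent document, like the one adjusttimezone builds
dst = Document()
root = dst.createElement("tv")
dst.appendChild(root)

# appendChild() detaches `name` from its current parent before attaching it
root.appendChild(name)

print(name.parentNode is root)   # True
print(len(channel.childNodes))   # 0: the source <channel> is now empty
print(channel.toxml())           # <channel id="id1"/>
```

If that holds, the loop in adjusttimezone is moving the display-name elements out of docxml rather than copying them, which matches the "step 2" output.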
The weirdness disappears if I change the line
    channellist = docxml.getElementsByTagName('channel')
to
    channellist = copy.deepcopy(docxml.getElementsByTagName('channel'))
(with import copy added at the top).
However, my understanding is that this shouldn't be necessary.
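Instead of deep-copying the whole NodeList, a more targeted option may be to copy each node into the new document explicitly. The sketch below assumes minidom's Document.importNode(node, deep), which returns a copy owned by the new document and leaves the original in place; node.cloneNode(True) would be a similar alternative. It keeps the post's adjusttimezone name and its unused timezone parameter, but parses an inline string instead of test.xmltv.

```python
from xml.dom.minidom import Document, parseString


def adjusttimezone(docxml, timezone):
    doc = Document()
    tv_xml = doc.createElement("tv")
    doc.appendChild(tv_xml)
    for x in docxml.getElementsByTagName("channel"):
        elem = doc.createElement("channel")
        # Copy the attributes, as in the original loop
        for name, value in x.attributes.items():
            elem.setAttribute(name, value)
        for y in x.getElementsByTagName("display-name"):
            # importNode(y, True) makes a deep copy owned by `doc`,
            # so `y` itself stays inside docxml
            elem.appendChild(doc.importNode(y, True))
        tv_xml.appendChild(elem)
    return doc


docxml = parseString(
    '<tv><channel id="id1"><display-name lang="en">full name</display-name></channel></tv>'
)
before = docxml.toxml()
doc = adjusttimezone(docxml, 1000)
print(docxml.toxml() == before)  # True: the source document is unchanged
print(doc.documentElement.toxml())
```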
Any thoughts on this weirdness?
Thanks
Jean-Yves