Alan Meyer
I'm having some trouble inserting elements where I want them
using the lxml ElementTree (Python 2.6). I presume I'm making
some wrong assumptions about how lxml works and I'm hoping
someone can clue me in.
I want to process an xml document as follows:
For every occurrence of a particular element, no matter where it
appears in the tree, I want to add a sibling to that element with
the same name and a different value.
Here's the smallest artificial example I've found so far that
demonstrates the problem:
<foo>
<whatever>
<something/>
</whatever>
<bingo>Add another bingo after this</bingo>
<bar/>
</foo>
What I'd like to produce is this:
<foo>
<whatever>
<something/>
</whatever>
<bingo>Add another bingo after this</bingo>
<bingo>New bingo 1</bingo>
<bar/>
</foo>
Here's my program:
-------- cut here -----
from lxml import etree as etree
xml = """<?xml version="1.0" ?>
<foo>
<whatever>
<something/>
</whatever>
<bingo>Add another bingo after this</bingo>
<bar/>
</foo>
"""
tree = etree.fromstring(xml)
# A list of all "bingo" element objects in the unmodified original xml
# There's only one in this example
elems = tree.xpath("//bingo")
# For each one, insert a sibling after it
bingoCounter = 0
for elem in elems:
    parent = elem.getparent()
    subIter = parent.iter()
    pos = 0
    for subElem in subIter:
        # Is it one we want to create a sibling for?
        if subElem == elem:
            newElem = etree.Element("bingo")
            bingoCounter += 1
            newElem.text = "New bingo %d" % bingoCounter
            newElem.tail = "\n"
            parent.insert(pos, newElem)
            break
        pos += 1
newXml = etree.tostring(tree)
print("")
print(newXml)
-------- cut here -----
The output follows:
-------- output -----
<foo>
<whatever>
<something/>
</whatever>
<bingo>Add another bingo after this</bingo>
<bar/>
<bingo>New bingo 1</bingo>
</foo>
-------- output -----
Setting aside the whitespace issues, the bug in the program shows
up in the position of the insertion. I wanted and expected the new
element to appear immediately after the original "bingo" element
and before the "bar" element, but it appeared after "bar"
instead.
Everything works if I take the "something" element out of the
original input document. The new "bingo" appears before the
"bar". But when I put it back in, the inserted bingo is out of
order. Why should that be? What am I misunderstanding?
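The mismatch can be checked directly by comparing what iter() yields with the parent's direct children. This is a minimal sketch that assumes the standard library's xml.etree.ElementTree in place of lxml, so it runs without lxml installed; both libraries define iter() as walking the element itself plus all of its descendants in document order. The loop's counter therefore measures a position in the iter() sequence, while insert() expects an index among the parent's direct children.

```python
import xml.etree.ElementTree as ET

XML = "<foo><whatever><something/></whatever><bingo>x</bingo><bar/></foo>"
root = ET.fromstring(XML)

# iter() yields the element itself first, then ALL descendants, depth first
iter_tags = [e.tag for e in root.iter()]
# Iterating the element directly yields only its direct children
child_tags = [e.tag for e in root]

print(iter_tags)   # ['foo', 'whatever', 'something', 'bingo', 'bar']
print(child_tags)  # ['whatever', 'bingo', 'bar']

# The counter reaches 3 when it meets <bingo> in the iter() sequence,
# but insert() reads 3 as a child index: the slot after <bar/>.
print(iter_tags.index("bingo"), child_tags.index("bingo"))  # 3 1
```

With `<something/>` removed, iter() yields foo, whatever, bingo, bar, so the counter stops at 2; read as a child index, 2 happens to be the slot right after the original bingo. The extra entry for the parent itself cancels out by coincidence, which is why the smaller input appeared to work.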
Is there a more intelligent way to do what I'm trying to do?
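One way to avoid counting by hand is to look up the original element's index among its parent's direct children and insert at that index plus one. The sketch below again assumes the standard library's xml.etree.ElementTree (Python 3) rather than lxml; because that module has no getparent(), it collects (parent, child) pairs in one pass before changing anything. The helper name `add_siblings` is hypothetical. In lxml the core step could instead be written as `parent.insert(parent.index(elem) + 1, newElem)`, or as `elem.addnext(newElem)`, which places the new element directly after `elem`.

```python
import xml.etree.ElementTree as ET

XML = """<foo>
<whatever>
<something/>
</whatever>
<bingo>Add another bingo after this</bingo>
<bar/>
</foo>"""


def add_siblings(root, tag):
    """Insert a new <tag> element directly after every existing <tag>.

    Returns the number of elements added. (Hypothetical helper.)
    """
    # Collect (parent, child) pairs before modifying the tree, so the
    # newly inserted elements are never visited themselves.
    pairs = [(parent, child)
             for parent in root.iter()
             for child in parent
             if child.tag == tag]
    for n, (parent, child) in enumerate(pairs, start=1):
        new = ET.Element(tag)
        new.text = "New %s %d" % (tag, n)
        # Reuse the original's trailing whitespace so the layout stays consistent.
        new.tail = child.tail
        # Index among the parent's DIRECT children, recomputed each time
        # because earlier insertions shift later positions.
        idx = list(parent).index(child)
        parent.insert(idx + 1, new)
    return len(pairs)


root = ET.fromstring(XML)
added = add_siblings(root, "bingo")
print(ET.tostring(root, encoding="unicode"))
```

The new element now lands between the original bingo and `<bar/>` whether or not `<whatever>` has children, since the position no longer depends on how many descendants precede it.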
Thanks.
Alan