Problem inserting an element where I want it using lxml

Alan Meyer

I'm having some trouble inserting elements where I want them
using the lxml ElementTree (Python 2.6). I presume I'm making
some wrong assumptions about how lxml works and I'm hoping
someone can clue me in.

I want to process an xml document as follows:

For every occurrence of a particular element, no matter where it
appears in the tree, I want to add a sibling to that element with
the same name and a different value.

Here's the smallest artificial example I've found so far that
demonstrates the problem:

<foo>
<whatever>
<something/>
</whatever>
<bingo>Add another bingo after this</bingo>
<bar/>
</foo>

What I'd like to produce is this:

<foo>
<whatever>
<something/>
</whatever>
<bingo>Add another bingo after this</bingo>
<bingo>New bingo 1</bingo>
<bar/>
</foo>

Here's my program:

-------- cut here -----
from lxml import etree as etree

xml = """<?xml version="1.0" ?>
<foo>
<whatever>
<something/>
</whatever>
<bingo>Add another bingo after this</bingo>
<bar/>
</foo>
"""

tree = etree.fromstring(xml)

# A list of all "bingo" element objects in the unmodified original xml
# There's only one in this example
elems = tree.xpath("//bingo")

# For each one, insert a sibling after it
bingoCounter = 0
for elem in elems:
    parent = elem.getparent()
    subIter = parent.iter()
    pos = 0
    for subElem in subIter:
        # Is it one we want to create a sibling for?
        if subElem == elem:
            newElem = etree.Element("bingo")
            bingoCounter += 1
            newElem.text = "New bingo %d" % bingoCounter
            newElem.tail = "\n"
            parent.insert(pos, newElem)
            break
        pos += 1

newXml = etree.tostring(tree)
print("")
print(newXml)
-------- cut here -----

The output follows:

-------- output -----
<foo>
<whatever>
<something/>
</whatever>
<bingo>Add another bingo after this</bingo>
<bar/>
<bingo>New bingo 1</bingo>
</foo>
-------- output -----

Setting aside the whitespace issues, the bug in the program shows
up in the positioning of the insertion. I wanted and expected it
to appear immediately after the original "bingo" element,
and before the "bar" element, but it appeared after the "bar"
instead of before it.

Everything works if I take the "something" element out of the
original input document. The new "bingo" appears before the
"bar". But when I put it back in, the inserted bingo is out of
order. Why should that be? What am I misunderstanding?

Is there a more intelligent way to do what I'm trying to do?

Thanks.

Alan
 
Stefan Behnel

Alan Meyer, 05.01.2011 06:57:
> I'm having some trouble inserting elements where I want them
> using the lxml ElementTree (Python 2.6). I presume I'm making
> some wrong assumptions about how lxml works and I'm hoping
> someone can clue me in.
>
> I want to process an xml document as follows:
>
> For every occurrence of a particular element, no matter where it
> appears in the tree, I want to add a sibling to that element with
> the same name and a different value.
>
> Here's the smallest artificial example I've found so far that
> demonstrates the problem:
>
> <foo>
> <whatever>
> <something/>
> </whatever>
> <bingo>Add another bingo after this</bingo>
> <bar/>
> </foo>

> What I'd like to produce is this:
>
> <foo>
> <whatever>
> <something/>
> </whatever>
> <bingo>Add another bingo after this</bingo>
> <bingo>New bingo 1</bingo>
> <bar/>
> </foo>

Looks trivial to me. ;)

> Here's my program:
>
> -------- cut here -----
> from lxml import etree as etree
>
> xml = """<?xml version="1.0" ?>
> <foo>
> <whatever>
> <something/>
> </whatever>
> <bingo>Add another bingo after this</bingo>
> <bar/>
> </foo>
> """
>
> tree = etree.fromstring(xml)
>
> # A list of all "bingo" element objects in the unmodified original xml
> # There's only one in this example
> elems = tree.xpath("//bingo")
>
> # For each one, insert a sibling after it
> bingoCounter = 0
> for elem in elems:
>     parent = elem.getparent()
>     subIter = parent.iter()

".iter()" gives you a recursive iterator that will also yield the
"something" Element in your case, thus the incorrect counting. You only
want the children, so you should iterate over the Element itself.
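
To make the difference concrete, here is a minimal sketch. It assumes a whitespace-free copy of the sample document, and the variable names are only illustrative:

```python
from lxml import etree

# Same shape as the sample document, without the whitespace.
root = etree.fromstring(
    "<foo><whatever><something/></whatever><bingo/><bar/></foo>"
)

# .iter() walks the whole subtree in document order, starting with the
# element it is called on, so grandchildren are counted as well.
recursive_tags = [el.tag for el in root.iter()]

# Iterating over the element itself yields only its direct children,
# which is the numbering that insert() uses.
child_tags = [child.tag for child in root]

print(recursive_tags)  # ['foo', 'whatever', 'something', 'bingo', 'bar']
print(child_tags)      # ['whatever', 'bingo', 'bar']
```

Counting along .iter() puts "bingo" at position 3, while among the children of "foo" it sits at index 1. That is why parent.insert(3, ...) ends up after "bar".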

>     pos = 0
>     for subElem in subIter:
>         # Is it one we want to create a sibling for?
>         if subElem == elem:

There is an .index() method on Elements that does what you want to achieve
here. However, the right way to do it is to use ".addnext()".

http://codespeak.net/lxml/api/lxml.etree._Element-class.html
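
A rough sketch of both approaches follows. The helper names add_with_addnext and add_with_index are made up for illustration, and the sample XML is written without whitespace so that tail text does not come into play:

```python
from lxml import etree

XML = (
    "<foo><whatever><something/></whatever>"
    "<bingo>Add another bingo after this</bingo><bar/></foo>"
)


def add_with_addnext(root):
    """Put a new <bingo> directly after each existing one via addnext()."""
    # xpath() returns a plain list, so the elements added below are not
    # visited by this loop.
    for counter, elem in enumerate(root.xpath("//bingo"), start=1):
        new_elem = etree.Element("bingo")
        new_elem.text = "New bingo %d" % counter
        elem.addnext(new_elem)
    return root


def add_with_index(root):
    """Same result using parent.index() and parent.insert()."""
    for counter, elem in enumerate(root.xpath("//bingo"), start=1):
        # Assumes elem is not the root element (getparent() would be None).
        parent = elem.getparent()
        new_elem = etree.Element("bingo")
        new_elem.text = "New bingo %d" % counter
        # index() gives the position among the parent's direct children.
        parent.insert(parent.index(elem) + 1, new_elem)
    return root


for func in (add_with_addnext, add_with_index):
    result = func(etree.fromstring(XML))
    print([child.tag for child in result])
```

Both calls should print ['whatever', 'bingo', 'bingo', 'bar'], with the new element between the original "bingo" and "bar".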

Stefan
 
Alan Meyer

...
> Looks trivial to me. ;)
...
> ".iter()" gives you a recursive iterator that will also yield the
> "something" Element in your case, thus the incorrect counting. You only
> want the children, so you should iterate over the Element itself.

Thanks Stefan.

I went home and went to sleep and woke up in the middle of the night and
thought, wait a minute, iter() is giving me a depth first list of
elements but insert() is indexing children of the parent.

I think I must have been up too late.

> There is an .index() method on Elements that does what you want to
> achieve here. However, the right way to do it is to use ".addnext()".
>
> http://codespeak.net/lxml/api/lxml.etree._Element-class.html
>
> Stefan

Those are exactly the functions I wanted. I didn't see them (and still
don't) in the Python ElementTree documentation and thought I had to use
parent.insert().
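
For anyone limited to the standard library, here is a hedged sketch of the parent.insert() route using xml.etree.ElementTree. As far as I know, getparent(), index() and addnext() are lxml extensions, so this version builds its own child-to-parent map. The names parent_of and originals are only illustrative:

```python
import xml.etree.ElementTree as ET

XML = (
    "<foo><whatever><something/></whatever>"
    "<bingo>Add another bingo after this</bingo><bar/></foo>"
)

root = ET.fromstring(XML)

# No getparent() here, so record each element's parent up front.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Collect the matches before modifying the tree.
originals = list(root.iter("bingo"))

for counter, elem in enumerate(originals, start=1):
    parent = parent_of[elem]  # raises KeyError if elem is the root
    new_elem = ET.Element("bingo")
    new_elem.text = "New bingo %d" % counter
    # Position among the direct children only, matching insert()'s numbering.
    position = list(parent).index(elem)
    parent.insert(position + 1, new_elem)

print([child.tag for child in root])  # ['whatever', 'bingo', 'bingo', 'bar']
```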

Thanks again.
 
