Richard Lewis
Hi there,
I have an XML document which contains a mixture of structural nodes
(called 'section' and with unique 'id' attributes) and non-structural
nodes (called anything else). The structural elements ('section's) can
contain, as well as non-structural elements, other structural elements.
I'm doing Python DOM programming with this document and have got
stuck on something.
I want to be able to get all the non-structural elements which are
children of a given 'section' element (identified by 'id' attribute)
but not children of any child 'section' elements of the given 'section'.
e.g.:
    <section id="a">
        <foo>bar</foo>
    </section>
    <section id="b">
        <foo>baz</foo>
        <section id="c">
            <bar>foo</bar>
        </section>
    </section>
Given this document, the working function would return "<foo>baz</foo>"
for id='b' and "<bar>foo</bar>" for id='c'.
Normally, recursion is used for DOM traversals. I've tried this function
which uses recursion with a generator (can the two be mixed?)
    def content_elements(node):
        if node.hasChildNodes():
            node = node.firstChild
            if not page_node(node):
                yield node
                for e in self.content_elements(node):
                    yield e
            node = node.nextSibling
which didn't work. So I tried it without using a generator:
    def content_elements(node, elements):
        if node.hasChildNodes():
            node = node.firstChild
            if node.nodeType == Node.ELEMENT_NODE: print node.tagName
            if not page_node(node):
                elements.append(node)
            self.content_elements(node, elements)
            node = node.nextSibling
        return elements
However, I got exactly the same problem: each time I use this function I
just get a DOM Text node containing a few whitespace characters (tabs and
returns). I guess this is the indentation in my source document? But why do
I not get the proper element nodes?
Cheers,
Richard
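
For reference, here is a minimal sketch of one way the behaviour described above might be obtained with xml.dom.minidom. It rests on a few assumptions that are not in the post: the sample is wrapped in a hypothetical <doc> root element so that it parses, the helper find_section is invented for illustration, and a plain tagName test for "section" stands in for page_node, whose definition is not shown. The sketch loops over every sibling and skips anything that is not an element, which is where the whitespace Text nodes come from.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Sample from the post, wrapped in a hypothetical <doc> root so it parses.
SAMPLE = """<doc>
  <section id="a">
    <foo>bar</foo>
  </section>
  <section id="b">
    <foo>baz</foo>
    <section id="c">
      <bar>foo</bar>
    </section>
  </section>
</doc>"""


def find_section(doc, section_id):
    """Hypothetical helper: return the 'section' element with this id, or None."""
    for section in doc.getElementsByTagName("section"):
        if section.getAttribute("id") == section_id:
            return section
    return None


def content_elements(node):
    """Yield non-'section' elements under node without entering nested sections."""
    child = node.firstChild
    while child is not None:  # visit every sibling, not only the first child
        # Skip whitespace Text nodes, comments and nested 'section' elements.
        if child.nodeType == Node.ELEMENT_NODE and child.tagName != "section":
            yield child
            # Recursion and generators can be mixed: re-yield the inner results.
            yield from content_elements(child)
        child = child.nextSibling


doc = minidom.parseString(SAMPLE)

# The first child of an indented element is a whitespace Text node.
first = find_section(doc, "b").firstChild
print(first.nodeType == Node.TEXT_NODE)

print([e.toxml() for e in content_elements(find_section(doc, "b"))])
print([e.toxml() for e in content_elements(find_section(doc, "c"))])
```

With the sample above, the last two lines should list only "<foo>baz</foo>" for id='b' and only "<bar>foo</bar>" for id='c'. A section nested inside a non-structural element would be skipped entirely by this sketch; whether that is the desired behaviour depends on the real document.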