xml4c child nodes

marfi95

I'm trying to iterate through a list of child nodes. It seems that to
get the text value of a node, you have to call
node->getFirstChild()->getNodeValue(). There is also a
hasChildNodes method, but it counts the "text" nodes
as well, which I don't want to include.

If this is my XML:

<A>
<B></B>
<C></C>
</A>

If I have a node for B, I thought getNextSibling would return C, but it
didn't. It returned #text.

Confused.
 
Joe Kesselman

marfi95 said:
<A>
<B></B>
<C></C>
</A>

If I have a node for B, I thought getNextSibling would return C, but it
didn't. It returned #text.

If you'd stopped to look at the value of that text node, you'd have
answered your own question -- it's the whitespace (newline and
indentation) between the B's end-tag and the start-tag for C.

XML doesn't know whether that whitespace text is meaningful or not, so
XML APIs will deliver it. Your app needs to deal with that appropriately.
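
To make the point above concrete, here is a minimal sketch in Java, whose org.w3c.dom API mirrors the DOM calls discussed in this thread (it is not xml4c itself). The class name WhitespaceSiblingDemo and the helper nodeAfterB are made up for illustration. It parses the sample document and looks at whatever directly follows B.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceSiblingDemo {
    /** Returns whatever node directly follows the first B element (hypothetical helper). */
    static Node nodeAfterB(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node b = doc.getElementsByTagName("B").item(0);
            return b.getNextSibling();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Node next = nodeAfterB("<A>\n<B></B>\n<C></C>\n</A>");
        // The sibling is a text node that holds only the line break after </B>.
        System.out.println(next.getNodeName());                   // #text
        System.out.println(next.getNodeValue().trim().isEmpty()); // true
    }
}
```

Printing the node value (or checking that it is blank, as here) is usually the quickest way to see where an unexpected #text node comes from.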
 
Magnus Henriksson

Joe said:
XML doesn't know whether that whitespace text is meaningful or not, so
XML APIs will deliver it. Your app needs to deal with that appropriately.

Some XML APIs may report such whitespace as "ignorable". This is
whitespace between elements where the DTD does not allow PCDATA. This
assumes that there is a DTD.

But they are still nodes in the infoset.


// Magnus
 
Martin Honnen

marfi95 said:
I'm trying to iterate through a list of child nodes. It seems that to
get the text value of a node, you have to call
node->getFirstChild()->getNodeValue(). There is also a
hasChildNodes method, but it counts the "text" nodes
as well, which I don't want to include.

If this is my XML:

<A>
<B></B>
<C></C>
</A>

If I have a node for B, I thought getNextSibling would return C, but it
didn't. It returned #text.

Then check nodeType (respectively getNodeType()) till you find an
element node (node type is 1).
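
A minimal sketch of that loop, written in Java because its org.w3c.dom interfaces use the same method names; the xml4c equivalent would follow the same shape but is not shown here. The names ElementSiblingDemo, nextElementSibling and nameOfElementAfter are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ElementSiblingDemo {
    /** Walks forward through the siblings until an element node (type 1) turns up. */
    static Element nextElementSibling(Node node) {
        for (Node s = node.getNextSibling(); s != null; s = s.getNextSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) s;
            }
        }
        return null; // no element follows
    }

    /** Test helper: tag name of the element after the first tagName element, or null. */
    static String nameOfElementAfter(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element next = nextElementSibling(doc.getElementsByTagName(tagName).item(0));
            return next == null ? null : next.getTagName();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<A>\n<B></B>\n<C></C>\n</A>";
        System.out.println(nameOfElementAfter(xml, "B")); // C
        System.out.println(nameOfElementAfter(xml, "C")); // null
    }
}
```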
 
Joe Kesselman

Magnus said:
Some XML APIs may report such whitespace as "ignorable". This is
whitespace between elements where the DTD does not allow PCDATA. This
assumes that there is a DTD.

Good point. *If* there is a DTD or Schema available which provides that
information, some tools can be asked to suppress whitespace that appears
where only elements were expected. That's getting beyond straight
parsing into preliminary processing/filtering, since as Magnus says it
involves delivering a modified infoset.

Since that feature is not always supported by the API -- or may be
supported in theory but not actually implemented on all parsers -- you
need to exercise a bit of care in relying on it. I've generally
preferred not to do so, for that reason and because sometimes users want
the whitespace preserved even when it isn't "meaningful" to the document.
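
As one illustration of such a tool, Java's JAXP exposes this as DocumentBuilderFactory.setIgnoringElementContentWhitespace, which its documentation says relies on a validating parse. The sketch below assumes an inline DTD invented for the sample document; the class name IgnorableWhitespaceDemo and the helper nameAfterB are hypothetical, and other parsers (xml4c included) may expose the option differently or not at all.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class IgnorableWhitespaceDemo {
    // Sample document with an invented DTD saying that A contains only elements.
    static final String XML =
            "<!DOCTYPE A [\n"
            + "<!ELEMENT A (B, C)>\n"
            + "<!ELEMENT B (#PCDATA)>\n"
            + "<!ELEMENT C (#PCDATA)>\n"
            + "]>\n"
            + "<A>\n<B></B>\n<C></C>\n</A>";

    /** Name of the node after B, with or without whitespace suppression. */
    static String nameAfterB(boolean dropIgnorableWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // The suppression option depends on the content model, so validation is on too.
            factory.setValidating(dropIgnorableWhitespace);
            factory.setIgnoringElementContentWhitespace(dropIgnorableWhitespace);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Node b = doc.getElementsByTagName("B").item(0);
            return b.getNextSibling().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nameAfterB(false)); // whitespace text node is delivered
        System.out.println(nameAfterB(true));  // whitespace in element-only content is dropped
    }
}
```

Because this only works when a DTD is present and the parser honours the setting, code that skips non-element nodes by hand stays correct in both cases.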
 
marfi95

Joe said:
*If* there is a DTD or Schema available which provides that
information, some tools can be asked to suppress whitespace that appears
where only elements were expected. That's getting beyond straight
parsing into preliminary processing/filtering, since as Magnus says it
involves delivering a modified infoset.

Since that feature is not always supported by the API -- or may be
supported in theory but not actually implemented on all parsers -- you
need to exercise a bit of care in relying on it. I've generally
preferred not to do so, for that reason and because sometimes users want
the whitespace preserved even when it isn't "meaningful" to the document.

Thanks for the replies. But going back to my original XML example.

<A>
<B>Data</B>
<C>Data</C>
</A>

How can I determine whether A has children? Calling hasChildNodes seems
worthless to me, since A will always have a text node underneath it.
I guess I have to write my own version that skips the text nodes?

TIA.
 
Martin Honnen

marfi95 said:
But going back to my original XML example.

<A>
<B>Data</B>
<C>Data</C>
</A>

How can I determine whether A has children? Calling hasChildNodes seems
worthless to me, since A will always have a text node underneath it.

Why? If you have e.g.
<A/>
or
<A />
or
<A></A>
then the element is really empty and hasChildNodes is false.
If you are looking for element child nodes only, then you can use the
getElementsByTagName("*").length check (reports all descendant elements)
or use XPath if your API supports that (e.g. selectNodes("*").length,
reports all child elements).
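
The two checks can be compared with a short sketch. It uses Java's org.w3c.dom, where the length property is spelled getLength(); selectNodes is not part of that API, so it is left out. The class name ChildCountDemo and its helpers are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ChildCountDemo {
    /** Parses the XML and returns its root element (hypothetical helper). */
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    /** True if the root has any child node at all, text nodes included. */
    static boolean rootHasChildNodes(String xml) {
        return root(xml).hasChildNodes();
    }

    /** Number of element descendants of the root, at any depth. */
    static int descendantElementCount(String xml) {
        return root(xml).getElementsByTagName("*").getLength();
    }

    public static void main(String[] args) {
        System.out.println(rootHasChildNodes("<A/>"));      // false
        System.out.println(rootHasChildNodes("<A></A>"));   // false
        System.out.println(rootHasChildNodes("<A>\n</A>")); // true: one whitespace text node
        // B and its child C are both counted, because "*" matches every descendant.
        System.out.println(descendantElementCount("<A>\n<B><C/></B>\n</A>")); // 2
    }
}
```

A non-zero descendant count is enough to answer "does this element contain any elements?", but it does not say how many of them are direct children.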
 
marfi95

Martin said:
Why? If you have e.g.
<A/>
or
<A />
or
<A></A>
then the element is really empty and hasChildNodes is false.
If you are looking for element child nodes only, then you can use the
getElementsByTagName("*").length check (reports all descendant elements)
or use XPath if your API supports that (e.g. selectNodes("*").length,
reports all child elements).

I was incorrect in my question. I meant to ask about B. I got a
DOM_Node for B and then checked hasChildNodes, and it returns true even
though there are no "real" child nodes. I didn't realize you could use a "*"
in getElementsByTagName, so I can use that instead of the hasChildNodes
call. Thanks for the help.
 
marfi95

Sorry to bother again, but can someone please explain the difference
between a DOM_Node and a DOM_Element? Is a DOM_Element just a "type"
of DOM_Node?

What I did was call getElementsByTagName on the DOM_Document to get a
NodeList. Then, for each of those nodes, I was going to call
getElementsByTagName("*") and check the length to determine whether
that node has any child elements, but I can't, because
getElementsByTagName is not part of DOM_Node, only DOM_Element. What is
the correct way of doing this, please? I'm new to DOM, as you can see.
 
Martin Honnen

marfi95 said:
But can someone please explain the difference
between a DOM_Node and a DOM_Element? Is a DOM_Element just a "type"
of DOM_Node?

Yes, node is usually an abstract base class (or interface) that is
extended by several concrete subclasses (or interfaces) (e.g. for
document, element, attribute, text, processing instruction, cdata
section, comment nodes).

marfi95 said:
What I did was call getElementsByTagName on the DOM_Document to get a
NodeList. Then, for each of those nodes, I was going to call
getElementsByTagName("*") and check the length to determine whether
that node has any child elements, but I can't, because
getElementsByTagName is not part of DOM_Node, only DOM_Element. What is
the correct way of doing this, please? I'm new to DOM, as you can see.

You need to cast that DOM_Node that you have to a DOM_Element. With Java
you would simply do e.g.
Element el = (Element)node;
Can't help with exact xml4c syntax.
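
Spelled out a little further on the Java side, a minimal sketch might look like this. It checks the node type before casting, since a NodeList is typed as holding Node. The class name NodeToElementDemo and both helpers are hypothetical, and the xml4c syntax is still not shown.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeToElementDemo {
    /** Casts the node to Element (after a type check) and counts its element descendants. */
    static int descendantElementCount(Node node) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return 0; // text, comment, etc. cannot contain elements
        }
        Element el = (Element) node; // safe: the type check above passed
        return el.getElementsByTagName("*").getLength();
    }

    /** For every tagName element in the document, the number of elements below it. */
    static int[] countsFor(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            int[] counts = new int[nodes.getLength()];
            for (int i = 0; i < nodes.getLength(); i++) {
                counts[i] = descendantElementCount(nodes.item(i));
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        int[] counts = countsFor("<A><B><C/></B><B>Data</B></A>", "B");
        System.out.println(counts[0]); // 1: the first B contains C
        System.out.println(counts[1]); // 0: the second B contains only text
    }
}
```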
 
marfi95

Martin said:
Yes, node is usually an abstract base class (or interface) that is
extended by several concrete sub classes (or interfaces) (e.g. for
document, element, attribute, text, processing instruction, cdata
section, comment nodes).


You need to cast that DOM_Node that you have to a DOM_Element. With Java
you would simply do e.g.
Element el = (Element)node;
Can't help with exact xml4c syntax.

Thanks. I got it.
 
marfi95

Since getElementsByTagName("*") returns all element nodes, is there an easy
way to get only the next level of elements?

i.e.
<A>
<B>
<C>
</C>
</B>
<B>
<C>
</C>
</B>
</A>

I would only want the NodeList returned to contain the B element
nodes (and not the C's).

Thanks.
 
Joseph Kesselman

marfi95 said:
Since getElementsByTagName("*") returns all element nodes, is there an easy
way to get only the next level of elements?

Simplest: getFirstChild followed by repeated getNextSibling, ignoring
those which aren't elements.

Overkill: Use one of the mechanisms in the DOM Level 2 Traversal
feature, setting its filters to show you only the nodes you're
interested in.
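
The simple approach can be sketched as follows in Java's org.w3c.dom, which uses the same getFirstChild and getNextSibling names. The class name DirectChildElements and its helpers are hypothetical, and the result is a plain List rather than a DOM NodeList.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DirectChildElements {
    /** Collects only the element children of parent, skipping text and other node types. */
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) n);
            }
        }
        return result;
    }

    /** Test helper: tag names of the root element's direct element children. */
    static List<String> childElementNames(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> names = new ArrayList<>();
            for (Element child : childElements(root)) {
                names.add(child.getTagName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<A>\n<B>\n<C>\n</C>\n</B>\n<B>\n<C>\n</C>\n</B>\n</A>";
        System.out.println(childElementNames(xml)); // [B, B]
    }
}
```

The loop visits each child of A exactly once and never descends into B, so the C elements are not reached.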
 
marfi95

Joseph said:
Simplest: getFirstChild followed by repeated getNextSibling, ignoring
those which aren't elements.

Overkill: Use one of the mechanisms in the DOM Level 2 Traversal
feature, setting its filters to show you only the nodes you're
interested in.

Thanks. I've heard that using DOM can be very memory intensive because
of the tree, and that it might not be the best approach for "large" XML
documents. Does anyone have any numbers on what "large" would be, and
at what point DOM stops being the appropriate method to use?

The XML we're talking about here could be around 30-40K, with about
1000 simultaneous users. Each thread would have its own parser
instance, which, based on my understanding of what I've been reading,
shouldn't be an issue. But I'm a little concerned about what I'm
reading regarding memory usage.

Any ideas what kind of sizes we're talking about here?
 
