Is it possible with Xerces?

Manuel Yguel

I am trying to parse an indented XML file with the Xerces-C++ DOM parser.
The file looks like this:
<root>
 <child1>
  <field1> foo </field1>
  <field2> bar </field2>
 </child1>
 <child2>
  <field1> foo </field1>
  <field2> bar </field2>
 </child2>
</root>

where the file contains line breaks and white space. The program I
wrote with DOM gives me this tree:
root has five children:
text-node child1 text-node child2 text-node

the text of the first text-node is "\n "
the text of the second text-node is "\n "
the text of the third text-node is "\n"

These whitespace text nodes occur at every level of the tree hierarchy.

Is it possible to strip these nodes automatically?

XML standard question: does this XML conform to the XML standard?

<child2> some text
 <field1> foo </field1>
 <field2> bar </field2>
</child2>

"some text" is at the same depth as field1 and field2, but it is text. So
there is a soup of text and elements. I thought that text had to be a
leaf of the tree... So does this conform to the standard?

Thanks
 
Philippe Poulard

Manuel said:
I am trying to parse an indented XML file with the Xerces-C++ DOM parser.
The file looks like this:
<root>
 <child1>
  <field1> foo </field1>
  <field2> bar </field2>
 </child1>
 <child2>
  <field1> foo </field1>
  <field2> bar </field2>
 </child2>
</root>

where the file contains line breaks and white space. The program I
wrote with DOM gives me this tree:
root has five children:
text-node child1 text-node child2 text-node

the text of the first text-node is "\n "
the text of the second text-node is "\n "
the text of the third text-node is "\n"

These whitespace text nodes occur at every level of the tree hierarchy.

Is it possible to strip these nodes automatically?

Yes: there is an option that strips ignorable whitespace, but
you must supply a grammar (a DTD) that tells the parser where whitespace
is ignorable, like this:

<!ELEMENT root (child1,child2)>

XML standard question: does this XML conform to the XML standard?

<child2> some text
 <field1> foo </field1>
 <field2> bar </field2>
</child2>

"some text" is at the same depth as field1 and field2, but it is text. So
there is a soup of text and elements. I thought that text had to be a
leaf of the tree... So does this conform to the standard?

Yes: an element may contain:
- nothing (empty element)
- subelements
- text
- text and subelements
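
Because text and subelements can be mixed, a hand-written cleanup pass has to be careful about which text nodes it drops. If supplying a DTD is not practical, one possible fallback is to remove whitespace-only text nodes yourself after parsing. The sketch below is only an illustration under stated assumptions: `Node`, `makeElement`, `makeText`, `makeField`, `buildSample` and `stripWhitespaceText` are hypothetical names, and `Node` is a stand-in for a DOM node, not the Xerces-C++ API. It removes whitespace-only text children only from elements that have subelements and no real text, so mixed content such as "some text" is left alone.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM node (NOT the Xerces-C++ DOMNode class).
struct Node {
    enum class Kind { Element, Text };
    Kind kind;
    std::string value;                           // tag name or text content
    std::vector<std::unique_ptr<Node>> children;
    Node(Kind k, std::string v) : kind(k), value(std::move(v)) {}
};

std::unique_ptr<Node> makeElement(const std::string& name) {
    return std::unique_ptr<Node>(new Node(Node::Kind::Element, name));
}

std::unique_ptr<Node> makeText(const std::string& content) {
    return std::unique_ptr<Node>(new Node(Node::Kind::Text, content));
}

// An element holding a single text child, e.g. <field1> foo </field1>.
std::unique_ptr<Node> makeField(const std::string& name, const std::string& content) {
    auto field = makeElement(name);
    field->children.push_back(makeText(content));
    return field;
}

// True if the string contains only XML white space (space, tab, CR, LF).
bool isWhitespaceOnly(const std::string& s) {
    return s.find_first_not_of(" \t\r\n") == std::string::npos;
}

// Recursively drop whitespace-only text children, but only from elements
// that contain subelements and no real text (element-only content).
void stripWhitespaceText(Node& node) {
    if (node.kind != Node::Kind::Element) return;
    bool hasElementChild = false;
    bool hasRealText = false;
    for (const auto& child : node.children) {
        if (child->kind == Node::Kind::Element) hasElementChild = true;
        else if (!isWhitespaceOnly(child->value)) hasRealText = true;
    }
    if (hasElementChild && !hasRealText) {
        auto& kids = node.children;
        kids.erase(std::remove_if(kids.begin(), kids.end(),
                       [](const std::unique_ptr<Node>& c) {
                           return c->kind == Node::Kind::Text &&
                                  isWhitespaceOnly(c->value);
                       }),
                   kids.end());
    }
    for (auto& child : node.children) stripWhitespaceText(*child);
}

// Builds the tree a DOM parser would report for the indented sample file,
// with child2 changed to the mixed-content variant ("some text").
std::unique_ptr<Node> buildSample() {
    auto child1 = makeElement("child1");
    child1->children.push_back(makeText("\n  "));
    child1->children.push_back(makeField("field1", " foo "));
    child1->children.push_back(makeText("\n  "));
    child1->children.push_back(makeField("field2", " bar "));
    child1->children.push_back(makeText("\n "));

    auto child2 = makeElement("child2");
    child2->children.push_back(makeText(" some text\n  "));
    child2->children.push_back(makeField("field1", " foo "));
    child2->children.push_back(makeText("\n  "));
    child2->children.push_back(makeField("field2", " bar "));
    child2->children.push_back(makeText("\n "));

    auto root = makeElement("root");
    root->children.push_back(makeText("\n "));
    root->children.push_back(std::move(child1));
    root->children.push_back(makeText("\n "));
    root->children.push_back(std::move(child2));
    root->children.push_back(makeText("\n"));
    return root;
}
```

Calling stripWhitespaceText(*root) on the sample leaves root with only child1 and child2, and child1 with only field1 and field2, while child2 keeps all of its children because it holds real text. The same idea should carry over to a real DOM by walking the child nodes and testing the text node values, but that part is not shown here.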


--
Cordialement,

///
(. .)
-----ooO--(_)--Ooo-----
| Philippe Poulard |
-----------------------
 
Manuel Yguel

Philippe said:
Yes: there is an option that strips ignorable whitespace, but
you must supply a grammar (a DTD) that tells the parser where whitespace
is ignorable, like this:

<!ELEMENT root (child1,child2)>

Thanks, but how do you then use the grammar with the parser?
 
Philippe Poulard

Manuel said:
Thanks, but how do you then use the grammar with the parser?

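Use the <!DOCTYPE> declaration: put it at the top of the XML file, either
with the element declarations inline or pointing to an external DTD file.
Validation must then be turned on in the parser; with XercesDOMParser,
setIncludeIgnorableWhitespace(false) is the option that drops the
ignorable whitespace.
You should have a look at the spec.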
 
