Mark Chao
Hi, I am a newbie. I spent quite some time searching the web but didn't
find anything, so I hope this question is not too basic to ask here.
I am trying to convert an XML document into another form, such as this:
<a>
A
<b>B</b>
<c>C</c>
</a>
should be converted to this:
a A
a b B
a c C
I am using Java's SAX parser with my own subclass of DefaultHandler.
The XML documents given to me usually have their elements and child
elements properly indented (as above). However, this causes a problem:
the characters() callback in my handler is called even between two
endElement() calls, and sometimes between two startElement() calls.
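One common way around this, sketched here under my own assumptions (the class name PathTextHandler and its convert() helper are hypothetical, not part of the SAX API), is to treat characters() as "append to a buffer" only, and to decide what the text means when the next tag arrives. Text is flushed on every startElement() and endElement(), trimmed at its ends, and dropped if it is only indentation; the current element path is kept in a list so each line gets the "a b" prefix:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: emits one "path text" line per run of non-whitespace text.
public class PathTextHandler extends DefaultHandler {
    private final List<String> path = new ArrayList<>();       // open element names, root first
    private final StringBuilder pending = new StringBuilder();  // text seen since the last tag
    private final List<String> lines = new ArrayList<>();

    public List<String> getLines() { return lines; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();            // text before a child start tag belongs to the parent
        path.add(qName);    // qName is filled in by a non-namespace-aware parser
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();            // text before an end tag belongs to the closing element
        path.remove(path.size() - 1);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may split one text node into several calls, so only accumulate here.
        pending.append(ch, start, length);
    }

    private void flush() {
        String text = pending.toString().trim(); // strips indentation at the ends only
        pending.setLength(0);
        if (!text.isEmpty() && !path.isEmpty()) {
            lines.add(String.join(" ", path) + " " + text);
        }
    }

    public static List<String> convert(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        PathTextHandler handler = new PathTextHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getLines();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a>\n\tA\n\t<b>B</b>\n\t<c>C</c>\n</a>";
        for (String line : convert(xml)) {
            System.out.println(line); // prints "a A", "a b B", "a c C" on separate lines
        }
    }
}
```

Because text is attributed to whichever element is open when it is flushed, this also copes with character data between two endElement() calls (e.g. `<a>A<b>B</b>tail</a>` gives "a A", "a b B", "a tail"), so no restriction on the input is needed. The trade-off is that trim() also discards leading/trailing spaces that may have been meaningful.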
It also means that "A" is reported as "\n\tA", because the text is passed
through exactly as it appears. The obvious way to solve this is to make
my handler accept only XML files that contain no "\n" or "\t"
characters. I could also strip those characters out manually, but that
would also remove any intended newlines and tabs.
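A middle ground between those two options, assuming only the indentation around a text chunk is unwanted, is to trim each chunk at its ends rather than deleting every "\n" and "\t". The small sketch below (WhitespaceDemo, normalize and stripAll are hypothetical names) contrasts the two:

```java
// Hypothetical helpers contrasting "trim the ends" with "remove every \n and \t".
public class WhitespaceDemo {
    // Trims leading/trailing whitespace only; returns null for indentation-only chunks.
    public static String normalize(String raw) {
        String t = raw.trim();
        return t.isEmpty() ? null : t;
    }

    // The "remove all escape characters" approach, for comparison.
    public static String stripAll(String raw) {
        return raw.replace("\n", "").replace("\t", "");
    }

    public static void main(String[] args) {
        String multiLine = "\n\tline 1\n\tline 2\n";
        System.out.println(normalize("\n\tA\n\t")); // A
        System.out.println(normalize("\n\t"));      // null (pure indentation is dropped)
        System.out.println(normalize(multiLine));   // keeps the newline and tab between the lines
        System.out.println(stripAll(multiLine));    // line 1line 2 (intended break is lost)
    }
}
```

Trimming keeps any whitespace inside the text, which is usually what "intended" newlines and tabs look like in practice.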
Another way would be to disallow XML documents that have character data
between two startElement() calls or two endElement() calls, i.e. allow
character data only between a matching startElement() and endElement().
However, this constraint is too restrictive and not appropriate.
This is really a semantic problem, but I would like to know whether there
are any other ways to tackle it.