SAX multiple calls to characters()

metzger

I am using the function listed below to handle characters events in
SAX. It does not handle multiple sequential calls to this function
correctly. For example, I am getting
"2 4 816 32 64" as a value for an element when processing <vec> 2 4 8
16 32 64 </vec>
because I am getting 2 calls to process the text in this element, one
for "2 4 8" and the other for "16 32 64". I have tried appending a
blank to the result after each call to this function, but that
sometimes splits numbers or words, depending on what is passed to this
function.

Is there a better way of handling multiple characters events? Thanks

public void characters(char[] chars, int start, int length) {
    // Skip leading whitespace in this chunk
    while ( (length > 0) && Character.isWhitespace(chars[start]) ) {
        ++start;
        --length;
    }
    // Drop trailing whitespace in this chunk
    while ( (length > 0) &&
            Character.isWhitespace(chars[start+length-1]) ) {
        --length;
    }
    // Append whatever is left to the accumulated text
    if ( length > 0 ) {
        _text += new String(chars,start,length);
    }
}
 
Bjoern Hoehrmann

* (e-mail address removed) wrote in comp.text.xml:
> I am using the function listed below to handle characters events in
> SAX. It does not handle multiple sequential calls to this function
> correctly.

Then you need to change that. It is normal for SAX processors to call
the characters() callback multiple times for a single run of text, so
you have to design your code to handle that. One option is to simply
buffer the data and process it once all of it has been accumulated
(e.g., when the endElement() callback is called).
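A minimal sketch of that approach, assuming Java 7+ and the JDK's built-in JAXP SAX parser. The class name VecHandler and the decision to parse each <vec> into a List<Integer> are illustrative choices, not something from the thread. characters() only appends; trimming and splitting wait until endElement(), when the whole text is known.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class VecHandler extends DefaultHandler {
    // Collects every characters() chunk of the current element, untouched.
    private final StringBuilder buffer = new StringBuilder();
    // Numbers parsed from each <vec> element, in document order.
    private final List<List<Integer>> vectors = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // fresh buffer for each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // No trimming here: a chunk boundary can fall anywhere in the text.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("vec".equals(qName)) {
            // All chunks have arrived, so trimming and splitting is now safe.
            String text = buffer.toString().trim();
            List<Integer> values = new ArrayList<>();
            if (!text.isEmpty()) {
                for (String token : text.split("\\s+")) {
                    values.add(Integer.parseInt(token));
                }
            }
            vectors.add(values);
        }
        buffer.setLength(0);
    }

    public List<List<Integer>> getVectors() {
        return vectors;
    }

    // Parses an XML string with the JDK's default SAX parser.
    static List<List<Integer>> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        VecHandler handler = new VecHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getVectors();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<vec> 2 4 8\n16 32 64 </vec>")); // [[2, 4, 8, 16, 32, 64]]

        // Simulate a parser that delivers the same text in two chunks.
        VecHandler h = new VecHandler();
        h.startElement("", "", "vec", null);
        char[] first = " 2 4 8\n".toCharArray();
        char[] second = "16 32 64 ".toCharArray();
        h.characters(first, 0, first.length);
        h.characters(second, 0, second.length);
        h.endElement("", "", "vec");
        System.out.println(h.getVectors()); // [[2, 4, 8, 16, 32, 64]]
    }
}
```

Because the whitespace between "8" and "16" stays in the buffer, the result is the same whether the parser makes one characters() call or several.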
 
Sylvain Loiseau

On 02-08-2006, metzger said:
> I am using the function listed below to handle characters events in
> SAX. It does not handle multiple sequential calls to this function
> correctly.
> For example, I am getting
> "2 4 816 32 64" as a value for an element when processing <vec> 2 4 8
> 16 32 64 </vec>
> because I am getting 2 calls to process the text in this element, one
> for "2 4 8" and the other for "16 32 64".

A SAX parser is expected to preserve all characters of the input document,
whitespace included. Your problem is either in your code or (less probably
;-)) in the SAX implementation you are using.

> I have tried appending a
> blank to the result after each call to this function, but that
> sometimes splits numbers or words, depending on what is passed to this
> function.

I think this is definitely not the right solution :)

> Is there a better way of handling multiple characters events? Thanks
>
> public void characters(char[] chars, int start, int length) {
>     while ( (length > 0) && Character.isWhitespace(chars[start]) )
>     {
>         ++start;
>         --length;
>     }
>
>     while ( (length > 0) &&
>             Character.isWhitespace(chars[start+length-1]) ) {
>         --length;
>     }

As far as I understand, this is the piece of code responsible for the
behaviour you complain about! You remove the leading and trailing whitespace
characters from each chunk you receive, so how can you expect to see the
whitespace characters in the output string?

>     if ( length > 0 ) {
>         _text += new String(chars,start,length);
>     }
> }
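To make that diagnosis concrete, the sketch below replays the original trimming logic on two hypothetical chunks and compares it with appending the chunks untouched and trimming once at the end. The split point (right after "8\n") is an assumption; a real parser may cut the text anywhere. ChunkTrimDemo and its method names are illustrative only.

```java
public class ChunkTrimDemo {
    // Hypothetical chunks for the text of <vec> 2 4 8\n16 32 64 </vec>.
    static final String[] CHUNKS = {" 2 4 8\n", "16 32 64 "};

    // Mirrors the original characters(): trims each chunk before appending.
    static String trimEachChunk(String[] chunks) {
        String text = "";
        for (String chunk : chunks) {
            char[] chars = chunk.toCharArray();
            int start = 0;
            int length = chars.length;
            while (length > 0 && Character.isWhitespace(chars[start])) {
                ++start;
                --length;
            }
            while (length > 0 && Character.isWhitespace(chars[start + length - 1])) {
                --length;
            }
            if (length > 0) {
                text += new String(chars, start, length);
            }
        }
        return text;
    }

    // Appends chunks untouched and trims only once, after the last chunk.
    static String trimOnceAtEnd(String[] chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            sb.append(chunk);
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("[" + trimEachChunk(CHUNKS) + "]"); // [2 4 816 32 64]
        System.out.println("[" + trimOnceAtEnd(CHUNKS) + "]");
    }
}
```

The first method reproduces the "2 4 816 32 64" symptom exactly: the newline separating "8" from "16" is trailing whitespace of the first chunk, so it is discarded before the second chunk arrives.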

Repeated string concatenation is bad for performance: each += copies the
whole accumulated text into a new String. You may consider using a
StringBuffer instead (sb.append(chars, start, length)).
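A minimal sketch of the StringBuffer suggestion. TextCollector and takeText() are hypothetical names; characters() has the same signature as the SAX callback, so its body could be dropped into a handler, with takeText() called from endElement(). StringBuilder (Java 5+) offers the same append(char[], int, int) without synchronization.

```java
public class TextCollector {
    // One buffer reused across characters() calls, instead of building a
    // new String on every call with +=.
    private final StringBuffer sb = new StringBuffer();

    // Same signature as ContentHandler.characters(); appends the chunk as-is.
    public void characters(char[] chars, int start, int length) {
        sb.append(chars, start, length);
    }

    // Returns the accumulated text and resets the buffer for the next element.
    public String takeText() {
        String text = sb.toString();
        sb.setLength(0);
        return text;
    }

    public static void main(String[] args) {
        TextCollector c = new TextCollector();
        char[] data = "xx 2 4 8\n16 32 64 yy".toCharArray();
        // Two simulated characters() calls over slices of one shared array.
        c.characters(data, 2, 7); // " 2 4 8\n"
        c.characters(data, 9, 9); // "16 32 64 "
        System.out.println(c.takeText().trim());
    }
}
```

Note that append(chars, start, length) honours the start/length window; a parser is free to hand you a larger shared array, so reading outside that window would pick up unrelated characters.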
 
