Retrieving element character offset in a source document

primeau

Hi,
using either a SAX or DOM parser, and preferably in Java, is there
a way of getting where an element starts and finishes in a given
document?

<tag>abc<aa/></tag>

where <tag> starts at 0 and has a length of 5,
abc starts at 5 and has a length of 3,
and so on...

Thanks
JMP
 

Joe Kesselman

primeau said:
Hi,
using either a SAX or DOM parser, and preferably in Java, is there
a way of getting where an element starts and finishes in a given
document?

Uhm... Maybe.

Most SAX parsers can optionally give you locator information -- but that
tends to be expressed as line/column, rather than simple offset from the
start of the file.
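
For illustration, here is a minimal sketch of reading locator information with the SAX API that ships with the JDK. The class name LocatorDemo and the helper locate are made up for this example. Note that SAX defines the reported position as the point just after the text that triggered the event (for startElement, typically just past the '>' of the start tag), and a parser is not obliged to supply a Locator at all, so the numbers should be treated as approximate.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of any library.
class LocatorDemo {

    // Parses the XML text and returns one entry per element event,
    // formatted as "<kind> <name> <line>:<column>".
    static List<String> locate(String xml) {
        List<String> events = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            private Locator locator; // stays null if the parser supplies none

            @Override
            public void setDocumentLocator(Locator l) {
                locator = l;
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                record("start", qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                record("end", qName);
            }

            private void record(String kind, String name) {
                // Line and column are 1-based; -1 means "not available".
                String where = (locator == null)
                        ? "unknown"
                        : locator.getLineNumber() + ":" + locator.getColumnNumber();
                events.add(kind + " " + name + " " + where);
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException("parse failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : locate("<tag>abc<aa/></tag>")) {
            System.out.println(event);
        }
    }
}
```

The handler only sees positions at event boundaries, so the start of an element has to be inferred from the previous event's position or from the source text itself.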

Note that reporting offset starts getting complicated when you start
dealing with multiple encodings, or encodings where a character may take
a varying number of bytes -- do you want character count or byte count?
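
To make the character-versus-byte distinction concrete, here is a small Java sketch. The class OffsetUnits and its methods are invented for this example; only the standard String and Charset calls are real. It measures the same short document three ways.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class OffsetUnits {

    // Length in Java chars (UTF-16 code units).
    static int charCount(String s) {
        return s.length();
    }

    // Length in Unicode code points (differs from charCount only when
    // the text contains characters outside the Basic Multilingual Plane).
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // Length in bytes once encoded with the given charset.
    static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String doc = "<t>\u00e9</t>"; // <t>é</t>

        System.out.println(charCount(doc));                               // 8
        System.out.println(codePointCount(doc));                          // 8
        System.out.println(byteCount(doc, StandardCharsets.ISO_8859_1));  // 8
        System.out.println(byteCount(doc, StandardCharsets.UTF_8));       // 9
        System.out.println(byteCount(doc, StandardCharsets.UTF_16BE));    // 16
    }
}
```

So the end tag starts at character 4 in every case, but at byte 4, 5 or 8 depending on the encoding.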
 
primeau

Joe said:
Uhm... Maybe.

Most SAX parsers can optionally give you locator information -- but that
tends to be expressed as line/column, rather than simple offset from the
start of the file.

Note that reporting offset starts getting complicated when you start
dealing with multiple encodings, or encodings where a character may take
a varying number of bytes -- do you want character count or byte count?

I could make do with either I guess. I know what the encoding is
before I pass it to the parser, so it wouldn't be a problem one way or
another.

Cheers
JMP
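
If the encoding is known and the whole document is available as a string, one possible approach is to turn a line/column pair into a character offset and then into a byte offset. The sketch below is only that, a sketch: LineColumnOffsets, toCharOffset and toByteOffset are hypothetical names, and it assumes 1-based lines and columns counted in Java chars, with "\n", "\r\n" and "\r" each ending one line. Parsers may count columns differently, and a byte order mark in the file is not accounted for.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class LineColumnOffsets {

    // Converts a 1-based line/column pair to a 0-based char offset.
    static int toCharOffset(String text, int line, int column) {
        int offset = 0;
        int currentLine = 1;
        // Skip forward until the requested line begins.
        while (currentLine < line) {
            if (offset >= text.length()) {
                throw new IllegalArgumentException("line out of range: " + line);
            }
            char c = text.charAt(offset++);
            if (c == '\n') {
                currentLine++;
            } else if (c == '\r') {
                // Treat "\r\n" as a single line break.
                if (offset < text.length() && text.charAt(offset) == '\n') {
                    offset++;
                }
                currentLine++;
            }
        }
        return offset + (column - 1);
    }

    // Converts a char offset to a byte offset by encoding the prefix.
    static int toByteOffset(String text, int charOffset, Charset cs) {
        return text.substring(0, charOffset).getBytes(cs).length;
    }

    public static void main(String[] args) {
        String doc = "<a>\u00e9\n<b/>\n</a>";
        int chars = toCharOffset(doc, 2, 1);                          // 5
        int bytes = toByteOffset(doc, chars, StandardCharsets.UTF_8); // 6
        System.out.println(chars + " chars, " + bytes + " bytes");
    }
}
```

Encoding the prefix on every lookup is simple but slow for large files; a precomputed table of line-start offsets would be the obvious refinement.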
 
