Bogomir Engel
Hi all,
For a student project I need to look up information in XML files that
are several GB in size. Data has to be displayed depending on what the
user enters in the GUI, and it's not feasible to parse the whole file
for every input. We can't use DOM since it would load the whole file
into memory, so our current approaches are based on SAX. We thought of
generating some sort of index that gives us the byte offset of every
data set in the file. The project has to be implemented in Java, so we
wanted to do something like
Reader.skip(offsetBytes)
so that we could jump to the location of our data set without having to
parse the whole file. The problem is that we have no idea how to obtain
the index information. How can you find out where in a file the SAX
parser currently is (meaning the byte offset)?
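To make the index idea concrete, here is a rough sketch of one way the offsets could be collected without SAX at all, by scanning the raw bytes for the start tag of a data set. As far as I can tell, the SAX Locator only reports line and column numbers, not byte offsets, and Reader.skip counts characters rather than bytes, which is why the sketch works on an InputStream. The class name OffsetIndexer, the element name "record" and the UTF-8 encoding are assumptions for illustration, not part of our real data.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: records the byte offset of every start tag of one element type.
public class OffsetIndexer {

    /**
     * Returns the byte offset of every occurrence of startTag (e.g. "<record").
     * Assumptions: the file is in an ASCII-compatible encoding such as UTF-8,
     * the first byte of startTag ('<') does not occur again inside startTag,
     * and the tag text never appears inside comments or CDATA sections.
     * Note that "<record" would also match a longer name such as "<records".
     */
    public static List<Long> indexOffsets(Path file, String startTag) throws IOException {
        byte[] pattern = startTag.getBytes(StandardCharsets.UTF_8);
        List<Long> offsets = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long pos = 0;     // offset of the byte currently being examined
            int matched = 0;  // number of pattern bytes matched so far
            int b;
            while ((b = in.read()) != -1) {
                if (b == pattern[matched]) {
                    matched++;
                    if (matched == pattern.length) {
                        // pos is the last byte of the match; store where it began
                        offsets.add(pos - pattern.length + 1);
                        matched = 0;
                    }
                } else {
                    // restart, allowing the current byte to begin a new match
                    matched = (b == pattern[0]) ? 1 : 0;
                }
                pos++;
            }
        }
        return offsets;
    }
}
```

The resulting list could be stored alongside a key for each data set (for example an id attribute), so that a lookup only needs the offset and not a full parse.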
Another point is that our tests with the SAX parser, when skipping bytes
in its input source, produced this exception:
Content is not allowed in prolog
So we are wondering whether it's possible to jump to some given
position and then parse from there.
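What we have in mind could be sketched roughly as follows, assuming the offset points exactly at the "<" of a data set's start tag (landing anywhere else, e.g. in the middle of text, would presumably explain the prolog error). The handler stops the parse as soon as that one element is closed, so whatever follows in the file is never treated as part of the document. FragmentReader and OneElementHandler are made-up names, and the sketch assumes UTF-8 data with no namespace prefixes or entities declared further up in the file.

```java
import java.io.FileInputStream;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parses a single element starting at a known byte offset.
public class FragmentReader {

    /** Collects the text of one element and aborts once that element is closed. */
    static class OneElementHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int depth = 0;
        boolean done = false;

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            depth++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (--depth == 0) {
                done = true;
                // Stop here so the parser never has to deal with the rest of the file.
                throw new SAXException("element complete");
            }
        }
    }

    /** Returns the text content of the element whose start tag begins at the given byte offset. */
    public static String readElementText(String file, long offset)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        OneElementHandler handler = new OneElementHandler();
        try (FileInputStream in = new FileInputStream(file)) {
            // Repositioning the channel also moves the stream's read position.
            in.getChannel().position(offset);
            try {
                parser.parse(in, handler);
            } catch (SAXException e) {
                // Errors raised after the element is complete are expected and ignored.
                if (!handler.done) throw e;
            }
        }
        return handler.text.toString();
    }
}
```

Called as FragmentReader.readElementText("data.xml", offset) with an offset taken from the index, this would only read the bytes of one data set (plus whatever the parser buffers ahead).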
I'd be grateful for any advice, since I'm quite stuck right now. Many thanks!
Bogomir Engel