Michael Powe
i would like to use NIO to improve performance processing some very
large text files -- 1 to 4 GB. i have written my processor in
standard i/o and it's impractically slow.
the processing is line-oriented and not complicated, which is why i
believe that i/o is the performance bottleneck.
i just cannot 'get' the design process for using NIO, based on online
examples and reading the API. i could use some help.
i've figured out how to memory map a buffer from the input file, turn
it into a CharBuffer, read lines from it and do a simple test parse of
the lines (counting the occurrence of a regex match). i got most of
that from an example on the sun site.
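
for reference, the read side described above might look roughly like the sketch below. it is not the code from the sun example: the class name, method name and line pattern are made up for illustration, it assumes Java 7 or later (for FileChannel.open) and a single-byte encoding (ISO-8859-1). note that a single mapping cannot exceed Integer.MAX_VALUE bytes, and that decode() copies the whole mapping into a CharBuffer on the heap.

```java
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: maps a whole file and counts lines matching a regex.
class MappedLineCount {
    // One line including its terminator, or a final line with no terminator.
    private static final Pattern LINE = Pattern.compile(".*\r?\n|.+");

    static long countMatchingLines(Path file, Pattern target) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single mapping is limited to Integer.MAX_VALUE bytes, so this
            // only works as-is for files smaller than 2 GB.
            MappedByteBuffer bytes = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Assumption: single-byte text. decode() builds a heap CharBuffer
            // holding the entire file (two bytes per char).
            CharBuffer chars = StandardCharsets.ISO_8859_1.decode(bytes);
            Matcher lines = LINE.matcher(chars);
            Matcher hit = target.matcher("");
            long count = 0;
            while (lines.find()) {
                hit.reset(lines.group());
                if (hit.find()) {
                    count++;
                }
            }
            return count;
        }
    }
}
```

called as MappedLineCount.countMatchingLines(path, Pattern.compile("foo")), this returns the number of lines containing at least one match.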
but, how do i write out the results? i have managed to create an
output buffer from a RandomAccessFile object, but i just get a file of
empty bytes, the size of the buffer. somehow, i need to write chars
to that file, and i need to end up with a plain text file, the size of
which matches the amount of data written to it, not the size of the
memory buffer.
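
one likely reason for the zero-filled file: mapping a file read/write with a given size extends the file to that size, so anything not explicitly written stays as zero bytes. a possible alternative, sketched below under assumptions, is to skip the output mapping and write encoded bytes through a FileChannel, so the file is exactly as long as the data written. the class and method names and the 64 KB buffer size are arbitrary choices for illustration, and ISO-8859-1 is assumed as the output encoding.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: writes text lines through a FileChannel so the file
// size equals the number of bytes actually written.
class ChannelTextWriter {
    private static final int BUFFER_SIZE = 64 * 1024; // arbitrary illustrative size

    static void writeLines(Path out, Iterable<String> lines) throws IOException {
        try (FileChannel ch = FileChannel.open(out,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.allocate(BUFFER_SIZE);
            for (String line : lines) {
                // Assumption: single-byte output encoding.
                byte[] bytes = (line + "\n").getBytes(StandardCharsets.ISO_8859_1);
                if (bytes.length > buf.remaining()) {
                    flush(ch, buf); // make room by writing what is buffered
                }
                if (bytes.length > buf.capacity()) {
                    // Line is larger than the whole buffer: write it directly.
                    ByteBuffer big = ByteBuffer.wrap(bytes);
                    while (big.hasRemaining()) {
                        ch.write(big);
                    }
                } else {
                    buf.put(bytes);
                }
            }
            flush(ch, buf);
        }
    }

    // Switches the buffer to read mode, drains it to the channel, then resets it.
    private static void flush(FileChannel ch, ByteBuffer buf) throws IOException {
        buf.flip();
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
        buf.clear();
    }
}
```

if the output does have to be a mapped buffer, the file would need to be cut back afterwards (for example with FileChannel.truncate) to the number of bytes really written.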
a further question is, how do i 'slide the window' along a multi-GB
file, reading in small pieces at a time for processing? and, finally,
how do i determine an optimal buffer size? is bigger better? on my
laptop, i have to fiddle the VM heap size and read buffer size to keep
from exhausting the heap. i'm not sure what is the best approach.
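
a minimal sketch of the 'sliding window' idea, under assumptions: map the file one fixed-size region at a time and carry any partial line over to the next region. the names and the windowSize parameter are hypothetical, ISO-8859-1 (one byte per char) is assumed, and nothing here says what window size is optimal; that would have to be measured. the mapped region itself lives outside the java heap, so only the current line is held on the heap.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: walks a large file in fixed-size mapped windows and
// counts the lines that contain a regex match.
class WindowedLineCount {
    static long countMatchingLines(Path file, Pattern target, int windowSize)
            throws IOException {
        long count = 0;
        Matcher hit = target.matcher("");
        // Holds the current line; survives across window boundaries.
        StringBuilder line = new StringBuilder();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += windowSize) {
                long len = Math.min(windowSize, size - pos);
                MappedByteBuffer window = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (window.hasRemaining()) {
                    // Assumption: ISO-8859-1, so each byte is exactly one char.
                    char c = (char) (window.get() & 0xFF);
                    if (c == '\n') {
                        if (hit.reset(line).find()) {
                            count++;
                        }
                        line.setLength(0);
                    } else {
                        line.append(c); // a trailing '\r' from CRLF is kept
                    }
                }
            }
        }
        // Final line without a terminating newline.
        if (line.length() > 0 && hit.reset(line).find()) {
            count++;
        }
        return count;
    }
}
```

because the offset is a long, this form is not limited to 2 GB the way a single mapping is; only each individual window has to fit in an int.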
many thanks for help. i have spent a lot of hours this weekend just
to make this little progress.
thanks.
mp