pubsetbuf, filebuf and when to flush


nitroamos

i'm working on improving the IO for the software project i'm working on
to do two extra things. first, i'm going to add HDF5 functionality, and
second, add the ability to write binary output. the software is
computational science where a lot of stuff is going to be calculated
and written to a file. e.g. could be up to gigabytes of data.

my question has to do with the latter -- and how often i need to call
"flush" on the stream. what I want the software to do is store up
several entries in memory and then dump several megabytes (or some
optimal number) all at once since this is the optimal setting that
makes the most sense to me.

so how does the buffering in a fstream, (or a filebuf) work?
specifically, if i just keep doing "write" commands, will it dump the
entire buffer when the buffer gets full? or will it only dump enough of
the buffer to make room for each "write"? can I rely on the standard
c++ libraries written for each machine to choose an optimal buffering
scheme and buffer size? it seems like at least some of these details
are implementation dependent... but i don't know which.

here's a reference:
http://www.cplusplus.com/ref/iostream/filebuf/
but nobody really discusses this issue that I can find any reference
to.

basically i'm wondering if i need to add the following to my code:
1) choose an optimal buffer size somehow
2) create and give it to the filebuf via pubsetbuf
3) figure out how many entries would fill my buffer

then in the process of the calculation
when I know the buffer is full, flush

so my question is, how much can i rely on the standard IO library to
handle this for me? i assume that each company who writes an IO library
for their machine would know the best way to handle it, but i just want
to be sure that none of their choices affect me.

thanks!

Amos.
nitroamos a t y a h o o
 

John Harrison

i'm working on improving the IO for the software project i'm working on
to do two extra things. first, i'm going to add HDF5 functionality, and
second, add the ability to write binary output. the software is
computational science where a lot of stuff is going to be calculated
and written to a file. e.g. could be up to gigabytes of data.

my question has to do with the latter -- and how often i need to call
"flush" on the stream. what I want the software to do is store up
several entries in memory and then dump several megabytes (or some
optimal number) all at once since this is the optimal setting that
makes the most sense to me.

so how does the buffering in a fstream, (or a filebuf) work?
specifically, if i just keep doing "write" commands, will it dump the
entire buffer when the buffer gets full? or will it only dump enough of
the buffer to make room for each "write"?

Every implementation I've seen does the former.

can I rely on the standard
c++ libraries written for each machine to choose an optimal buffering
scheme and buffer size? it seems like at least some of these details
are implementation dependent... but i don't know which.

No, you can't. IO buffering is really done by the operating system and it
is difficult for portable code (like the C++ library) to take full
advantage. I would start with fstream using the default buffering size,
then try pubsetbuf, but if performance is really critical then you might
have to drop fstream and write your own stream classes which can use the
underlying operating system directly.
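
A step short of writing OS-specific stream classes is to collect records in the application and hand them to the stream in large blocks, which is what the question describes. The following is only a sketch under assumptions: `BatchWriter`, `append`, `drain` and the capacity parameter are hypothetical names, not part of any library, and whether this beats the default buffering has to be measured on the target machine.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical class: collects records in memory and passes them to the
// stream in blocks of roughly `capacity` bytes.
class BatchWriter {
public:
    BatchWriter(const std::string& path, std::size_t capacity)
        : out_(path, std::ios::binary | std::ios::trunc), capacity_(capacity) {
        pending_.reserve(capacity);
    }

    // Hand over anything still pending; the stream's own destructor then
    // closes the file and writes out its buffer.
    ~BatchWriter() { drain(); }

    // Copy one record into the in-memory batch.
    void append(const void* data, std::size_t bytes) {
        const char* p = static_cast<const char*>(data);
        pending_.insert(pending_.end(), p, p + bytes);
        if (pending_.size() >= capacity_) drain();
    }

    // Pass everything collected so far to the stream in a single write call.
    void drain() {
        if (!pending_.empty()) {
            out_.write(pending_.data(),
                       static_cast<std::streamsize>(pending_.size()));
            pending_.clear();
        }
    }

    bool ok() const { return out_.good(); }

private:
    std::ofstream out_;
    std::size_t capacity_;
    std::vector<char> pending_;
};
```

Typical use would be to construct one `BatchWriter` per output file with a capacity of a few megabytes, call `append(&value, sizeof value)` for each entry, and let the destructor hand over the remainder.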

here's a reference:
http://www.cplusplus.com/ref/iostream/filebuf/
but nobody really discusses this issue that I can find any reference
to.

basically i'm wondering if i need to add the following to my code:
1) choose an optimal buffer size somehow
2) create and give it to the filebuf via pubsetbuf

Before you do any IO, call

file->rdbuf()->pubsetbuf(buffer, buffer_size);
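
For illustration, here is that call in context. This is a minimal sketch, not a tuned implementation: the helper name `write_doubles_buffered` and its parameters are made up for the example. It installs the buffer before `open()`, because the effect of `pubsetbuf` on a file stream is implementation-defined and some libraries ignore it once the file is open.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: writes `count` doubles to `path` in binary through a
// caller-sized stream buffer. Returns true if every write and the close
// succeeded.
bool write_doubles_buffered(const std::string& path, std::size_t count,
                            std::size_t buffer_bytes) {
    // Declared before the stream so it is destroyed after the stream.
    std::vector<char> buffer(buffer_bytes);

    std::ofstream out;
    // Install the buffer before open(); whether a later call has any effect
    // depends on the library implementation.
    out.rdbuf()->pubsetbuf(buffer.data(),
                           static_cast<std::streamsize>(buffer.size()));
    out.open(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;

    for (std::size_t i = 0; i < count; ++i) {
        const double value = static_cast<double>(i);
        out.write(reinterpret_cast<const char*>(&value), sizeof value);
    }

    out.close();  // close() writes out whatever is still in the buffer
    return !out.fail();
}
```

A call such as `write_doubles_buffered("out.bin", 1000000, 1 << 20)` would use a 1 MiB buffer; that size is an assumption to be tuned by timing real runs.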

3) figure out how many entries would fill my buffer

then in the process of the calculation
when I know the buffer is full, flush

You don't need to flush. The only time you ever flush is when your
application is crashing and you want to make sure that all output is
written before the crash. Calling flush yourself is going to seriously
degrade performance.
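
A small sketch of why the explicit flush is not needed for correctness: the same records end up in the file whether or not `flush` is called, because closing the stream writes out whatever is still buffered. The helper names `write_ints` and `count_ints` and the `flush_every` parameter are hypothetical, introduced only for this example.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: writes n ints in binary. If flush_every is nonzero,
// flush() is called after every flush_every records; 0 means never flush
// explicitly.
bool write_ints(const std::string& path, std::size_t n,
                std::size_t flush_every) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    for (std::size_t i = 0; i < n; ++i) {
        const int value = static_cast<int>(i);
        out.write(reinterpret_cast<const char*>(&value), sizeof value);
        if (flush_every != 0 && (i + 1) % flush_every == 0) {
            out.flush();  // forces the buffered bytes out to the OS now
        }
    }
    out.close();  // writes out any remaining buffered data
    return !out.fail();
}

// Hypothetical helper: counts how many whole ints the file contains.
std::size_t count_ints(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::size_t n = 0;
    int value = 0;
    while (in.read(reinterpret_cast<char*>(&value), sizeof value)) ++n;
    return n;
}
```

Timing `write_ints(path, n, 0)` against `write_ints(path, n, 1)` on the target machine is one way to see how much per-record flushing costs there.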

so my question is, how much can i rely on the standard IO library to
handle this for me? i assume that each company who writes an IO library
for their machine would know the best way to handle it, but i just want
to be sure that none of their choices affect me.

As I said, remember that the C++ IO library is likely to be cross
platform. Plus they're likely to be writing for a typical application, not
one that needs to write gigabytes of data.

thanks!

Amos.
nitroamos a t y a h o o

john
 
