reading files in blocks

Urs Thuermann · Dec 9, 2010

I want to read blocks of a file in a loop and I am looking for the
most elegant way to do this. Assume there is a function
process_data() that I don't want to call with zero-length data. In C
I write

while ((nbytes = read(fd, buffer, sizeof(buffer))) > 0) {
    process_data(buffer, nbytes);
}

AFAICS, in C++ I should use ifstream::read() for reading from the
file, but that method does not return the number of bytes successfully
read. Unfortunately, like feof(fp) in C, the method ifstream::eof()
does signal the EOF state of the file one byte too late, i.e. only
*after* trying to read one more byte after the last one has been read,
instead of before. Therefore, I cannot write

while (!file.eof()) {
    file.read(buffer, sizeof(buffer));
    nbytes = file.gcount();
    process_data(buffer, nbytes);
}

but instead one must write either

while (!file.eof()) {
    file.read(buffer, sizeof(buffer));
    nbytes = file.gcount();
    if (nbytes == 0)
        break;
    process_data(buffer, nbytes);
}

or

while (!file.eof()) {
    file.read(buffer, sizeof(buffer));
    nbytes = file.gcount();
    if (nbytes > 0)
        process_data(buffer, nbytes);
}

which I find ugly because of the two loop termination conditions in
the first case and because of the two almost identical conditions and
the additional indentation in the second case.

Currently, I have the following, which comes closest to what I've
written hundreds of times in C, but does not look quite as clean and
simple:

while (file.read(buffer, sizeof(buffer)), (nbytes = file.gcount()) > 0) {
    process_data(buffer, nbytes);
}

So my question is whether there is another simple and elegant way to do this.

urs

u2 · Dec 9, 2010

I want to read blocks of a file in a loop and I am looking for the
most elegant way to do this. Assume there is a function
process_data() that I don't want to call with zero-length data. In C
I write

while ((nbytes = read(fd, buffer, sizeof(buffer))) > 0) {
    process_data(buffer, nbytes);
}

AFAICS, in C++ I should use ifstream::read()

In C++, the while loop above is fine. It is the natural choice if you
are processing "raw" data.
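To make u2's suggestion concrete, here is a minimal sketch of the same C loop used from C++, assuming a POSIX system (read() comes from <unistd.h>, not from standard C or C++). The helper name for_each_block and the 4096-byte buffer are hypothetical, and the retry on EINTR is an addition not discussed in the thread.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>   // POSIX read(); not part of standard C or C++

// Hypothetical helper: reads fd block by block and passes each non-empty
// block to process_data (any callable taking const char*, std::size_t).
// Returns the total number of bytes delivered, or -1 on a read error.
template <typename Callback>
long long for_each_block(int fd, Callback process_data)
{
    assert(fd >= 0);
    char buffer[4096];
    long long total = 0;
    for (;;) {
        ssize_t nbytes = read(fd, buffer, sizeof(buffer));
        if (nbytes > 0) {
            process_data(buffer, static_cast<std::size_t>(nbytes));
            total += nbytes;
        } else if (nbytes == 0) {
            return total;           // end of file
        } else if (errno != EINTR) {
            return -1;              // genuine read error
        }
        // nbytes < 0 with errno == EINTR: interrupted by a signal, so retry
    }
}
```

Any callable with that signature can stand in for process_data, for example a lambda that appends each block to a std::vector<char>.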

James Kanze · Dec 10, 2010

I want to read blocks of a file in a loop and I am looking for the
most elegant way to do this. Assume there is a function
process_data() that I don't want to call with zero-length data. In C
I write

while ((nbytes = read(fd, buffer, sizeof(buffer))) > 0) {
    process_data(buffer, nbytes);
}

Not in standard C: read is a Posix function. Still, for raw
data, I tend to use the system-specific low-level functions
rather than iostream (or FILE* in C). Both FILE* and iostream
are really designed for streaming text. Both have added support
for random access and "binary", but in both cases, the support is
just that: added on.

Another alternative with iostream is to get the streambuf, and
call sgetn on it. (Note that doing so will *not* set any of the
error bits in the istream.)

AFAICS, in C++ I should use ifstream::read() for reading from the
file, but that method does not return the number of bytes successfully
read.

To get the number of bytes read by the last unformatted read,
use gcount. Don't forget that not reading the requested number
of bytes is treated as an error condition. To read the entire
file, where the last block may not be complete:

while (in.read(...) || in.gcount() != 0) {
    // process the in.gcount() bytes just read...
}
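A self-contained sketch of the loop above, assuming a std::istream source and recording each block's size where process_data would be called. The name block_sizes is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, used in the usage example
#include <vector>

// Keeps going while read() either filled the whole buffer or delivered a
// partial final block (gcount() != 0). Once read() has failed, the next
// call transfers nothing, gcount() is 0, and the loop stops.
std::vector<std::streamsize> block_sizes(std::istream& in, std::size_t block)
{
    assert(block > 0);
    std::vector<char> buffer(block);
    std::vector<std::streamsize> sizes;
    while (in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()))
           || in.gcount() != 0) {
        sizes.push_back(in.gcount());   // stand-in for process_data(buffer, gcount)
    }
    return sizes;
}
```

With a std::istringstream holding "0123456789" and a block size of 4, this yields blocks of 4, 4 and 2 bytes, and an empty stream yields none. Note that if the OP's comma-operator loop is rewritten by simply replacing the comma with ||, the assignment nbytes = file.gcount() is skipped whenever read() succeeds (|| short-circuits), so nbytes would hold a stale value for full blocks; taking the count from gcount() inside the body, as here, avoids that.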

Unfortunately, like feof(fp) in C, the method ifstream::eof()
does signal the EOF state of the file one byte too late, i.e. only
*after* trying to read one more byte after the last one has been read,
instead of before.

I think you're confusing ios::eof() with something else. Until
a read has failed, ios::eof() is more or less useless; you can't
count on its state one way or another.
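A small sketch of why eof() says little before a read fails, assuming a four-byte std::istringstream in place of the file. Reading exactly the remaining four bytes leaves eof() false; only the next read, which transfers nothing, sets it. That extra zero-byte read is the empty block the OP's first eof()-controlled loop would pass to process_data. The names ReadResult and read_once are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Snapshot of what one unformatted read() reported.
struct ReadResult {
    std::streamsize count;   // gcount() after the read
    bool eof;                // eofbit after the read
    bool fail;               // failbit after the read
};

ReadResult read_once(std::istream& in, char* buffer, std::streamsize size)
{
    assert(size > 0);
    in.read(buffer, size);
    return ReadResult{in.gcount(), in.eof(), in.fail()};
}
```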

Therefore, I cannot write

while (!file.eof()) {
    file.read(buffer, sizeof(buffer));
    nbytes = file.gcount();
    process_data(buffer, nbytes);
}

but instead one must write either

while (!file.eof()) {
    file.read(buffer, sizeof(buffer));
    nbytes = file.gcount();
    if (nbytes == 0)
        break;
    process_data(buffer, nbytes);
}

or

while (!file.eof()) {
    file.read(buffer, sizeof(buffer));
    nbytes = file.gcount();
    if (nbytes > 0)
        process_data(buffer, nbytes);
}

which I find ugly because of the two loop termination conditions in
the first case and because of the two almost identical conditions and
the additional indentation in the second case.

The standard pattern (both in C and in C++) is to read, then
check whether it has succeeded. In C++, there is an implicit
conversion of the stream object to something which can be used
as a boolean, and all input functions return a reference to the
stream object, so you can just write:

while (file >> target) ...

In the case of istream::read, you have to take the additional
actions I explained above, because istream::read considers
reading anything less than the number of bytes requested an
error.
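For comparison, the plain read-then-test idiom with formatted input, sketched on a stream of integers (read_ints is a hypothetical name). The loop body runs only for successful extractions, because operator>> returns the stream, which converts to false once an extraction fails.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <vector>

// Collects integers until an extraction fails (end of input or bad data).
std::vector<int> read_ints(std::istream& in)
{
    std::vector<int> values;
    int target;
    while (in >> target)
        values.push_back(target);
    return values;
}
```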

Currently, I have the following, which comes closest to what I've
written hundreds of times in C, but does not look quite as clean and
simple:

while (file.read(buffer, sizeof(buffer)), (nbytes = file.gcount()) > 0) {
    process_data(buffer, nbytes);
}

So my question is whether there is another simple and elegant way
to do this.

Replace the comma operator with ||, and you're good. This is
the idiomatic solution. Alternatively, with streambuf:

std::streamsize byteCount = file.rdbuf()->sgetn(buffer, sizeof(buffer));
while (byteCount != 0) {
    process_data(buffer, byteCount);
    byteCount = file.rdbuf()->sgetn(buffer, sizeof(buffer));
}
file.setstate(std::ios::eofbit | std::ios::failbit);

(The last line is only necessary if there is a possibility of
some other code using the istream later.)
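The streambuf loop above, wrapped as a sketch that can be run against any std::istream (sgetn_blocks is a hypothetical name, and recording block sizes stands in for process_data). Because sgetn() does not touch the istream's state bits, the stream is still good() when the loop ends; the final setstate() is what later users of the istream will see.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, used in the usage example
#include <vector>

std::vector<std::streamsize> sgetn_blocks(std::istream& file, std::streamsize block)
{
    assert(block > 0);
    std::vector<char> buffer(static_cast<std::size_t>(block));
    std::vector<std::streamsize> sizes;
    std::streambuf* sb = file.rdbuf();
    std::streamsize byteCount = sb->sgetn(buffer.data(), block);
    while (byteCount != 0) {
        sizes.push_back(byteCount);   // stand-in for process_data(buffer, byteCount)
        byteCount = sb->sgetn(buffer.data(), block);
    }
    // Only needed if other code will use the istream afterwards.
    file.setstate(std::ios::eofbit | std::ios::failbit);
    return sizes;
}
```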

Or you could just create the filebuf directly, and skip the
istream completely.
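Skipping the istream entirely might look like this, as a sketch assuming a std::filebuf opened in binary mode; count_bytes is a hypothetical name, and summing the byte counts stands in for process_data. The loop has the same shape as the C read() loop from the original post, since sgetn() returns the number of characters actually transferred.

```cpp
#include <cassert>
#include <fstream>
#include <ios>

// Returns the number of bytes in the file, or -1 if it cannot be opened.
long long count_bytes(const char* path)
{
    assert(path != nullptr);
    std::filebuf fb;
    if (!fb.open(path, std::ios::in | std::ios::binary))
        return -1;
    char buffer[4096];
    long long total = 0;
    std::streamsize n;
    while ((n = fb.sgetn(buffer, static_cast<std::streamsize>(sizeof(buffer)))) > 0)
        total += n;   // process_data(buffer, n) would go here
    return total;
}
```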

red floyd · Dec 10, 2010

Or, if the OP thinks the two-test while loop is ugly, he can always
abstract it away:

int read(std::istream& f, void *buffer, int bufsize)
{
    int nbytes = bufsize;
    if (f.read(static_cast<char*>(buffer), bufsize) ||
        (nbytes = static_cast<int>(f.gcount())) > 0)
        return nbytes;
    else if (f.eof())
        return 0;
    else
        return -1;
}

// ...

int nbytes;
while ((nbytes = read(f, buffer, sizeof(buffer))) > 0)
    process_data(buffer, nbytes);

C pipe	1	Dec 9, 2021
Reading into a buffer and writing from it at the same time	8	Oct 14, 2009
Reading Ports (uart) in windows	3	Jan 27, 2023
Communicating between processes	0	May 14, 2023
Access violation reading location	0	Oct 23, 2022
Child processes don't get the close on pipe	3	Jun 2, 2012
Do successfull poll(), sysread(), syswrite() calls clear %!	2	May 14, 2010
Windows LLDP Driver Responds With No Data	0	Mar 17, 2023
