reading binary file

Use*n*x

Hello,

I have a binary file (an image file) and am reading it 4 bytes at a time. The
file size is 63,480,320 bytes. My assumption is that if I loop through
this file reading 4 bytes at a time, I should loop 15,870,080 times.

The code is:

newprogram.cpp
=============
#include <iostream>
#include <fstream>
using namespace std;

int main ()
{
    int counter=0;
    char * memblock;
    memblock = new char [4];

    ifstream file ("179060_mar_05_00_L7.024", ios::in|ios::binary);

    file.seekg (1,ios::beg);
    while (!file.eof())
    {
        file.read(memblock, 4);
        counter++;
    }
    cout << "Number of loops: " << counter << "\n";
    delete[] memblock;
    file.close();

    return 0;
}

$> g++ newprogram.cpp -o deletelater
$> ./deletelater
Number of loops: 15870080

(a) Notice the file.seekg (1,ios::beg); statement. Is this correct?
(b) If I were to use file.seekg (0,ios::beg); statement, the number of
loops would end up 15870081. Is file.seekg(0,ios::beg) correct? If so,
could you please help me understand why the loop goes 15870081 times?

If I were to use a variable: int tempdata;
and in the loop right after file.read, were to insert: tempdata = (int)
(*memblock), I would get different results with (a)
file.seekg(1,ios::beg) and (b) file.seekg(0,ios::beg). Which one would
be correct?

Your suggestions will be very helpful. Thank you.

Use*n*x
 
Micah Cowan

Your loop appears to be written under a common misconception.

file.eof() does not return whether or not you are at the end of a file;
it returns whether or not you've attempted to read past the end of the
file. Additionally, an istream does not even know whether it's reached
the end-of-file until it tries to read past the end of the file.

Without a seek, or with a seek to 0, what happens is that you have
15870080 successful reads. After those, file.eof() still returns false;
not because it's not at the end of the file (it is), but because it
hasn't yet tried to read past the end of the file. The very next
(15870081st) read will fail (read zero bytes), and /then/ the eof bit
will be set; but counter will still be incremented.

The reason why seeking to 1 appears to give the right number of reads,
is that you are skipping the first byte (in position 0). Then follows
15870079 successful reads, followed by the 15870080th read that only
reads the final 3 bytes. It tries to read the fourth byte, and at that
point encounters the end-of-file, so it sets the bit, and the test
condition terminates the loop. But you have missed the first byte, and
the final byte you /think/ you read (at memblock[3]) actually is just a
duplicate from the read just before the last one.

Another problem with your loop is that if there were a read /failure/,
your loop would continue indefinitely, as the eof bit would never get
set, and the read calls would just keep failing undetected.

The solution? Make the loop condition simply "while (file)" or "while
(file.good())", and check the return value from file.read() (the stream
itself, which tests false after a short or failed read; file.gcount()
gives the number of bytes actually read) before assuming that it filled
your array completely (or at all). Only increment the counter if the
read was successful (I have no clue how you might want to handle a
partial read, but you should take note of one if it occurs).
 
Use*n*x

I was testing a little more and found this method to be more reliable
than using file.eof(). Suggestions and comments are more than welcome.

#include <iostream>
#include <fstream>
using namespace std;

int main ()
{
    int counter=0;
    char * memblock;
    memblock = new char [4];

    long begin,end,filesize,i;

    //ifstream file ("179060_mar_05_00_L7.024", ios::in|ios::binary);
    ifstream file ("test", ios::in|ios::binary);
    ofstream dump ("dump", ios::binary);

    // find file size
    begin = file.tellg();
    file.seekg(0,ios::end);
    end = file.tellg();
    filesize = end - begin;

    // reposition
    file.seekg(0,ios::beg);

    // loop
    for (i=0; i<filesize; i=i+4)
    {
        file.read(memblock,4);
        // not quite needed
        // cout<< memblock << ".." << file.tellg() << endl;
        counter++;
    }

    /*file.seekg (0,ios::beg);
    while (!file.eof())
    {
        file.read(memblock, 4);
        //dump << memblock;
        cout << memblock << ".." << file.tellg() << endl;
        counter++;
    }*/
    cout << "Number of loops: " << counter << "\n";
    delete[] memblock;
    file.close();
    dump.close();

    return 0;
}
 
Use*n*x

Micah said:
<snipped>

Your explanation makes good sense. Thank you.
 
Gianni Mariani

Use*n*x said:
<snipped the original code>

This is awfully inefficient.

Try this:

#include <iostream>
#include <fstream>

using namespace std;

int main ()
{
    ifstream file ("179060_mar_05_00_L7.024", ios::in|ios::binary);

    streambuf * pbuf = file.rdbuf();
    int l_blocks[1024];

    streamsize i;

    // sgetn returns the number of bytes read; 0 ends the loop
    while (
        ( i = pbuf->sgetn(
            reinterpret_cast<char*>(l_blocks), sizeof(l_blocks) ) )
    )
    {
        streamsize num_read = i / sizeof(int);

        for ( streamsize x = 0; x < num_read; ++ x )
        {
            // PROCESS_THIS_THING is a placeholder for your own code
            PROCESS_THIS_THING( l_blocks[x] );
        }
    }

    return 0;
}

Come to think of it, I have not checked the performance of the C++
stream library lately, so I could be wrong. However, I have found that
frequent small calls can significantly slow down the application,
especially when you're reading large amounts of data.

If you don't care about performance, then you can stick with what you
have. I would lose the new/delete, though. Just declare a small array on
the stack and make sure you never read more bytes than you have allocated.
 
Micah Cowan

Use*n*x said:
<snipped>

You're much safer using std::streampos from <iosfwd> to store the
result of file.tellg(), as it could well have a greater width than a
long, and you might not detect a potential overflow.

Other than that, you're still a lot better off checking for eof(): if a
read failure occurs, your code above still won't catch it, and if some
outside program were to truncate the file before you were through
reading it, you wouldn't detect that condition either. Also, it is
possible for the current location to not fit into a streampos, or for
the call to otherwise fail, in which case tellg() will return
streampos(streamoff(-1)), and your code won't work as you expect.

Also: I snipped a section where you use /* ... */ to comment out a
block of code. While that works in your specific case, it's a habit to
avoid in general: if that block of code had a /* */ comment of its own,
the comment would end at the first */, because those comments don't
nest, and you'd have a syntax error. It's easier in the long run just
to get in the habit of using #if 0 instead.
 
Use*n*x

Gianni said:
<snipped>

Good to know your thoughts. It helped. Thank you.
 
Use*n*x

Micah said:
You're much safer using std::streampos from <iosfwd> to store the
result of file.tellg(), as it could well have a greater width than a
long, and you might not detect a potential overflow.

I started off using streamsize/streampos, but was not quite confident
of what I was doing, so I switched back to something similar to
sample code I had on hand.

Micah said:
Other than that, you're still a lot better off checking for eof(): if a
read failure occurs, your code above still won't catch it, and if some
outside program were to truncate the file before you were through
reading it, you wouldn't detect that condition either. Also, it is
possible for the current location to not fit into a streampos, or for
the call to otherwise fail, in which case tellg() will return
streampos(streamoff(-1)), and your code won't work as you expect.

Yes, that is what I should do - keep a tab on eof() to handle failure
in IO.

Micah said:
Also: I snipped a section where you use /* ... */ to comment out a
block of code. While that works in your specific case, it's a habit to
avoid in general: if that block of code had a /* */ comment of its own,
the comment would end at the first */, because those comments don't
nest, and you'd have a syntax error. It's easier in the long run just
to get in the habit of using #if 0 instead.

Oh yes, I didn't even realize that. Thank you for your valuable input.
I have a long way to go in C++.

Use*n*x
 
Micah Cowan

Use*n*x said:
Yes, that is what I should do - keep a tab on eof() to handle failure
in IO.

Actually, eof() won't report I/O failure; you should check
bool(file), which is false once failbit or badbit is set, or
file.good(), which is additionally false once eofbit is set.

It is sometimes useful to call file.exceptions(<some
std::ios_base::iostate values>), to cause the stream to throw an
exception upon failure or corruption.
 
BobR

Use*n*x wrote in message ...
Yes, that is what I should do - keep a tab on eof() to handle failure
in IO.
Use*n*x


Huh?

#include <fstream>
#include <iostream>

int main (){
    // using namespace std;
    int counter=0;
    char *memblock( new char[4] );
    std::ifstream file( "test", std::ios::in | std::ios::binary );

    // the stream returned by read() tests false after a short or failed read
    while( file.read( memblock, 4 ) ){
        counter++;
    }

    std::cout << "Number of loops: " << counter << "\n";
    delete[] memblock;

    if( not file ){
        // note: flags() returns the formatting flags; the error bits
        // themselves are available from file.rdstate()
        std::cout<<" file error="<<file.flags()<<std::endl;
        std::cout<<" ios::good="<<file.good()<<std::endl;
        std::cout<<" ios::bad="<<file.bad()<<std::endl;
        std::cout<<" ios::eof="<<file.eof()<<std::endl;
        std::cout<<" ios::fail="<<file.fail()<<std::endl;
    }

    file.clear();   // reset the error bits so that seekg/tellg work again
    file.seekg( 0, std::ios::end );
    long long end = file.tellg();
    long long filesize = end / 4;
    std::cout << "long long end = file.tellg(): "<<end<< "\n";
    std::cout << "long long filesize = end / 4: "<<filesize<< "\n";

    file.close();
    return 0;
}

// - output -
// Number of loops: 3
// file error=4098
// ios::good=false
// ios::bad=false
// ios::eof=true
// ios::fail=true
// long long end = file.tellg(): 14
// long long filesize = end / 4: 3
 
