What's wrong with std::ifstream::read()?


Steven T. Hatton

I know of at least one person who believes std::ifstream::read() and
std::ofstream::write() are "mistakes". They seem to do the job I want
done. What's wrong with them? This is the code I currently have as a test
for using std::ifstream::read(). Is there anything wrong with the way I'm
getting the file?

#include <vector>
#include <string>
#include <iomanip>
#include <fstream>
#include <iostream>

// Print the bytes in [start, stop) as two-digit hex values separated by spaces.
template<typename Iterator>
std::ostream& printHexLine(Iterator start, Iterator stop, std::ostream& out)
{
  while(start < stop) out
    << std::setw(2)
    << (static_cast<unsigned int>(static_cast<unsigned char>(*start++))) << " ";
  return out;
}

// Hex-dump a random-access container, 16 bytes per line.
template<typename Container>
std::ostream& print(const Container& data, std::ostream& out) {
  typedef typename Container::const_iterator c_itr;

  // A second stream on the same buffer, so the caller's format flags stay untouched.
  std::ostream hexout(out.rdbuf());
  hexout.setf(std::ios::hex, std::ios::basefield);
  hexout.fill('0');

  c_itr from (data.begin());
  c_itr dataEnd (from + data.size());
  c_itr end (dataEnd - (data.size()%16));

  for(c_itr start = from; start < end; start += 16)
    printHexLine(start, start + 16, hexout) << "\n";

  printHexLine(end, dataEnd, hexout) << "\n";
  return out;
}

int main (int argc, char* argv[]) {
  std::string filename("fileio");
  // Open positioned at the end (ate) so that tellg() reports the file size.
  std::ifstream file(filename.c_str(),
                     std::ios::in | std::ios::binary | std::ios::ate);
  if(!file) {  // without this check, a failed open makes tellg() return -1
    std::cerr << "cannot open " << filename << "\n";
    return 1;
  }
  std::vector<char>vbuf(file.tellg());
  file.seekg(0, std::ios::beg);
  // Assumes a non-empty file: &vbuf[0] is not valid for an empty vector.
  file.read(&vbuf[0], vbuf.size());
  print(vbuf, std::cout);
  return 0;
}
 

Maxim Yegorushkin

Steven said:
I know of at least one person who believes std::ifstream::read() and
std::ofstream::write() are "mistakes". They seem to do the job I want
done. What's wrong with them? This is the code I currently have as a test
for using std::ifstream::read(). Is there anything wrong with the way I'm
getting the file?
[]

std::vector<char>vbuf(file.tellg());
file.seekg(0, std::ios::beg);
file.read(&vbuf[0], vbuf.size());

You don't need read() here. A simple:

std::vector<char> vbuf((std::istreambuf_iterator<char>(file)),
                       (std::istreambuf_iterator<char>()));

would suffice, although you might argue that your code does not involve
vector reallocations.
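
For comparison, the two approaches discussed so far can be put side by side. This is only a sketch: the function names slurp_with_read and slurp_with_iterators are made up for illustration and appear nowhere in the thread, and the error handling (returning an empty vector) is an assumption about what a caller might want.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: size the buffer from tellg(), then fill it with one read().
// Returns an empty vector if the file cannot be opened or is empty.
std::vector<char> slurp_with_read(const std::string& path) {
    assert(!path.empty());  // precondition chosen for this sketch
    std::ifstream file(path.c_str(),
                       std::ios::in | std::ios::binary | std::ios::ate);
    if (!file) return std::vector<char>();

    const std::streamoff size = file.tellg();  // -1 if the position is unknown
    if (size <= 0) return std::vector<char>();

    std::vector<char> buf(static_cast<std::size_t>(size));
    file.seekg(0, std::ios::beg);
    file.read(&buf[0], static_cast<std::streamsize>(buf.size()));
    // Keep only the bytes that were actually read.
    buf.resize(static_cast<std::size_t>(file.gcount()));
    return buf;
}

// Hypothetical helper: let the vector grow while iterating over the stream buffer.
std::vector<char> slurp_with_iterators(const std::string& path) {
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    if (!file) return std::vector<char>();
    return std::vector<char>((std::istreambuf_iterator<char>(file)),
                             std::istreambuf_iterator<char>());
}
```

Both functions should return the same bytes for an ordinary file; the difference is that the first allocates once up front, while the second cannot know the size in advance.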
 

Steven T. Hatton

Maxim said:
I know of at least one person who believes std::ifstream::read() and
std::ofstream::write() are "mistakes". They seem to do the job I want
done. What's wrong with them? This is the code I currently have as a test
for using std::ifstream::read(). Is there anything wrong with the way I'm
getting the file?
[]

std::vector<char>vbuf(file.tellg());
file.seekg(0, std::ios::beg);
file.read(&vbuf[0], vbuf.size());

You don't need read() here. A simple:

std::vector<char> vbuf((std::istreambuf_iterator<char>(file)),
                       (std::istreambuf_iterator<char>()));

would suffice, although you might argue that your code does not involve
vector reallocations.

I can reserve space in the vector, and still use the iterator. I don't know
what the exact implications of opening with std::ios_base::ate are. Does
that force the OS to try loading the entire file into memory? I know the
language doesn't specify, and it may well be OS dependent. It seemed to me
the iterator is probably doing a lot of work that really didn't need to be
done. What I really want to do is steal the buffer from the ifstream
rather than copy it.

I don't know of any performance evaluations comparing the different
techniques for reading files, but I do know on one job I did, we had the
largest personnel record system in the world, all in the form of scanned
images. Wasted copying was not a great idea in that context.
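
The "reserve space and still use the iterator" idea could look roughly like the sketch below. The name slurp_reserved is hypothetical, and using tellg() on a stream opened with std::ios::ate to obtain the size is an assumption carried over from the original test program. Reserving removes the reallocations but not the per-character work done by the iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reserve the file size once, then copy through
// stream buffer iterators so the vector never has to reallocate.
std::vector<char> slurp_reserved(const std::string& path) {
    assert(!path.empty());  // precondition chosen for this sketch
    std::vector<char> buf;
    std::ifstream file(path.c_str(),
                       std::ios::in | std::ios::binary | std::ios::ate);
    if (!file) return buf;

    const std::streamoff size = file.tellg();  // -1 if the position is unknown
    if (size > 0) buf.reserve(static_cast<std::size_t>(size));

    file.seekg(0, std::ios::beg);
    std::copy(std::istreambuf_iterator<char>(file),
              std::istreambuf_iterator<char>(),
              std::back_inserter(buf));
    return buf;
}
```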
 

Maxim Yegorushkin

Steven T. Hatton wrote:

[]
I can reserve space in the vector, and still use the iterator. I don't know
what the exact implications of opening with std::ios_base::ate are. Does
that force the OS to try loading the entire file into memory? I know the
language doesn't specify, and it may well be OS dependent. It seemed to me
the iterator is probably doing a lot of work that really didn't need to be
done. What I really want to do is steal the buffer from the ifstream
rather than copy it.

I use memory mapped files for that. A POSIX/win32 implementation is
trivial.
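
A minimal POSIX-only sketch of the memory-mapped approach is shown below. It is not Maxim's implementation: the class name MappedFile and its interface are invented here for illustration, and a win32 version would use a different set of system calls.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical read-only view of a file's contents, backed by mmap().
// An unopenable or empty file yields size() == 0.
class MappedFile {
public:
    explicit MappedFile(const char* path) : data_(0), size_(0) {
        assert(path != 0);
        const int fd = ::open(path, O_RDONLY);
        if (fd == -1) return;

        struct stat st;
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            const std::size_t len = static_cast<std::size_t>(st.st_size);
            void* p = ::mmap(0, len, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const char*>(p);
                size_ = len;
            }
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }

    ~MappedFile() {
        if (data_) ::munmap(const_cast<char*>(data_), size_);
    }

    const char* begin() const { return data_; }
    const char* end() const { return data_ + size_; }
    std::size_t size() const { return size_; }

private:
    MappedFile(const MappedFile&);             // non-copyable
    MappedFile& operator=(const MappedFile&);

    const char* data_;
    std::size_t size_;
};
```

Because begin() and end() are plain pointers, a MappedFile can be handed to iterator-based code such as the printHexLine template from the first post without copying the file into a vector.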
 

Steven T. Hatton

Alex said:
[snip]
I don't know of any performance evaluations comparing the different
techniques for reading files
[snip]

Look at

"Comparative Performance Measurement: Reading file into string"
http://groups.google.com/group/perfo/msg/530fae8e5e065030


"Comparative Performance Measurement: Copying files"
http://groups.google.com/group/perfo/msg/8a74465da4c4e9bb

I don't see where you ran this one:

### CPP-23: std::vector and istream::read()
------------------------------------------------
vector<char> v (no_of_file_bytes);
ifs.read(&v[0], no_of_file_bytes);
ret_str = (v.empty() ? string() : string (v.begin(), v.end()));
------------------------------------------------

but it's probably safe to assume it would very closely match:

### CPP-24: std::string and istream::read()
------------------------------------------------
string tmp (no_of_file_bytes, '0');
ifs.read(&tmp[0], no_of_file_bytes);
ret_str = tmp;
------------------------------------------------

Which outperformed streambuf iterators by between 10 and 30 times, and was
between 400 and 500 times faster than using stream iterators. That seems
to confirm what I suspected in both cases.
 

Alex Vinokur

Steven T. Hatton said:
Alex said:
[snip]
I don't know of any performance evaluations comparing the different
techniques for reading files
[snip]

Look at

"Comparative Performance Measurement: Reading file into string"
http://groups.google.com/group/perfo/msg/530fae8e5e065030


"Comparative Performance Measurement: Copying files"
http://groups.google.com/group/perfo/msg/8a74465da4c4e9bb

I don't see where you ran this one:

### CPP-23: std::vector and istream::read()
------------------------------------------------
vector<char> v (no_of_file_bytes);
ifs.read(&v[0], no_of_file_bytes);
ret_str = (v.empty() ? string() : string (v.begin(), v.end()));
------------------------------------------------

but it's probably safe to assume it would very closely match:

### CPP-24: std::string and istream::read()
[snip]

File file2str-1-0.cpp from
http://groups-beta.google.com/group/sources/msg/874798865afae595

------------- file2str-1-0.cpp : Fragment -------------
Line#

2143
2144 MEASURE_WITH_NO_ARG (CPP_23_txt__vector__cpp_read);
2145 CHECK_TXT_RETURNED_STRING;
2146 MEASURE_WITH_NO_ARG (CPP_23_bin__vector__cpp_read);
2147 CHECK_BIN_RETURNED_STRING;
2148
2149 MEASURE_WITH_NO_ARG (CPP_24_txt__string__cpp_read);
2150 CHECK_TXT_RETURNED_STRING;
2151 MEASURE_WITH_NO_ARG (CPP_24_bin__string__cpp_read);
2152 CHECK_BIN_RETURNED_STRING;
2153
 

Steven T. Hatton

Alex Vinokur wrote:

[...]
File file2str-1-0.cpp from
http://groups-beta.google.com/group/sources/msg/874798865afae595

------------- file2str-1-0.cpp : Fragment -------------
Line#

2143
2144 MEASURE_WITH_NO_ARG (CPP_23_txt__vector__cpp_read);
2145 CHECK_TXT_RETURNED_STRING;
2146 MEASURE_WITH_NO_ARG (CPP_23_bin__vector__cpp_read);
2147 CHECK_BIN_RETURNED_STRING;
2148
2149 MEASURE_WITH_NO_ARG (CPP_24_txt__string__cpp_read);
2150 CHECK_TXT_RETURNED_STRING;
2151 MEASURE_WITH_NO_ARG (CPP_24_bin__string__cpp_read);
2152 CHECK_BIN_RETURNED_STRING;
2153

-------------------------------------------------------

I'm getting this on SuSE 9.3:

$ g++ -otext file2str-1-0.cpp
file2str-1-0.cpp: In function `size_t get_filesize_via_lseek(const char*, bool)':
file2str-1-0.cpp:294: error: `O_BINARY' undeclared (first use this function)
file2str-1-0.cpp:294: error: (Each undeclared identifier is reported only once for each function it appears in.)

I tried hacking around it by replacing some of the code, but it looked like
the hole was getting deeper. I'm not sure if, or where the macro is
currently defined. From googling around, it looks like it used to come
from <fcntl.h>.
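
O_BINARY is not part of POSIX; it comes from DOS/Windows-derived environments (Cygwin among them), where text and binary modes differ. A common workaround, sketched here as an assumption and not as what file2str-1-0.cpp actually does, is to define the macro as 0 when the system headers do not provide it. The function filesize_via_lseek below is a hypothetical stand-in, not the benchmark's own get_filesize_via_lseek.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Where the platform has no text/binary distinction, make the flag a no-op.
#ifndef O_BINARY
#define O_BINARY 0
#endif

// Hypothetical stand-in for a size query via lseek().
// Returns -1 if the file cannot be opened or the seek fails.
long filesize_via_lseek(const char* path) {
    assert(path != 0);
    const int fd = ::open(path, O_RDONLY | O_BINARY);
    if (fd == -1) return -1;

    const off_t size = ::lseek(fd, 0, SEEK_END);  // offset of end of file
    ::close(fd);
    return static_cast<long>(size);
}
```

With the guard in place, the same source should compile both where the headers define O_BINARY and where they do not.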
 

Alex Vinokur

Steven said:
I'm getting this on SuSE 9.3:
[snip]


$ g++ -otext file2str-1-0.cpp
file2str-1-0.cpp: In function `size_t get_filesize_via_lseek(const char*, bool)':
file2str-1-0.cpp:294: error: `O_BINARY' undeclared (first use this function)
file2str-1-0.cpp:294: error: (Each undeclared identifier is reported only once for each function it appears in.)

I tried hacking around it by replacing some of the code, but it looked like
the hole was getting deeper. I'm not sure if, or where the macro is
currently defined. From googling around, it looks like it used to come
from <fcntl.h>.
[snip]

O_BINARY is in fcntl.h (on UNIX).

Alex Vinokur
email: alex DOT vinokur AT gmail DOT com
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn
 

Steven T. Hatton

Alex said:
I'm getting this on SuSE 9.3:
[snip]


$ g++ -otext file2str-1-0.cpp
file2str-1-0.cpp: In function `size_t get_filesize_via_lseek(const char*, bool)':
file2str-1-0.cpp:294: error: `O_BINARY' undeclared (first use this function)
file2str-1-0.cpp:294: error: (Each undeclared identifier is reported only once for each function it appears in.)

I tried hacking around it by replacing some of the code, but it looked
like
the hole was getting deeper. I'm not sure if, or where the macro is
currently defined. From googling around, it looks like it used to come
from <fcntl.h>.
[snip]

O_BINARY is in fcntl.h (on UNIX).

I'm not sure that is a current requirement for UNIX. SuSE's pretty good at
getting the standards right. It's not there on my box, but there is a
definition of O_BINARY in <kpathsea/c-fopen.h>. When I add that file to
your program, it compiles, but when I run it, I get the following error:

================================================
Simple C/C++ Perfometer : Reading file to string
Version F2S-1.0
================================================


-------------
GNU gcc 3.3.5
-------------

YOUR COMMAND LINE : test 1024 1 1

### File size : 1024
### Number of runs : 1
### Number of tests : 1
### Number of repetitions : 1
### CLOCKS_PER_SEC : 1000000



Run-1 of 1 : Started
User defined file size = 1024
Txt input file size = 1030
Via fseek&ftell file size = 1024
test: file2str-1-0.cpp:1966: void measure(long unsigned int): Assertion `infile_size2_txt == get_filesize_via_fseek_ftell ("z-txt.in", true)' failed.
aborted
 

Alex Vinokur

Steven said:
Alex said:
Alex Vinokur wrote:

[...]
File file2str-1-0.cpp from
http://groups-beta.google.com/group/sources/msg/874798865afae595
I'm getting this on SuSE 9.3:
[snip]

Hi Steven,

I think our discussion is getting off topic for comp.lang.c++,
and it is worth continuing it in comp.lang.c++.perfometer.
So my reply has been sent to comp.lang.c++.perfometer and can be seen at
http://groups.google.com/group/perfo/msg/aa8aa965fe5be816
 

Steven T. Hatton

Alex said:
Steven said:
Alex said:
Steven T. Hatton wrote:
Alex Vinokur wrote:

[...]
File file2str-1-0.cpp from
http://groups-beta.google.com/group/sources/msg/874798865afae595


I'm getting this on SuSE 9.3:
[snip]

Hi Steven,

I think our discussion is getting off topic for comp.lang.c++,
and it is worth continuing it in comp.lang.c++.perfometer.
So my reply has been sent to comp.lang.c++.perfometer and can be seen at
http://groups.google.com/group/perfo/msg/aa8aa965fe5be816

-----
Alex Vinokur
email: alex DOT vinokur AT gmail DOT com
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

For some reason that newsgroup is not on my server.

FWIW:

--- get_filesize_via_fseek_ftell
Created in TXT mode, read in TXT mode: 1
Created in BIN mode, read in BIN mode: 1
Created in TXT mode, read in BIN mode: 1
Created in BIN mode, read in TXT mode: 1

--- get_filesize_via_lseek
Created in TXT mode, read in TXT mode: 1
Created in BIN mode, read in BIN mode: 1
Created in TXT mode, read in BIN mode: 1
Created in BIN mode, read in TXT mode: 1

--- get_filesize_via_fstat
Created in TXT mode, read in TXT mode: 1
Created in BIN mode, read in BIN mode: 1
Created in TXT mode, read in BIN mode: 1
Created in BIN mode, read in TXT mode: 1

--- get_filesize_via_stat
Created in TXT mode : 1
Created in BIN mode : 1

--- get_filesize_via_seekg_tellg
Created in TXT mode, read in TXT mode: 1
Created in BIN mode, read in BIN mode: 1
Created in TXT mode, read in BIN mode: 1
Created in BIN mode, read in TXT mode: 1

--- get_filesize_via_distance
Created in TXT mode, read in TXT mode: 1
Created in BIN mode, read in BIN mode: 1
Created in TXT mode, read in BIN mode: 1
Created in BIN mode, read in TXT mode: 1

--- get_filesize_via_rdbuf_pubseekoff
Created in TXT mode, read in TXT mode: 1
Created in BIN mode, read in BIN mode: 1
Created in TXT mode, read in BIN mode: 1
Created in BIN mode, read in TXT mode: 1
 

Alex Vinokur

Steven said:
Alex said:
Steven said:
Alex Vinokur wrote:


Steven T. Hatton wrote:
Alex Vinokur wrote:

[...]
File file2str-1-0.cpp from
http://groups-beta.google.com/group/sources/msg/874798865afae595


I'm getting this on SuSE 9.3:
[snip]

Hi Steven,

I think our discussion is getting off topic for comp.lang.c++,
and it is worth continuing it in comp.lang.c++.perfometer.
So my reply has been sent to comp.lang.c++.perfometer and can be seen at
http://groups.google.com/group/perfo/msg/aa8aa965fe5be816
[snip]
For some reason that newsgroup is not on my server.

comp.lang.c++.perfometer is not carried on NNTP servers.
One works with it via the web interface:
http://groups-beta.google.com/group/perfo
FWIW:

--- get_filesize_via_fseek_ftell
Created in TXT mode, read in TXT mode: 1
Created in BIN mode, read in BIN mode: 1
Created in TXT mode, read in BIN mode: 1
Created in BIN mode, read in TXT mode: 1

--- get_filesize_via_lseek
Created in TXT mode, read in TXT mode: 1
Created in BIN mode, read in BIN mode: 1
Created in TXT mode, read in BIN mode: 1
Created in BIN mode, read in TXT mode: 1

--- get_filesize_via_fstat
Created in TXT mode, read in TXT mode: 1
Created in BIN mode, read in BIN mode: 1
Created in TXT mode, read in BIN mode: 1
Created in BIN mode, read in TXT mode: 1

--- get_filesize_via_stat
Created in TXT mode : 1
Created in BIN mode : 1

--- get_filesize_via_seekg_tellg
Created in TXT mode, read in TXT mode: 1
Created in BIN mode, read in BIN mode: 1
Created in TXT mode, read in BIN mode: 1
Created in BIN mode, read in TXT mode: 1

--- get_filesize_via_distance
Created in TXT mode, read in TXT mode: 1
Created in BIN mode, read in BIN mode: 1
Created in TXT mode, read in BIN mode: 1
Created in BIN mode, read in TXT mode: 1

--- get_filesize_via_rdbuf_pubseekoff
Created in TXT mode, read in TXT mode: 1
Created in BIN mode, read in BIN mode: 1
Created in TXT mode, read in BIN mode: 1
Created in BIN mode, read in TXT mode: 1

Here is the output of the same program ("Getting file size" from
http://groups.google.com/group/alt.sources/msg/41464ce8b75f8417 )
produced with g++ 3.3.3 on Cygwin and Windows 2000:


--- get_filesize_via_fseek_ftell
Created in TXT mode, read in TXT mode: 2
Created in BIN mode, read in BIN mode: 1
Created in TXT mode, read in BIN mode: 2
Created in BIN mode, read in TXT mode: 1

--- get_filesize_via_lseek
Created in TXT mode, read in TXT mode: 2
Created in BIN mode, read in BIN mode: 1
Created in TXT mode, read in BIN mode: 2
Created in BIN mode, read in TXT mode: 1

--- get_filesize_via_fstat
Created in TXT mode, read in TXT mode: 2
Created in BIN mode, read in BIN mode: 1
Created in TXT mode, read in BIN mode: 2
Created in BIN mode, read in TXT mode: 1

--- get_filesize_via_stat
Created in TXT mode : 2
Created in BIN mode : 1

--- get_filesize_via_seekg_tellg
Created in TXT mode, read in TXT mode: 2
Created in BIN mode, read in BIN mode: 1
Created in TXT mode, read in BIN mode: 2
Created in BIN mode, read in TXT mode: 1

--- get_filesize_via_distance
Created in TXT mode, read in TXT mode: 1
Created in BIN mode, read in BIN mode: 1
Created in TXT mode, read in BIN mode: 2
Created in BIN mode, read in TXT mode: 1

--- get_filesize_via_rdbuf_pubseekoff
Created in TXT mode, read in TXT mode: 2
Created in BIN mode, read in BIN mode: 1
Created in TXT mode, read in BIN mode: 2
Created in BIN mode, read in TXT mode: 1


So we can see that different operating systems/hardware produce
different file sizes for files created in text mode.
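
The effect can be reproduced with a much smaller program than the benchmark. The sketch below is not taken from the "Getting file size" source; the function names are hypothetical. It writes a single newline through a text-mode stream and then measures the file in binary mode. On a system that translates '\n' to CR LF in text mode the result should be 2, and on a system without such translation it should be 1, which is consistent with the two listings above.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: size of a file in bytes as seen in binary mode,
// or -1 if the file cannot be opened.
long binary_size(const std::string& path) {
    std::ifstream in(path.c_str(),
                     std::ios::in | std::ios::binary | std::ios::ate);
    if (!in) return -1;
    return static_cast<long>(static_cast<std::streamoff>(in.tellg()));
}

// Hypothetical helper: write one '\n' in text mode, then report the
// number of bytes that actually ended up on disk.
long size_after_text_newline(const std::string& path) {
    assert(!path.empty());  // precondition chosen for this sketch
    {
        std::ofstream out(path.c_str());  // text mode: no std::ios::binary
        out << '\n';
    }  // stream is flushed and closed here
    return binary_size(path);
}
```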

I have updated "Simple C/C++ Perfometer: Reading file to string
(Versions 1.x)".
Latest version (F2S-1.0.6) is at
http://groups-beta.google.com/group/sources/msg/27a9b6f91239c909

Alex Vinokur
email: alex DOT vinokur AT gmail DOT com
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn
 
