What does "formatted" I/O really mean?

  • Thread starter Steven T. Hatton

Steven T. Hatton

I'm still not completely sure what's going on with C++ I/O regarding the
extractors and inserters. The following document seems a bit inconsistent:
http://gcc.gnu.org/onlinedocs/libstdc++/27_io/howto.html#1

Copying a file:

WRONG WAY:
#include <fstream>
std::ifstream IN ("input_file");
std::ofstream OUT ("output_file");
OUT << IN; // undefined behavior

RIGHT WAY:
//[T]he easiest way to copy the file is:
OUT << IN.rdbuf();
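
A minimal sketch of the "right way" as a complete function. It assumes both files should be opened in binary mode so that no end-of-line translation happens on platforms that distinguish text and binary mode. The name copy_file is hypothetical.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: copies the file at src to dst byte for byte.
// Both streams use binary mode, so no end-of-line translation takes place.
bool copy_file(const std::string& src, const std::string& dst)
{
    std::ifstream in(src, std::ios::in | std::ios::binary);
    std::ofstream out(dst, std::ios::out | std::ios::binary);
    if (!in || !out)
        return false;

    // Inserting a stream buffer sets failbit when it copies nothing,
    // so treat an empty source file as a successful (empty) copy.
    if (in.peek() == std::ifstream::traits_type::eof())
        return true;

    // operator<<(std::streambuf*) moves every character from in's buffer
    // to out's buffer unchanged: no formatting, no whitespace skipping.
    out << in.rdbuf();
    out.close();  // flush now so write errors show up in out's state
    return !out.fail();
}
```

The empty-file check matters because the stream-buffer inserter reports "nothing inserted" as a failure, which would otherwise look like a failed copy.
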

HOWEVER:
"First, ios::binary has exactly one defined effect, no more and no less.
Normal text mode has to be concerned with the newline characters, and the
runtime system will translate between (for example) '\n' and the
appropriate end-of-line sequence (LF on Unix, CRLF on DOS, CR on Macintosh,
etc)....

Second, using << to write and >> to read isn't going to work with the
standard file stream classes, even if you use skipws during reading. Why
not? Because ifstream and ofstream exist for the purpose of formatting, not
reading and writing. Their job is to interpret the data into text
characters, and that's exactly what you don't want to happen during binary
I/O.
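
To make "interpret the data into text characters" concrete, here is a small sketch (both function names are made up for illustration). It reads the same characters once with the formatted operator>> for char and once with the unformatted get(). With the default skipws flag, the formatted version silently drops spaces, tabs and newlines.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads characters with the formatted extractor. skipws is on by default,
// so whitespace (' ', '\t', '\n', ...) is silently dropped.
std::string read_formatted(std::istream& in)
{
    std::string result;
    char c;
    while (in >> c)
        result += c;
    return result;
}

// Reads characters with the unformatted get(); nothing is skipped.
std::string read_unformatted(std::istream& in)
{
    std::string result;
    char c;
    while (in.get(c))
        result += c;
    return result;
}
```

Turning skipws off (std::noskipws) rescues this particular case, but the other formatted extractors still convert text into values (e.g. "123" into an int), which is why they are a poor fit for binary data.
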

BUT IT SAID:
[T]he easiest way to copy the file is:
OUT << IN.rdbuf();

Does that only apply to "text" files?

"Third, using the get() and put()/write() member functions still aren't
guaranteed to help you. These are "unformatted" I/O functions, but still
character-based. (This may or may not be what you want, see below.)"

I saw below, but don't know what I was supposed to see. Is it the endian
stuff?

If I open a file in binary mode, then f.rdbuf() >> stringstrm, is the entire
file going to be faithfully represented bit-for-bit in the
std::stringstream? If not, how will it have been changed? Note that I
made no mention of unsetting skipws here.

You may think I'm just hard-headed and can't understand that I shouldn't use
the overloaded shift operators for unformatted data. Well, suppose someone
else were to do that, and it worked for them. Is there a potential that it
could cause problems for me? I certainly have used the above method to
read "raw" data in the past.

Also, I _do_ want to "format" the data. I want to parse an ELF file into
its elementary components.
 

Earl Purple

Steven said:
HOWEVER:
"First, ios::binary has exactly one defined effect, no more and no less.
Normal text mode has to be concerned with the newline characters, and the
runtime system will translate between (for example) '\n' and the
appropriate end-of-line sequence (LF on Unix, CRLF on DOS, CR on Macintosh,
etc)....

Yes, that is indeed the case. It does not mean that if you stream with
operator…

Steven said:
Second, using << to write and >> to read isn't going to work with the
standard file stream classes, even if you use skipws during reading. Why
not? Because ifstream and ofstream exist for the purpose of formatting, not
reading and writing. Their job is to interpret the data into text
characters, and that's exactly what you don't want to happen during binary
I/O.

Correct, but streambuf is there underneath as no more than an array of
characters.

BUT IT SAID:
[T]he easiest way to copy the file is:
OUT << IN.rdbuf();

If you can get the length of the buffer you can also use write() which
is used for binary I/O. You must beware of one thing though - those
nasty char_traits. I was using a basic_streambuf< unsigned char > for
binary I/O and found some characters missing. It turned out it was
randomly removing 0xff characters after interpreting them as "EOF". So
I had to write my own char_traits for unsigned char and attach that to
my stream as my second template parameter (thus basic_iostream<
unsigned char, uchtraits > where uchtraits is my own "traits" class).
Then it worked.

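
A sketch of the "get the length, then read()/write()" approach, assuming the file is seekable so its size can be found with seekg/tellg. Note that this swaps in a different technique from the custom traits class described above. The standard only guarantees char_traits for the character types such as char and wchar_t, so basic_streambuf<unsigned char> depends on whatever the library happens to provide. This version keeps the stream on plain char and only reinterprets bytes as unsigned char afterwards. The names load_bytes and save_bytes are hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: loads a whole file into memory as raw bytes.
// Assumes the file is seekable so tellg() reports its size.
std::vector<unsigned char> load_bytes(const std::string& path)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in)
        return {};

    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    in.seekg(0, std::ios::beg);
    if (size <= 0)
        return {};

    // Read through the plain char interface, then view the bytes as
    // unsigned char; 0xff never takes part in an EOF comparison here.
    std::vector<char> buf(static_cast<std::size_t>(size));
    in.read(buf.data(), static_cast<std::streamsize>(size));
    return std::vector<unsigned char>(buf.begin(), buf.begin() + in.gcount());
}

// Hypothetical helper: writes raw bytes with the unformatted write().
bool save_bytes(const std::string& path, const std::vector<unsigned char>& bytes)
{
    std::ofstream out(path, std::ios::out | std::ios::binary);
    out.write(reinterpret_cast<const char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
    out.close();
    return !out.fail();
}
```
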
"Third, using the get() and put()/write() member functions still aren't
guaranteed to help you. These are "unformatted" I/O functions, but still
character-based. (This may or may not be what you want, see below.)"

They are based on characters that have traits. You are not forced to

I saw below, but don't know what I was supposed to see. Is it the endian
stuff?

Nothing to do with endian stuff, except that if you used basic_fstream
(basic_iostream) on a character type of 2 bytes or more to write
integers then endian stuff might come into play. (One reason why
wchar_t is generally not used as a character. Instead one-byte
characters and codepages are used).

If I open a file in binary mode, then f.rdbuf() >> stringstrm, is the entire
file going to be faithfully represented bit-for-bit in the
std::stringstream? If not, how will it have been changed? Note that I
made no mention of unsetting skipws here.

There is no operator>> overload that takes a basic_streambuf/filebuf as
its left-hand operand.

You may think I'm just hard-headed and can't understand that I shouldn't use
the overloaded shift operators for unformatted data. Well, suppose someone
else were to do that, and it worked for them. Is there a potential that it
could cause problems for me? I certainly have used the above method to
read "raw" data in the past.

Your own objects can use operator>> and operator<< in whatever way they
want, writing in binary format if they choose. They do not need to be
human-readable.

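
As an illustration of that point, here is a sketch of a user-defined type whose inserter and extractor use a fixed three-byte binary form instead of text. Rgb is a hypothetical type. The operators use the unformatted put() and read(), so skipws and the locale do not touch the bytes.

```cpp
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream, used in the usage comment below

// Hypothetical fixed-size record: three raw bytes, no text representation.
struct Rgb {
    unsigned char r, g, b;
};

// Binary inserter: writes exactly three bytes with the unformatted put().
std::ostream& operator<<(std::ostream& out, const Rgb& c)
{
    out.put(static_cast<char>(c.r));
    out.put(static_cast<char>(c.g));
    out.put(static_cast<char>(c.b));
    return out;
}

// Binary extractor: reads exactly three bytes with the unformatted read(),
// so whitespace-valued bytes such as 0x20 or 0x0a are not skipped.
std::istream& operator>>(std::istream& in, Rgb& c)
{
    char bytes[3];
    if (in.read(bytes, 3)) {
        c.r = static_cast<unsigned char>(bytes[0]);
        c.g = static_cast<unsigned char>(bytes[1]);
        c.b = static_cast<unsigned char>(bytes[2]);
    }
    return in;
}

// Usage:
//   std::stringstream ss;
//   ss << Rgb{0x20, 0x0a, 0xff};   // space, newline and 0xff bytes
//   Rgb back{};
//   ss >> back;                     // back holds {0x20, 0x0a, 0xff} again
```
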
Also, I _do_ want to "format" the data. I want to parse an ELF file into
its elementary components.

Then get your objects to format binary data. How is STL supposed to
know your format? If you format your data to be a fixed size then use
read() and write(). If a variable size then put a "header" section
inside and resolve any endian issues by enforcing one particular endian
notation. (Normally I would choose big-endian unless you are going to
primarily be working on a little-endian system and can optimise for
that system).
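
A sketch of the variable-size case. Each chunk is preceded by a 4-byte length header that is always written big-endian, whatever the host byte order, by assembling the value one byte at a time with shifts. The chunk layout and the function names are assumptions for illustration, not an established file format.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream, used in the usage comment below
#include <string>

// Writes a 32-bit value most significant byte first (big-endian).
// Shifts operate on values, so the output does not depend on host byte order.
void write_u32_be(std::ostream& out, std::uint32_t v)
{
    const char bytes[4] = {
        static_cast<char>((v >> 24) & 0xff), static_cast<char>((v >> 16) & 0xff),
        static_cast<char>((v >> 8) & 0xff), static_cast<char>(v & 0xff)};
    out.write(bytes, 4);
}

// Reads a 32-bit big-endian value; returns false on short input.
bool read_u32_be(std::istream& in, std::uint32_t& v)
{
    char bytes[4];
    if (!in.read(bytes, 4))
        return false;
    v = 0;
    for (char b : bytes)
        v = (v << 8) | static_cast<unsigned char>(b);
    return true;
}

// Hypothetical chunk format: 4-byte big-endian length, then that many bytes.
void write_chunk(std::ostream& out, const std::string& payload)
{
    write_u32_be(out, static_cast<std::uint32_t>(payload.size()));
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
}

bool read_chunk(std::istream& in, std::string& payload)
{
    std::uint32_t len;
    if (!read_u32_be(in, len))
        return false;
    payload.assign(len, '\0');
    return len == 0 || static_cast<bool>(in.read(&payload[0], len));
}

// Usage:
//   std::stringstream ss;
//   write_chunk(ss, "ELF");     // bytes: 00 00 00 03 'E' 'L' 'F'
//   std::string s;
//   read_chunk(ss, s);          // s == "ELF"
```
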
 

Dietmar Kuehl

Steven said:
Second, using << to write and >> to read isn't going to work with the
standard file stream classes, even if you use skipws during reading. Why
not? Because ifstream and ofstream exist for the purpose of formatting,
not reading and writing. Their job is to interpret the data into text
characters, and that's exactly what you don't want to happen during binary
I/O.

BUT IT SAID:
[T]he easiest way to copy the file is:
OUT << IN.rdbuf();

Does that only apply to "text" files?

No, it applies to all files. The thing here is that this output operator
considers the whole sequence of characters produced by 'IN.rdbuf()' as
one (unaltered) sequence of text characters.

"Third, using the get() and put()/write() member functions still aren't
guaranteed to help you. These are "unformatted" I/O functions, but still
character-based. (This may or may not be what you want, see below.)"

I saw below, but don't know what I was supposed to see. Is it the endian
stuff?

I don't know for sure what they wanted to refer to, except maybe the
discussion about binary formatted I/O. The actual issue, which I haven't
seen addressed on this page, is that the bytes in the file are converted
into characters by processing them in a locale-specific way. If you want
to do binary I/O you need this conversion to have no effect. This is
done by selecting the "C" locale.

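
A sketch of selecting the "C" locale for a binary read. It assumes the stream is imbued before the file is opened, which avoids switching conversion facets partway through a file. open_binary is a hypothetical helper name. In the classic locale, the codecvt<char, char, std::mbstate_t> facet performs no conversion at all.

```cpp
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: opens a file for binary reading under the classic
// "C" locale. The locale is installed before open(), so the file buffer
// never changes its conversion facet in the middle of the file.
void open_binary(std::ifstream& in, const std::string& path)
{
    in.imbue(std::locale::classic());
    in.open(path, std::ios::in | std::ios::binary);
}
```
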
If I open a file in binary mode, then f.rdbuf() >> stringstrm, is the
entire file going to be faithfully represented bit-for-bit in the
std::stringstream?

No, it is not: First of all, there is no overload for 'operator>>()'
taking a stream buffer as first argument and a stream as second
argument. Assuming you wanted to write 'f >> stringstream.rdbuf()'
this still does not work because formatted input operators start by
skipping white space unless 'skipws' is turned off. It works the other
way around, though: 'stringstream << f.rdbuf()' (assuming 'stringstream'
is a variable of an appropriate type, e.g. 'std::ostringstream').

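
The direction that does work, written out as a small function. slurp is a hypothetical name. The file is opened in binary mode and its whole stream buffer is inserted into a std::ostringstream, which keeps every byte, including whitespace, '\0' and 0xff.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the entire contents of a file as a string.
std::string slurp(const std::string& path)
{
    std::ifstream f(path, std::ios::in | std::ios::binary);
    std::ostringstream ss;
    // Inserting the stream buffer copies characters unchanged until EOF.
    // For an empty file nothing is inserted and ss gets failbit, but
    // str() still correctly returns an empty string.
    if (f)
        ss << f.rdbuf();
    return ss.str();
}
```
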
You may think I'm just hard-headed and can't understand that I shouldn't
use the overloaded shift operators for unformatted data. Well, suppose
someone else were to do that, and it worked for them.

You can use essentially just one of the predefined shift operators
reasonably for binary I/O and this is the output operator taking a
stream buffer pointer. Everything else is too error prone, IMO, to
be used reasonably although you can go a long way to come close to a
working implementation. For example, you could even make the inserters
and extractors for numeric types work on a binary format by creating
appropriate 'num_put' and 'num_get' facets. This is, however, not their
intended purpose and there are still sufficient problems left. It is
easier to create a new stream hierarchy for binary I/O and it avoids
a bunch of pitfalls (e.g. forgetting to unset 'skipws' or accidentally
using operators for text-formatted I/O).

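
A rough sketch of what the start of a separate binary I/O layer might look like. It is a reader class that wraps a std::streambuf* and offers only binary operations, so there is no skipws to forget and no text operator to call by accident. BinaryReader and its members are hypothetical. The byte order is chosen per reader because formats such as ELF declare their own byte order.

```cpp
#include <cstdint>
#include <sstream>  // std::stringbuf, used in the usage comment below
#include <streambuf>

// Hypothetical minimal binary reader on top of the stream buffer layer.
// No formatting, no locale, no skipws: it only pulls raw bytes.
class BinaryReader {
public:
    BinaryReader(std::streambuf* sb, bool big_endian)
        : sb_(sb), big_endian_(big_endian) {}

    bool ok() const { return ok_; }

    std::uint8_t u8()
    {
        // sbumpc() returns the byte via char_traits<char>::to_int_type,
        // which maps 0xff to 255, so it is never confused with eof().
        const auto c = sb_->sbumpc();
        if (std::streambuf::traits_type::eq_int_type(
                c, std::streambuf::traits_type::eof())) {
            ok_ = false;
            return 0;
        }
        return static_cast<std::uint8_t>(c);
    }

    std::uint16_t u16() { return static_cast<std::uint16_t>(read_uint(2)); }
    std::uint32_t u32() { return static_cast<std::uint32_t>(read_uint(4)); }

private:
    // Assembles an n-byte unsigned integer in this reader's byte order.
    std::uint64_t read_uint(int n)
    {
        std::uint64_t v = 0;
        for (int i = 0; i < n; ++i) {
            const std::uint64_t byte = u8();
            if (big_endian_)
                v = (v << 8) | byte;
            else
                v |= byte << (8 * i);
        }
        return v;
    }

    std::streambuf* sb_;
    bool big_endian_;
    bool ok_ = true;
};

// Usage:
//   std::stringbuf sb(std::string("\x01\x02", 2));
//   BinaryReader r(&sb, true);
//   r.u16();   // 0x0102
```
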
I certainly have used the above method to read "raw" data in the past.

If I had to read the whole file into a container, I would read "raw"
data like this:

std::vector<char> data((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());

Of course, this tends to be pretty slow because it requires a certain
optimization to be in place which is typically not implemented. Thus,
I'm using streams for a binary format.

Also, I _do_ want to "format" the data. I want to parse an ELF file
into its elementary components.

Yes, the distinction between formatted and unformatted does not really
fit well for this. The difference is between text formatted and binary
formatted. You want a binary format. This is not directly supported by
the IOStreams hierarchy although you can still use the stream buffer
hierarchy for the actual reading/writing from a file. You just have to
take care of opening the files in binary mode and suppressing any
character conversions.
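
To connect this to the ELF case, here is a sketch that reads the 16-byte e_ident array at the start of an ELF file through a std::filebuf opened in binary mode. It checks the magic bytes 0x7f 'E' 'L' 'F' and decodes the class byte (32 or 64 bit) and the data-encoding byte (little or big endian). The struct and function names are hypothetical. The offsets and values follow the ELF specification.

```cpp
#include <fstream>
#include <string>

// Hypothetical summary of the ELF identification bytes (e_ident).
struct ElfIdent {
    bool valid = false;       // magic matched and fields were recognised
    bool is_64bit = false;    // EI_CLASS (byte 4) == ELFCLASS64 (2)
    bool big_endian = false;  // EI_DATA (byte 5) == ELFDATA2MSB (2)
};

ElfIdent read_elf_ident(const std::string& path)
{
    ElfIdent id;
    std::filebuf fb;
    if (!fb.open(path, std::ios::in | std::ios::binary))
        return id;

    char raw[16];  // EI_NIDENT == 16
    if (fb.sgetn(raw, 16) != 16)
        return id;
    unsigned char ident[16];
    for (int i = 0; i < 16; ++i)
        ident[i] = static_cast<unsigned char>(raw[i]);

    // Bytes 0..3: magic number 0x7f 'E' 'L' 'F'.
    if (ident[0] != 0x7f || ident[1] != 'E' || ident[2] != 'L' || ident[3] != 'F')
        return id;

    const unsigned char ei_class = ident[4];  // 1 = 32-bit, 2 = 64-bit
    const unsigned char ei_data = ident[5];   // 1 = little-endian, 2 = big-endian
    if ((ei_class != 1 && ei_class != 2) || (ei_data != 1 && ei_data != 2))
        return id;

    id.valid = true;
    id.is_64bit = (ei_class == 2);
    id.big_endian = (ei_data == 2);
    return id;
}
```

Once the data-encoding byte is known, the rest of the header can be read with whichever byte order it declares.
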
 
