Reading PODs Cross-Compiler

Bill Woessner

Suppose I have a structure, foo, which is a POD. I would like to read
and write it to disk as follows:

std::ofstream outs;
foo bar;
outs.write(reinterpret_cast<char*>(&bar), sizeof(foo));
...
std::ifstream ins;
foo bar;
ins.read(reinterpret_cast<char*>(&bar), sizeof(foo));

This works fine if the code doing the reading and writing is compiled
with the same compiler. However, I now have a situation where I would
need to write with one compiler and read with another. Fortunately,
the reading and writing will occur on the same platform, so endianness
is not a problem.

Is there a guaranteed way to accomplish this? I'm guessing the reason
it's not working has to do with padding and byte-alignment. Is there
some way to force the two compilers to agree on padding and
byte-alignment?

Thanks in advance,
Bill
 
Andre Kostur

Is there a guaranteed way to accomplish this? I'm guessing the reason
it's not working has to do with padding and byte-alignment. Is there
some way to force the two compilers to agree on padding and
byte-alignment?

No. That's compiler-specific. You'd need to look that up in each
compiler's documentation.

Or, you could write out each piece of the struct, field-by-field, thus
forcing a "packed" representation on disk.
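A minimal sketch of the field-by-field approach, assuming a hypothetical `foo` with three unsigned members (the thread never shows the real definition); `write_field`, `read_field`, `write_foo`, `read_foo` and `round_trip` are illustrative names. Each member gets its own `write`, so no padding reaches the file. Each member is still stored in its native size and byte order, though, so this removes only the padding difference between compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical POD standing in for foo; in memory a compiler may pad
// after 'tag' and after 'count'.
struct foo {
    std::uint8_t  tag;
    std::uint32_t id;
    std::uint16_t count;
};

// Write or read a single member using its native representation.
template <typename T>
void write_field(std::ostream& os, const T& v) {
    os.write(reinterpret_cast<const char*>(&v), sizeof v);
}

template <typename T>
void read_field(std::istream& is, T& v) {
    is.read(reinterpret_cast<char*>(&v), sizeof v);
}

// 1 + 4 + 2 = 7 bytes in the file, whatever padding the compiler chose.
void write_foo(std::ostream& os, const foo& f) {
    write_field(os, f.tag);
    write_field(os, f.id);
    write_field(os, f.count);
}

bool read_foo(std::istream& is, foo& f) {
    read_field(is, f.tag);
    read_field(is, f.id);
    read_field(is, f.count);
    return static_cast<bool>(is);
}

// Usage: round-trip through an in-memory stream; returns the bytes written.
std::size_t round_trip(const foo& in, foo& out) {
    std::stringstream ss;
    write_foo(ss, in);
    const std::size_t written = ss.str().size();
    const bool ok = read_foo(ss, out);
    assert(ok);
    (void)ok;
    return written;
}
```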

However, you may be better off defining some sort of canonical form for
your data on disk and converting to and from it. Today you're
running on the same platform. Tomorrow you may not be.
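One possible canonical form, sketched under assumptions that are not specified in the thread: fixed-width unsigned integers stored most significant byte first, built with shifts so the result does not depend on the host's byte order, padding or compiler. `record`, `put_u16`, `put_u32`, `get_u16`, `get_u32`, `encode` and `decode` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Canonical form assumed for this sketch: fixed-width unsigned integers,
// most significant byte first, no padding between fields.
void put_u16(std::vector<unsigned char>& out, std::uint16_t v) {
    out.push_back(static_cast<unsigned char>(v >> 8));
    out.push_back(static_cast<unsigned char>(v & 0xFF));
}

void put_u32(std::vector<unsigned char>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>((v >> shift) & 0xFF));
}

std::uint16_t get_u16(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

std::uint32_t get_u32(const unsigned char* p) {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Hypothetical in-memory record; its padding never reaches the file.
struct record {
    std::uint32_t id;
    std::uint16_t count;
};

std::vector<unsigned char> encode(const record& r) {
    std::vector<unsigned char> out;
    put_u32(out, r.id);
    put_u16(out, r.count);
    return out;                           // always 6 bytes
}

record decode(const std::vector<unsigned char>& in) {
    assert(in.size() >= 6);               // caller must supply a whole record
    return record{get_u32(in.data()), get_u16(in.data() + 4)};
}
```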
 
faceman28208

Funny, I posted an inquiry about this very type of problem, and one of the
responses was that it is never a problem.

Suppose I have a structure, foo, which is a POD. I would like to read
and write it to disk as follows:

What you are describing is very system-specific. What you have to do
is read and write each field individually and do the appropriate
conversion.

Is there a guaranteed way to accomplish this? I'm guessing the reason
it's not working has to do with padding and byte-alignment. Is there
some way to force the two compilers to agree on padding and
byte-alignment?

Yes, and also:

byte ordering
character format
floating point format
representation of bit fields
 
Gianni Mariani

Bill Woessner wrote:
...
Is there a guaranteed way to accomplish this?

Test and see?

... I'm guessing the reason
it's not working has to do with padding and byte-alignment. Is there
some way to force the two compilers to agree on padding and
byte-alignment?

Not by the C++ standard.

However, I have done exactly what you describe across platforms and
endianness for many different compilers.

Avoid bitfields.

Many (if not most) compilers support #pragma pack(). Even compilers
that require word alignment for reading and writing words allow you to
read and write from structs that contain words that are not aligned
(albeit using more cycles).
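A sketch of the `#pragma pack` route, assuming a compiler that accepts the `push`/`pop` form (GCC, Clang and MSVC do); it is an extension, not standard C++. The `static_assert`s make a compiler that ignores the pragma fail to build instead of silently writing a different layout. `packed_foo` and `from_bytes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Compiler extension: request 1-byte packing for this struct only.
#pragma pack(push, 1)
struct packed_foo {
    std::uint8_t  tag;
    std::uint32_t id;     // at offset 1, deliberately unaligned
    std::uint16_t count;
};
#pragma pack(pop)

// Fail the build if the pragma was ignored or the layout differs.
static_assert(sizeof(packed_foo) == 7, "packed_foo must have no padding");
static_assert(offsetof(packed_foo, id) == 1, "unexpected packed_foo layout");
static_assert(offsetof(packed_foo, count) == 5, "unexpected packed_foo layout");

// Rebuild a record from raw file bytes; the caller supplies at least 7 bytes.
packed_foo from_bytes(const unsigned char* p, std::size_t n) {
    assert(n >= sizeof(packed_foo));
    packed_foo f;
    std::memcpy(&f, p, sizeof f);
    return f;
}
```

Members are best accessed by name (`f.id`); taking the address of a misaligned member and dereferencing it through an ordinary `std::uint32_t*` is not safe on every target.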

In dealing with endianness I have used a template class - check this out:

http://groups.google.com/group/comp.lang.c++/msg/c73876b9716f3add?hl=en&

Is it guaranteed to work on every platform ever? Probably not. It will
more than likely work on every platform you care about for now.

I've used different techniques to serialize/deserialize since I used
this, so I can't say that this is the best way to go, but if you're
already reading/writing POD structs then this might work for you.

No guarantees, but likely to work and not break for a long, long time.
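The linked post is not reproduced here. As a hypothetical illustration of the general idea (not Gianni's code), a wrapper can keep an unsigned integer as big-endian bytes in an `unsigned char` array. Its alignment is then 1, so a struct built from such wrappers in practice has no padding and no host-dependent byte order, and the `static_assert` checks that. `big_endian` and `wire_foo` are made-up names, and 8-bit bytes are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical wrapper: holds an unsigned integer as big-endian bytes.
// Its only member is an unsigned char array, so alignof(big_endian<T>) == 1.
template <typename T>
class big_endian {
    static_assert(std::is_unsigned<T>::value, "unsigned integer types only");
    unsigned char bytes_[sizeof(T)];

public:
    big_endian(T v = 0) { *this = v; }

    // Store v most significant byte first, whatever the host byte order.
    big_endian& operator=(T v) {
        for (std::size_t i = 0; i < sizeof(T); ++i)
            bytes_[sizeof(T) - 1 - i] = static_cast<unsigned char>(v >> (8 * i));
        return *this;
    }

    // Reassemble the value from the stored bytes.
    operator T() const {
        T v = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i)
            v = static_cast<T>((v << 8) | bytes_[i]);
        return v;
    }

    unsigned char byte(std::size_t i) const {
        assert(i < sizeof(T));
        return bytes_[i];
    }
};

// A record built from wrappers; it can be written with a single write() call.
struct wire_foo {
    big_endian<std::uint8_t>  tag;
    big_endian<std::uint32_t> id;
    big_endian<std::uint16_t> count;
};
static_assert(sizeof(wire_foo) == 7, "wire_foo should contain no padding");
```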
 
James Kanze

Is there a guaranteed way to accomplish this? I'm guessing the reason
it's not working has to do with padding and byte-alignment. Is there
some way to force the two compilers to agree on padding and
byte-alignment?

No. There's not even a means of forcing two different
compilers, or two different versions of the same compiler, to
agree on byte order. Note too that if you are writing the data
to disk, you presumably want to read it later in time.
Including after an upgrade of the system. So you have to write
something that is compatible with all future compilers on as yet
unknown systems as well.

The solution to this is well known. Just define a format (or
use XDR, which is more or less a quasi standard for this sort of
thing), and format the data to whatever format you define. It's
rather simple, in fact (as long as you don't have floating
point), although it does involve writing code.
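A sketch of two XDR encodings as RFC 4506 defines them: an unsigned integer is four bytes, most significant first, and a string is its length as an unsigned integer, followed by the bytes and then zero bytes up to a multiple of four. The function names are hypothetical; a real project might use an existing XDR library instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// XDR unsigned int: 4 bytes, big-endian.
void xdr_put_uint(std::vector<unsigned char>& out, std::uint32_t v) {
    out.push_back(static_cast<unsigned char>(v >> 24));
    out.push_back(static_cast<unsigned char>(v >> 16));
    out.push_back(static_cast<unsigned char>(v >> 8));
    out.push_back(static_cast<unsigned char>(v));
}

// XDR string: length, bytes, then 0-3 zero bytes so the item is a multiple of 4.
void xdr_put_string(std::vector<unsigned char>& out, const std::string& s) {
    assert(s.size() <= 0xFFFFFFFFu);              // lengths are 32-bit in XDR
    xdr_put_uint(out, static_cast<std::uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
    const std::size_t pad = (4 - s.size() % 4) % 4;
    out.insert(out.end(), pad, static_cast<unsigned char>(0));
}
```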
 
James Kanze

Gianni Mariani wrote:
Test and see?

Test what? How do you test that something will work with the
next release of the compiler, or when you upgrade hardware (from
32 bits to 64)?

The real problem here, of course, isn't whether to test, but
what to test. He definitely should test the code, once he's
written it. But before even starting to write it, he should
define what it is supposed to do, i.e. the actual format on the
disk. Without doing that, he doesn't know what to test. (A
test for this sort of thing might involve writing known data to
disk, then reading it using something like "od -t x1", and
verifying that the file contains exactly the bytes he expects.)
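A sketch of that kind of test, done in memory rather than through `od -t x1`: write known values, render the bytes as lowercase hex the way `od` does, and compare against the documented format. `write_u32_be`, `hex_bytes` and `test_write_u32_be` are hypothetical; for an actual file, the `std::ostringstream` would become an `std::ofstream` opened with `std::ios::binary`.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical writer under test: a 32-bit value, most significant byte first.
void write_u32_be(std::ostream& os, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        os.put(static_cast<char>((v >> shift) & 0xFF));
}

// Render raw bytes as space-separated lowercase hex, similar to "od -t x1".
std::string hex_bytes(const std::string& raw) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : raw) {
        if (!out.empty()) out += ' ';
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}

// Known input, exact expected bytes, as the documented format specifies.
void test_write_u32_be() {
    std::ostringstream os;
    write_u32_be(os, 0xDEADBEEFu);
    write_u32_be(os, 1u);
    assert(hex_bytes(os.str()) == "de ad be ef 00 00 00 01");
}
```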

Obviously, whatever format he chooses, and whatever shortcuts he
chooses to implement it:

-- he needs to document it, so that future programmers will
know what he is doing, and

-- he needs very rigorous tests of the code (especially if he
is cutting corners), so that the code can be validated on
any future platforms.

(As an example of what I mean by cutting corners: if all of his
current target machines use 32 bit 2's complement integers, and
are little endian, then he might decide to use this as his
integral format, knowing that memcpy will format correctly.
While probably not worth the bother for integral types, I
regularly write code which supposes that floats and doubles are
IEEE; formatting the output for IEEE when this is not the case
is a significant amount of work, and as long as my code doesn't
have to be ported to anything but the mainstream Unix machines
or PCs, there's no point in it.)
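A sketch of the IEEE shortcut: the `static_assert`s document and enforce the assumption, and `std::memcpy` copies the double's bits into a `std::uint64_t`, which is then written most significant byte first. It also assumes, as holds on mainstream platforms but is not guaranteed by the standard, that floating-point and integer byte orders agree. `put_double` and `get_double` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// The corner being cut, made explicit: refuse to compile unless double is
// a 64-bit IEEE 754 type.
static_assert(std::numeric_limits<double>::is_iec559, "IEEE 754 double required");
static_assert(sizeof(double) == sizeof(std::uint64_t), "64-bit double required");

// Copy the bit pattern into an integer, then emit it most significant byte first.
void put_double(std::vector<unsigned char>& out, double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    for (int shift = 56; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>(bits >> shift));
}

double get_double(const std::vector<unsigned char>& in, std::size_t pos) {
    assert(pos + 8 <= in.size());
    std::uint64_t bits = 0;
    for (std::size_t i = 0; i < 8; ++i) bits = (bits << 8) | in[pos + i];
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```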

[...]
Is it guaranteed to work on every platform ever? Probably not. It will
more than likely work on every platform you care about for now.

Sort of a superficial point of view, don't you think? You
should at least document the restrictions.
 
Gianni Mariani

James Kanze wrote:
...
Sort of a superficial point of view, don't you think? You
should at least document the restrictions.

I suspect you can write a unit test for that as well !
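One way to turn the restrictions into tests, sketched for a hypothetical `foo` that is dumped with a raw `write()`: layout assumptions become `static_assert`s that break the build when a new compiler or target lays the struct out differently, and byte order, which C++11 cannot check at compile time, is checked at run time (C++20 adds `std::endian` in `<bit>` for that). The expected sizes and offsets are assumptions about today's targets, not guarantees.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical record written to disk byte-for-byte.
struct foo {
    std::uint32_t id;
    std::uint16_t count;
    std::uint16_t flags;
    double        value;
};

// Every assumption the on-disk format relies on, checked at compile time.
static_assert(CHAR_BIT == 8, "8-bit bytes assumed");
static_assert(std::numeric_limits<double>::is_iec559, "IEEE 754 double assumed");
static_assert(sizeof(foo) == 16, "foo size changed: file format would change");
static_assert(offsetof(foo, count) == 4, "foo layout changed");
static_assert(offsetof(foo, flags) == 6, "foo layout changed");
static_assert(offsetof(foo, value) == 8, "foo layout changed");

// Inspect the first byte of a known integer to learn the host byte order.
bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Run-time part of the test: the format assumes a little-endian writer.
void check_format_assumptions() {
    assert(host_is_little_endian() && "file format assumes little-endian data");
}
```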
 
Gianni Mariani

James said:
The solution to this is well known. Just define a format (or
use XDR, which is more or less a quasi standard for this sort of
thing), and format the data to whatever format you define. It's
rather simple, in fact (as long as you don't have floating
point), although it does involve writing code.

I just thought: cool, let's check it out. I'm somewhat disappointed.

XDR has some significant issues AFAICT.

a) Limited to 32-bit lengths. Yes, laugh. The last project where I needed
to support serialization required files larger than 4 GB. It ran on a
64-bit machine.

b) Why the null on the string when you have a length? Huh?

c) When you have large numbers of small strings, you can end up with a big file.

d) Why the rounding to 4 bytes? Really, this is a file format.

e) Container type is limited to array. I don't want to have to specify
yet another transformation once the data is read into internal data
structures. When you're pushing large amounts of data, this can be an
issue.

f) Forward and reverse compatibility. I don't see (I could be mistaken)
how an older version of the spec can read a file's data, modify some
values, and write out the file while still preserving the data it could
not decipher.

In the last system I wrote, it was important that the system could be
downgraded and upgraded and data preserved. Using the system we
designed, it was possible, as long as the definition of the serialized
data was never regressive, i.e. nothing in the data spec was backed out.

In a file format specification, it is also important to specify a unique
header and possibly footer. In a system where you have multiple data
formats, it's a type safety check.
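A sketch of such a header check, with a made-up 4-byte magic value followed by a format version: a reader that finds the wrong magic refuses the data instead of misinterpreting it. `kMagic`, `write_header` and `read_header` are hypothetical; a footer could be checked the same way.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical header: 4 magic bytes identifying the format, then a 16-bit
// version number, most significant byte first. "FOO1" is a made-up magic.
const unsigned char kMagic[4] = {'F', 'O', 'O', '1'};

void write_header(std::vector<unsigned char>& out, std::uint16_t version) {
    assert(out.empty() && "the header must start the file");
    out.insert(out.end(), kMagic, kMagic + 4);
    out.push_back(static_cast<unsigned char>(version >> 8));
    out.push_back(static_cast<unsigned char>(version & 0xFF));
}

// Returns the format version; throws if the data is not in this format.
std::uint16_t read_header(const std::vector<unsigned char>& in) {
    if (in.size() < 6 || std::memcmp(in.data(), kMagic, sizeof kMagic) != 0)
        throw std::runtime_error("not a foo file");
    return static_cast<std::uint16_t>((in[4] << 8) | in[5]);
}
```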

In the next system I write, I think I will specify that all integers be
in a more compressed format, especially lengths. Most lengths are less
than 247, so wasting 3 extra bytes everywhere means you have a file with
lots of zero bytes. If it's not a file but a packet you're pushing over
a network, it adds up to a significant overhead for some data streams. A
format something like:

0-247 - 1 byte - the value itself.
first byte is 248 - value = next 1 byte + 248
first byte is 249 - value = next 2 bytes + 256 + 248
first byte is 250 - value = next 3 bytes + 65536 + 256 + 248
first byte is 251 - value = next 4 bytes + 2^24 + 65536 + 256 + 248
first byte is 252 - value = next 5 bytes + 2^32 + 2^24 + 65536 + 256 + 248
first byte is 253 - value = next 6 bytes + 2^40 + 2^32 + 2^24 + 65536 + 256 + 248
etc ...

- Every number has a single representation (critical).
- Supports encoding 128-bit ints (could be modified to support very large
numbers)
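A sketch of the proposed integer format, limited to `std::uint64_t` (prefix bytes 248 to 255 give payloads of 1 to 8 bytes). The post does not say in which order the payload bytes are stored; most significant first is an assumption here, as are the names `varint_offset`, `encode_varint` and `decode_varint`. Because the ranges for the prefixes do not overlap, every value has exactly one encoding, as the post requires.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Offset added to the payload when the prefix byte is 247 + n, for a payload
// of n bytes (n = 1..8): offset(1) = 248, offset(n + 1) = offset(n) + 2^(8n).
std::uint64_t varint_offset(int n) {
    std::uint64_t off = 248;
    for (int k = 1; k < n; ++k) off += std::uint64_t{1} << (8 * k);
    return off;
}

void encode_varint(std::uint64_t v, std::vector<unsigned char>& out) {
    if (v < 248) {                        // 0-247: the value itself
        out.push_back(static_cast<unsigned char>(v));
        return;
    }
    for (int n = 1; n <= 8; ++n) {
        const std::uint64_t lo = varint_offset(n);
        // n-byte payloads cover [lo, lo + 2^(8n) - 1]; n == 8 covers the rest.
        if (n == 8 || v - lo < (std::uint64_t{1} << (8 * n))) {
            const std::uint64_t payload = v - lo;
            out.push_back(static_cast<unsigned char>(247 + n));
            for (int i = n - 1; i >= 0; --i)  // payload, most significant byte first
                out.push_back(static_cast<unsigned char>(payload >> (8 * i)));
            return;
        }
    }
}

// Decodes one value starting at p and advances p past it.
std::uint64_t decode_varint(const unsigned char*& p, const unsigned char* end) {
    if (p == end) throw std::runtime_error("truncated varint");
    const unsigned char first = *p++;
    if (first < 248) return first;
    const int n = first - 247;            // 248..255 -> 1..8 payload bytes
    assert(n >= 1 && n <= 8);
    if (end - p < n) throw std::runtime_error("truncated varint");
    std::uint64_t payload = 0;
    for (int i = 0; i < n; ++i) payload = (payload << 8) | *p++;
    const std::uint64_t lo = varint_offset(n);
    if (payload > std::numeric_limits<std::uint64_t>::max() - lo)
        throw std::overflow_error("varint exceeds 64 bits");
    return lo + payload;
}
```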

As for a "language", it's important to have a common data definition
format, however I think that can be much simpler than XDR.

G
 
James Kanze

[...]
I just thought: cool, let's check it out. I'm somewhat disappointed.

With XDR? I'm not really surprised. It's nothing exceptional.
But it does cover a lot of useful cases, and in cases where it
is sufficient, it's easier to refer to the XDR specification
than to invent and to write your own. And it has the advantage
of being widespread.

Of course, if you want something more complete, there's always
BER:).
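For comparison with the length encodings discussed above, a sketch of BER's definite-form length octets as ITU-T X.690 describes them: lengths below 128 take one byte; otherwise the first byte is `0x80` ORed with a count n, followed by the length in n bytes, most significant first. The encoder uses the minimal n (which DER requires and BER permits); the indefinite form and tag encoding are left out, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Definite-form length octets: short form below 128, long form otherwise.
std::vector<unsigned char> ber_length(std::uint64_t len) {
    std::vector<unsigned char> out;
    if (len < 0x80) {                                    // short form
        out.push_back(static_cast<unsigned char>(len));
        return out;
    }
    int n = 0;
    for (std::uint64_t t = len; t != 0; t >>= 8) ++n;   // minimal byte count
    out.push_back(static_cast<unsigned char>(0x80 | n));
    for (int i = n - 1; i >= 0; --i)                     // most significant first
        out.push_back(static_cast<unsigned char>(len >> (8 * i)));
    return out;
}

std::uint64_t ber_length_decode(const std::vector<unsigned char>& in) {
    assert(!in.empty());
    if (in[0] < 0x80) return in[0];
    const std::size_t n = in[0] & 0x7F;  // 0x80 alone (indefinite form) is rejected
    assert(n >= 1 && n <= 8 && in.size() >= 1 + n);
    std::uint64_t len = 0;
    for (std::size_t i = 1; i <= n; ++i) len = (len << 8) | in[i];
    return len;
}
```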
 
Gianni Mariani

James said:
Of course, if you want something more complete, there's always
BER:).

I don't remember why I gave up on ASN.1 ...
 
