Saving and reloading a container to/from disk


Andrew Poelstra

What would this buy him? He's saving the data to file, not to paper
or a text-only medium like Usenet. Apart from being able to print it,
base64 has exactly the same weaknesses as whatever binary representation
lies under the surface.

It restricts what characters are available as data, so that you
can use others as field separators, metadata, etc. This was also
my rationale for translating NULs.

If you write plain old binary data to a file, you have no way of
telling when one piece of data ends and another begins, what the
endianness is, what version of the encoder was used, etc, etc.
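
A minimal sketch of what such a self-describing file could look like, assuming a container of strings. The layout (a "CONT" magic word, a version number, an element count, then length-prefixed records) and the names save_strings and load_strings are hypothetical, chosen only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical layout: "CONT <version> <count>\n", then one
// "<length>:<bytes>\n" record per element.
const char kMagic[] = "CONT";
const int kVersion = 1;

void save_strings(std::ostream& out, const std::vector<std::string>& items) {
    out << kMagic << ' ' << kVersion << ' ' << items.size() << '\n';
    for (const std::string& s : items) {
        out << s.size() << ':';
        out.write(s.data(), static_cast<std::streamsize>(s.size()));
        out << '\n';
    }
}

std::vector<std::string> load_strings(std::istream& in) {
    std::string magic;
    int version = 0;
    std::size_t count = 0;
    if (!(in >> magic >> version >> count) || magic != kMagic) {
        throw std::runtime_error("not a container file");
    }
    if (version != kVersion) {
        throw std::runtime_error("unsupported format version");
    }
    std::vector<std::string> items;
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t len = 0;
        char sep = 0;
        // Formatted input skips the newline left over from the previous line.
        if (!(in >> len) || !in.get(sep) || sep != ':') {
            throw std::runtime_error("bad record header");
        }
        std::string s(len, '\0');
        if (len > 0 && !in.read(&s[0], static_cast<std::streamsize>(len))) {
            throw std::runtime_error("truncated record");
        }
        items.push_back(s);
    }
    assert(items.size() == count);
    return items;
}
```

Because every record carries its own length, the data may contain colons, newlines or NULs without any escaping, and a reader can reject a file whose version it does not understand.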

And for debugging purposes text-readability is often very useful.
To the original poster, I have no general answer. I'd recommend some
format suited to his application, not to its current implementation.

Right now Jacob is working on building an extensive data structure
library for the C language and I suspect he is looking for the most
intelligent and most general method of serialization.

(I also suspect that no such thing exists.)
 

Andrew Poelstra

Hi

What would be the best way to save and reload later a container
to/from disk in C++?

Thanks

Here's a solution; what constitutes "best" depends on what you
consider important:

If the container contains Plain-Old-Data types or pointers to PODs --
first fread()/fwrite() the size() [if it's variable]. After that just
iterate over the elements, and fread() / fwrite() the data,
dereferencing as appropriate as you go. If you've got pointers to
polymorphic types, make sure their base type has a [virtual]
serialize(), and that each derived type's implementation includes enough
extra header information in the first element to determine its type.
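
The POD half of that recipe might look roughly like the sketch below. The element type Point and the helper names save_pod_vector and load_pod_vector are made up for the example, and the polymorphic case is not covered. The file it produces is only meant to be read back by the same build on the same machine:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <type_traits>
#include <vector>

struct Point { int x; int y; };  // hypothetical POD element type

template <typename T>
bool save_pod_vector(const char* path, const std::vector<T>& v) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw fwrite only makes sense for trivially copyable types");
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const std::size_t n = v.size();
    bool ok = std::fwrite(&n, sizeof n, 1, f) == 1;       // size first
    for (const T& e : v) {                                // then each element
        ok = ok && std::fwrite(&e, sizeof e, 1, f) == 1;
    }
    return std::fclose(f) == 0 && ok;
}

template <typename T>
bool load_pod_vector(const char* path, std::vector<T>& v) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw fread only makes sense for trivially copyable types");
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    std::size_t n = 0;
    bool ok = std::fread(&n, sizeof n, 1, f) == 1;
    v.clear();
    for (std::size_t i = 0; ok && i < n; ++i) {
        T e;
        ok = std::fread(&e, sizeof e, 1, f) == 1;
        if (ok) v.push_back(e);
    }
    std::fclose(f);
    return ok;
}
```

Checking every fread() and fwrite() return value is what lets the loader notice a truncated file instead of silently returning garbage.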

As soon as you start using fread for non-raw-data types, you run
into non-portabilities with endianness, primitive sizes, structure
alignment, trap representations, etc, etc.
 

jacob navia

jacob navia wrote:
Ian Collins wrote:

Thanks.

I downloaded the boost libraries, and compiled them on my machine.
Unzipped, the source code is 269MB. Compilation took 14 minutes.

Machine Mac-pro OS X with 8 CPUs and 12GB RAM.

Then, I compiled the example of the serialization.
The class being saved/restored looks like this:

original schedule
6:24 bob
0x0x100200440 34?135'52.56" 134?22'78.3" 24th Street and 10th Avenue
0x0x1002004d0 35?137'23.456" 133?35'54.12" State street and Cathedral
Vista Lane
0x0x100200530 35?136'15.456" 133?32'15.3" White House

when restored, the restored stuff looks like this:
6:24
0x0x100200e30 34?135'52.56" 134?22'78.3" 24th Street and 10th Avenue
0x0x100200f40 35?137'23.456" 133?35'54.12" State street and Cathedral
Vista Lane
0x0x1002012c0 35?136'15.456" 133?32'15.3" White House


As you can see the name "bob" is missing.

The same bug appears with all other saved/restored class instances.
I am not fluent enough in C++ to figure this out, sorry.

But maybe this is a bug in the example; I can't determine
the reason.

jacob

The reason was a bug in the demo.cpp example.

When I comment out line 233 the example works. This means that
the bug is in the demo.cpp program and not in the library.

Line 233 reads

//: if(file_version >= 2)

When commented out the program works... I do not know if this is the
right fix; what I wanted to know is whether the libraries are working
and the bug is in the example.
 

James Kanze

On 2010-04-01, gwowen <[email protected]> wrote:

[...]
As soon as you start using fread for non-raw-data types, you run
into non-portabilities with endianness, primitive sizes, structure
alignment, trap representations, etc, etc.

As soon as you use fread for anything but char, you run into
such problems. In practice, the only sensible use of fread is
for reading char (or unsigned char) buffers, which you unformat
manually. (C++ recognized this fact, and istream::read takes a
char*, rather than a void*.)
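
As a rough illustration of "unformatting manually", the sketch below packs a 32-bit value into an unsigned char buffer in a fixed byte order (big-endian is an arbitrary choice here) and rebuilds it with shifts. The helper names put_u32 and get_u32 are invented for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append v to buf, most significant byte first, whatever the host byte order.
void put_u32(std::vector<unsigned char>& buf, std::uint32_t v) {
    buf.push_back(static_cast<unsigned char>((v >> 24) & 0xFFu));
    buf.push_back(static_cast<unsigned char>((v >> 16) & 0xFFu));
    buf.push_back(static_cast<unsigned char>((v >> 8) & 0xFFu));
    buf.push_back(static_cast<unsigned char>(v & 0xFFu));
}

// Rebuild the value from the four bytes starting at pos.
std::uint32_t get_u32(const std::vector<unsigned char>& buf, std::size_t pos) {
    assert(pos + 4 <= buf.size());
    return (static_cast<std::uint32_t>(buf[pos]) << 24) |
           (static_cast<std::uint32_t>(buf[pos + 1]) << 16) |
           (static_cast<std::uint32_t>(buf[pos + 2]) << 8) |
           static_cast<std::uint32_t>(buf[pos + 3]);
}
```

The buffer itself can then be written with fwrite() or ostream::write() as plain bytes, so the bytes in the file no longer depend on the host's endianness or integer sizes.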
 

jacob navia

James Kanze wrote:
[...]
As soon as you start using fread for non-raw-data types, you run
into non-portabilities with endianness, primitive sizes, structure
alignment, trap representations, etc, etc.

As soon as you use fread for anything but char, you run into
such problems. In practice, the only sensible use of fread is
for reading char (or unsigned char) buffers, which you unformat
manually. (C++ recognized this fact, and istream::read takes a
char*, rather than a void*.)

Sure, then if we start like that, using the disk drive is non portable.

I did not know that C++ doesn't like using the disk drive...

I mean, ANYTHING you write into the disk is nonportable. You should
then use XML so that your code slows down to ridiculous speed because
of the overhead, your network breaks down because of the traffic
overhead...

Are you seriously saying that writing binary data to the disk shouldn't
be done or what?

And anyway, even if you use XML, writing any floating point
data loses precision because of rounding problems.

jacob
 

Kai-Uwe Bux

jacob navia wrote:

[...]
And anyway, even if you use XML, writing any floating point
data loses precision because of rounding problems.
[...]

I don't know XML. Therefore, I am just curious: does XML limit the number of
digits you can put into a decimal?

If not, then you could represent each finite-precision base-2
floating point number exactly as a decimal, since 2 is a divisor of 10. More
importantly, though, you can stop as soon as the decimal allows
reconstructing the floating point number (e.g., as the unique closest value
of the underlying floating point type). If I recall correctly,
boost::lexical_cast does something like that to guarantee correct round trip
behavior for casting floats.
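
A small sketch of the round-trip idea using only the standard library rather than boost::lexical_cast. It writes std::numeric_limits<double>::max_digits10 significant digits (a C++11 facility), which is enough to recover the same double on implementations whose conversions round correctly. It does not try to find the shortest such decimal, it assumes the default "C" locale, and the helper names are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Format x with enough significant decimal digits to identify it uniquely.
std::string double_to_text(double x) {
    assert(std::isfinite(x));  // infinities and NaNs need separate handling
    std::ostringstream os;
    os << std::setprecision(std::numeric_limits<double>::max_digits10) << x;
    return os.str();
}

// Parse the text back; strtod is expected to return the nearest double.
double text_to_double(const std::string& s) {
    return std::strtod(s.c_str(), nullptr);
}
```

With this, a value such as 0.1 comes back bit-for-bit identical even though its printed form is not the "pretty" one, which is the sense in which a text format need not lose precision.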


Best

Kai-Uwe Bux
 

Ian Collins

James Kanze wrote:
Sure, then if we start like that, using the disk drive is non portable.

I did not know that C++ doesn't like using the disk drive...

I mean, ANYTHING you write into the disk is nonportable. You should
then use XML so that your code slows down to ridiculous speed because
of the overhead, your network breaks down because of the traffic
overhead...

Are you seriously saying that writing binary data to the disk shouldn't
be done or what?

Are you seriously misunderstanding what James wrote, or are you
deliberately misinterpreting it?

If you want to serialise objects to and from disk, you use a character
buffer as the intermediate step. The format you use is up to you.
 

Jeff Flinn

jacob said:
jacob navia wrote:

The reason was a bug in the demo.cpp example.

When I comment out line 233 the example works. This means that
the bug is in the demo.cpp program and not in the library.

Line 233 reads

//: if(file_version >= 2)

When commented out the program works... I do not know if this is the
right fix; what I wanted to know is whether the libraries are working
and the bug is in the example.

Well, I've developed several commercial applications using boost
serialization that serialize without problem.

I don't believe the proper fix is to comment out the above if(), rather
the line:

BOOST_CLASS_VERSION(bus_schedule, 2)

should be

BOOST_CLASS_VERSION(bus_schedule::trip_info, 2)

But I'm not positive if that's the right syntax for setting the class
version on a nested class.

Jeff
 

Rui Maciel

Andrew said:
I'm not sure how best to deal with binary data, but for
numbers and text, I would use JSON (escaping special
characters as appropriate, etc).

You are selling JSON a bit short. Numbers and text can be easily handled by the most basic
data interchange formats conceived, including the infamous INI file format. JSON thankfully
offers quite a bit more than that.


Rui Maciel
 

Rui Maciel

jacob said:
Hi

What would be the best way to save and reload later a container
to/from disk in C++?

I believe that the best way to handle that sort of task is to adopt a standard data
interchange format and, based on that language, write a parser and an export routine to handle
the input and output. Andrew Poelstra already suggested the JSON language, which is a
terribly simple language to parse and also offers support for basic data structures. If you
see the human-readable aspect as being bloat then you also have the binary JSON (BSON)
proposal[1]. There are also other similar efforts such as Tpl[2].

But no matter what language you choose, please do stay away from XML. Avoid it like the
plague.



Rui Maciel

[1] http://bsonspec.org/
[2] http://tpl.sourceforge.net/
 

Brian

On 2010-04-01, gwowen <[email protected]> wrote:

    [...]
As soon as you start using fread for non-raw-data types, you run
into non-portabilities with endianness, primitive sizes, structure
alignment, trap representations, etc, etc.

As soon as you use fread for anything but char, you run into
such problems.  In practice, the only sensible use of fread is
for reading char (or unsigned char) buffers, which you unformat
manually.  


I use fread to read a time_t value that was written using fwrite
the last time the program ran. I think there could be a problem
if the program is rebuilt with different options, but that hasn't
been a problem so far. I could add a warning (in the makefile)
to erase the time stamp file if you tinker with the flags in the
makefile, but am not sure if that would be helpful.


Brian Wood
http://webEbenezer.net
(651) 251-9384
 

gwowen

As soon as you start using fread for non-raw-data types, you run
into non-portabilities with endianness, primitive sizes, structure
alignment, trap representations, etc, etc.

True, but Jacob has said that portable serialisation across
architectures (ABIs, I guess) is *NOT* one of his requirements.
 

Andrew Poelstra

True, but Jacob has said that portable serialisation across
architectures (ABIs, I guess) is *NOT* one of his requirements.

Fair enough, but structure alignments change across
/compiler flags/. This is quite a portability hit.
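
To make the padding concern concrete, here is a tiny sketch with a made-up struct Record. How many padding bytes the compiler inserts is implementation-defined and can depend on packing options, which is why fwrite()-ing the whole struct ties the file to one particular build:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record: a 1-byte tag followed by a 4-byte value.
struct Record {
    char tag;
    std::uint32_t value;
};

// Bytes in sizeof(Record) that belong to no member. The result is
// implementation-defined (commonly 3, or 0 when the struct is packed).
std::size_t padding_bytes() {
    const std::size_t payload = sizeof(char) + sizeof(std::uint32_t);
    assert(sizeof(Record) >= payload);
    return sizeof(Record) - payload;
}
```

A file written as raw Record objects contains those padding bytes too, so a reader built with a different layout would see every field after the first at the wrong offset.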
 

gwowen

Fair enough, but structure alignments change across
/compiler flags/. This is quite a portability hit.

If you're using compiler flags that change structure padding, you're
almost certainly violating the ABI, so good luck using any standard
library functions that use a structure in the prototype, or any 3rd
party libraries that aren't compiled with your favourite switches.
Compiler switches that cause ABI violations should be considered
harmful, and if you use them too indiscriminately, you'd better be
prepared to do *everything* by hand.
 

Ian Collins

If you're using compiler flags that change structure padding, you're
almost certainly violating the ABI, so good luck using any standard
library functions that use a structure in the prototype, or any 3rd
party libraries that aren't compiled with your favourite switches.
Compiler switches that cause ABI violations should be considered
harmful, and if you use them too indiscriminately, you'd better be
prepared to do *everything* by hand.

Um, how about -m64?
 

gwowen

It changes the alignment rules, so would you consider it harmful?

It also changes "long int" and pointer sizes, so yes, I'd think that'd
qualify as an ABI change (and a pretty hefty one at that). If you
omit it and your system libraries don't, or vice versa, you're going to
be in a degree of bother -- alignment is the least of your worries,
any time you push a pointer onto the stack before a standard function
call you're liable to get bit.
 
