object serialization

William

I'm looking for an example that would show how to serialize a C++
object at its simplest, without using any other APIs. I have a class that
I want to serialize and then pass to my obj-c class so I can send it
over the wire.

I'm just looking for how to serialize it, then pack it back up on the
other end.

Any help much appreciated.
 
Ron AF Greve

Hi,

This is how I do it:

Derive every class from an ISerialize class.
This ISerialize class has one member function accepting an IArchive, i.e.
Serialize( IArchive& Archive ).
The ISerialize class also has a virtual function that returns its name
(overridden in the derived classes to return the class name). The name is
used to deserialize the correct class.

The IArchive class knows whether it is open for reading or writing.
The IArchive class has two virtual members, read( char *, long length ) and a
similar write().
The IArchive also contains non-virtual functions for all regular types like
string, long and int (use templates etc. to reduce these to a few functions).
In addition, this IArchive can serialize/deserialize ISerialize-derived
classes by first storing their name and then calling the class's Serialize
function, passing itself. On deserialization it first reads the class name
(it knows this is an ISerialize class and not another type like long, int
etc. because you pass a pointer or reference to it).
Then it creates the object (look up 'object factory'; this is a pretty
standard C++ way of creating objects by name or id) and calls the Serialize
member of the object, passing itself.

The Serialize member of an ISerialize-derived class typically looks like
this (a piece of my code):

void MCursor::MCursorInfo::Serialize( MArchive& Archive )
{
    Archive.Serialize ( Event );
    Archive.Serialize ( OffsetX );
    Archive.Serialize ( OffsetY );
    Archive.Serialize ( Animation );
}


Note 1: since the archive knows whether it is reading or writing, it can
choose between the read and write functions.
Note 2: because you can override the read and write functions, you can
easily create derived archives that store to memory, disk, sockets etc. You
only have to override the very simple read and write functions.
Note 3: because (de)serialization is done with one function, reading and
writing are always in sync (there is no possibility of a mismatch between
the two).
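
For readers who want something concrete, here is a minimal sketch of the single-function idea described above. It is not the MArchive code from this post: Archive, MemoryArchive, CursorInfo and RoundTrip are names made up for illustration, and values are copied as raw bytes, which is only reasonable when writer and reader run on the same machine with the same compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical archive base class: knows its direction, leaves storage to subclasses.
class Archive {
public:
    enum class Mode { Write, Read };
    explicit Archive(Mode mode) : mode_(mode) {}
    virtual ~Archive() = default;
    bool IsReading() const { return mode_ == Mode::Read; }

    // One entry point for simple values: reads or writes depending on the mode.
    template <typename T>
    void Serialize(T& value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "raw byte copy is only valid for trivially copyable types");
        if (IsReading()) Read(reinterpret_cast<char*>(&value), sizeof value);
        else Write(reinterpret_cast<const char*>(&value), sizeof value);
    }

    // Strings are stored as a length followed by the characters.
    void Serialize(std::string& value) {
        std::size_t length = value.size();
        Serialize(length);
        if (IsReading()) value.resize(length);
        if (length == 0) return;
        if (IsReading()) Read(&value[0], length);
        else Write(value.data(), length);
    }

protected:
    // The only two functions a new storage backend has to provide.
    virtual void Read(char* data, std::size_t length) = 0;
    virtual void Write(const char* data, std::size_t length) = 0;

private:
    Mode mode_;
};

// Backend that keeps everything in a byte vector.
class MemoryArchive : public Archive {
public:
    MemoryArchive() : Archive(Mode::Write) {}
    explicit MemoryArchive(std::vector<char> bytes)
        : Archive(Mode::Read), bytes_(std::move(bytes)) {}
    const std::vector<char>& Bytes() const { return bytes_; }

protected:
    void Read(char* data, std::size_t length) override {
        if (length > bytes_.size() - position_) throw std::runtime_error("archive underrun");
        std::memcpy(data, bytes_.data() + position_, length);
        position_ += length;
    }
    void Write(const char* data, std::size_t length) override {
        bytes_.insert(bytes_.end(), data, data + length);
    }

private:
    std::vector<char> bytes_;
    std::size_t position_ = 0;
};

// The same member function is used for saving and loading.
struct CursorInfo {
    int event = 0;
    long offsetX = 0;
    long offsetY = 0;
    std::string animation;

    void Serialize(Archive& archive) {
        archive.Serialize(event);
        archive.Serialize(offsetX);
        archive.Serialize(offsetY);
        archive.Serialize(animation);
    }
};

// Write an object to memory, then rebuild a copy from those bytes.
CursorInfo RoundTrip(CursorInfo original) {
    MemoryArchive out;
    original.Serialize(out);
    MemoryArchive in(out.Bytes());
    CursorInfo copy;
    copy.Serialize(in);
    return copy;
}
```

A socket or file archive would only need its own Read and Write; CursorInfo::Serialize stays unchanged.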

When this works, extend your archive to keep track of written pointers to
ISerialize objects (so that on deserialization it creates only one object
when there are several pointers to the same object), and add some code to
automatically save STL vectors/sets/maps etc. of ISerialize objects.
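
A rough sketch of the pointer-tracking idea, under simplifying assumptions: the "archive" is just a vector of ints, Node stands in for any serializable class, and PointerWriter and PointerReader are invented names. The writer numbers each distinct object the first time it sees it; the reader hands back the same object whenever a number repeats. Null pointers get their own tag so nothing is ever called on them.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

struct Node { int value = 0; };   // stand-in for a serializable class

enum Tag : int { kNull = 0, kNew = 1, kSeen = 2 };

// Writing side: remembers which objects have already been written.
class PointerWriter {
public:
    void Save(const Node* node) {
        if (node == nullptr) { out_.push_back(kNull); return; }
        auto found = ids_.find(node);
        if (found != ids_.end()) {          // already written: store only its id
            out_.push_back(kSeen);
            out_.push_back(found->second);
            return;
        }
        int id = static_cast<int>(ids_.size());   // ids are handed out in order
        ids_[node] = id;
        out_.push_back(kNew);
        out_.push_back(node->value);        // the object's payload
    }
    const std::vector<int>& Data() const { return out_; }

private:
    std::map<const Node*, int> ids_;
    std::vector<int> out_;
};

// Reading side: recreates each object once and shares it afterwards.
class PointerReader {
public:
    explicit PointerReader(std::vector<int> data) : in_(std::move(data)) {}

    std::shared_ptr<Node> Load() {
        int tag = Next();
        if (tag == kNull) return nullptr;
        if (tag == kSeen) return objects_.at(static_cast<std::size_t>(Next()));
        if (tag != kNew) throw std::runtime_error("corrupt pointer tag");
        auto node = std::make_shared<Node>();
        objects_.push_back(node);           // its index matches the writer's id
        node->value = Next();
        return node;
    }

private:
    int Next() { return in_.at(position_++); }
    std::vector<int> in_;
    std::size_t position_ = 0;
    std::vector<std::shared_ptr<Node>> objects_;
};
```

In a full archive the payload step would call the object's Serialize member instead of copying a single int.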

It might take some time to set up, but once you have it, it really works like
a charm :) and it is loads of fun to see how easily it works then. I use it
to save my 2D game engine's internal state (savegame) and load the same state
back into memory (deserialization).

Things to study:
Object factories
Look up Microsoft's way of serialization on MSDN ( I stole the duplicate
pointer idea from them :) )
Make sure you are reasonably acquainted with templates
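
For the first item, a small sketch of what an object factory might look like. Serializable, ObjectFactory, Sprite and Sound are placeholder names, not classes from my engine; the point is only that the class name read from the archive is looked up in a table of creation functions.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical common base class; ClassName() is what gets written to the archive.
struct Serializable {
    virtual ~Serializable() = default;
    virtual std::string ClassName() const = 0;
};

// Maps a class name to a function that creates a fresh object of that class.
class ObjectFactory {
public:
    using Creator = std::function<std::unique_ptr<Serializable>()>;

    void Register(const std::string& name, Creator creator) {
        creators_[name] = std::move(creator);
    }

    std::unique_ptr<Serializable> Create(const std::string& name) const {
        auto found = creators_.find(name);
        if (found == creators_.end()) throw std::runtime_error("unknown class: " + name);
        return found->second();
    }

private:
    std::map<std::string, Creator> creators_;
};

struct Sprite : Serializable {
    std::string ClassName() const override { return "Sprite"; }
};

struct Sound : Serializable {
    std::string ClassName() const override { return "Sound"; }
};

// Registration is done once, e.g. at program start.
ObjectFactory MakeFactory() {
    ObjectFactory factory;
    factory.Register("Sprite", [] { return std::unique_ptr<Serializable>(new Sprite); });
    factory.Register("Sound", [] { return std::unique_ptr<Serializable>(new Sound); });
    return factory;
}
```

On deserialization the archive would read the name, call Create with it, and then call the new object's Serialize member.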


Regards, Ron AF Greve

http://www.InformationSuperHighway.eu
 
Kirit Sælensminde

void MCursor::MCursorInfo::Serialize( MArchive& Archive )
{
Archive.Serialize ( Event );
Archive.Serialize ( OffsetX );
Archive.Serialize ( OffsetY );
Archive.Serialize ( Animation );

}

There is one extra complication that this approach doesn't address. As
the file format changes you need to be able to read old versions with
newer software.

There are a few ways of doing this. The way I've used in the past is
that each object also writes a schema number which can then be used to
work out which structure to use. You may get something that looks a
little more like this:

void MCursor::MCursorInfo::Serialize( MArchive& Archive )
{
    int version = Archive.Version< MCursorInfo >( 2 );
    if ( version >= 1 ) {
        Archive.Serialize( Event );
        Archive.Serialize( OffsetX );
        Archive.Serialize( OffsetY );
    }
    if ( version >= 2 )
        Archive.Serialize( Animation );
}

You want to arrange for Version to return the version that is to be
used. By passing in the type (I've done it via a template
specialisation, but there are other ways too) it can store a lookup
for overall file version against each schema part so that you can also
save to older file formats.
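
To make the schema-number idea concrete, here is a stripped-down sketch. It assumes a toy archive that stores only ints (IntArchive is an invented name) and it drops the per-type template parameter, so Version simply stores the number when writing and returns the stored number when reading. A field added in a later schema keeps its default when older data is loaded.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy archive: a sequence of ints, either being written or being read.
class IntArchive {
public:
    IntArchive() : reading_(false) {}
    explicit IntArchive(std::vector<int> data) : reading_(true), data_(std::move(data)) {}

    // Writing: store and return the current schema number.
    // Reading: return the schema number the data was saved with.
    int Version(int current) {
        int version = current;
        Serialize(version);
        return version;
    }

    void Serialize(int& value) {
        if (reading_) value = data_.at(position_++);
        else data_.push_back(value);
    }

    const std::vector<int>& Data() const { return data_; }

private:
    bool reading_;
    std::vector<int> data_;
    std::size_t position_ = 0;
};

struct CursorInfo {
    int event = 0;
    int offsetX = 0;
    int offsetY = 0;
    int animation = -1;   // added in schema 2; keeps this default for schema 1 data

    void Serialize(IntArchive& archive) {
        int version = archive.Version(2);
        if (version >= 1) {
            archive.Serialize(event);
            archive.Serialize(offsetX);
            archive.Serialize(offsetY);
        }
        if (version >= 2)
            archive.Serialize(animation);
    }
};
```

Reading the ints 1, 10, 20, 30 fills the first three fields and leaves animation at -1; anything written by this code starts with 2 and carries all four fields.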


K
 
James Kanze

I'm looking for an example that would show how to serialize a C++
object at its simplest, without using any other APIs. I have a class that
I want to serialize and then pass to my obj-c class so I can send it
over the wire.
I'm just looking for how to serialize it, then pack it back up on the
other end.

There are, regretfully, no simple answers. Basically, you'll
have to either define a line protocol yourself, or use an
existing one, then code every type you use to conform to the
line protocol.

If there are no other particular constraints, I'd start with XDR
for the low level types, and build on it. I'd probably define an
oxdrstream and an ixdrstream, with << and >> operators for the
primitive types. (Handling integers is easy. Floating point is
less so; depending on how portable you want the code to be, it can
even be very complex.) More complex types then output each field.
(Some consideration must also be given to variable length types,
e.g. vectors and strings. XDR has some basic rules for these as
well.) And don't forget to follow pointers, if the pointed to
data is logically part of your object. (You cannot, of course,
serialize a pointer.)

You'll also have to give some thought to how the receiving end
will know what type it is getting. Depending on the protocol,
this may be more or less implicit, but most of the times, there
will be cases where you'll have to transmit this information as
well.
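
As an illustration of the stream idea, here is a small sketch covering only two cases: unsigned 32-bit integers and strings. The class names come from the suggestion above; everything inside them is an assumption made for the example. It follows the XDR rules that an integer is four bytes with the most significant byte first, and that a string is its length followed by its bytes, zero-padded to a multiple of four. Floating point, signed values and type identification are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Output side: appends XDR-encoded values to a byte buffer.
class oxdrstream {
public:
    oxdrstream& operator<<(std::uint32_t value) {
        // Four bytes, most significant first, whatever the host byte order is.
        for (int shift = 24; shift >= 0; shift -= 8)
            bytes_.push_back(static_cast<unsigned char>((value >> shift) & 0xFFu));
        return *this;
    }

    oxdrstream& operator<<(const std::string& value) {
        // Length, then the bytes, then zero padding up to a multiple of four.
        *this << static_cast<std::uint32_t>(value.size());
        bytes_.insert(bytes_.end(), value.begin(), value.end());
        while (bytes_.size() % 4 != 0) bytes_.push_back(0);
        return *this;
    }

    const std::vector<unsigned char>& bytes() const { return bytes_; }

private:
    std::vector<unsigned char> bytes_;
};

// Input side: decodes values in the order they were written.
class ixdrstream {
public:
    explicit ixdrstream(std::vector<unsigned char> bytes) : bytes_(std::move(bytes)) {}

    ixdrstream& operator>>(std::uint32_t& value) {
        require(4);
        value = 0;
        for (int i = 0; i < 4; ++i)
            value = (value << 8) | bytes_[position_++];
        return *this;
    }

    ixdrstream& operator>>(std::string& value) {
        std::uint32_t length = 0;
        *this >> length;
        std::size_t padded = (static_cast<std::size_t>(length) + 3) / 4 * 4;
        require(padded);
        value.assign(bytes_.begin() + position_, bytes_.begin() + position_ + length);
        position_ += padded;                // skip the padding as well
        return *this;
    }

private:
    void require(std::size_t count) const {
        if (count > bytes_.size() - position_) throw std::runtime_error("xdr underrun");
    }
    std::vector<unsigned char> bytes_;
    std::size_t position_ = 0;
};
```

A user-defined type would then define its own << and >> in terms of these, writing and reading its fields in the same order.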
 
Roland Pibinger

If there are no other particular constraints, I'd start with XDR
for the low level types, and build on it. I'd probably define an
oxdrstream and an ixdrstream, with << and >> operators for the
primitive types.

Jack W. Reeves developed that kind of XDR library in a series of
articles (which may be found on the internet).
 
Roland Pibinger

I'm looking for an example that would show how to serialize a C++
object at its simplest, without using any other APIs. I have a class that
I want to serialize and then pass to my obj-c class so I can send it
over the wire.

Read the C++ FAQ-Lite.
I'm just looking for how to serialize it, then pack it back up on the
other end.

Why not just write the data in some data format? Serialization seems
to be the wrong level of abstraction, not only in C++ but also in
other languages. Convert your data to a readable maybe even
standardized format and the receivers will be happy.
 
Ron AF Greve

Hi Kirit,

Thanks for the reply and noting I missed something important.

Actually I have a number assigned to the complete archive. However I don't
use it anywhere since I couldn't really decide if it was better to have a
version per object or just change the archive version when one or more
objects change and then use if( Archive.GetVersion() ) or something like
that.

However looking at your example I realize that it would indeed be better to
do it on a 'per object' basis.

Regards, Ron AF Greve

http://www.InformationSuperHighway.eu
 
William

Why not just write the data in some data format? Serialization seems
to be the wrong level of abstraction, not only in C++ but also in
other languages. Convert your data to a readable maybe even
standardized format and the receivers will be happy.

Well, I'm only doing it so I can pass the data from a process to a
thread I just launched. So it's all sorta in the same name space. It's
just my libraries are in C++ and I was hoping to serialize the data
natively in C++ so that when I copy it into my obj-c objects, I'd be
most of the way done w/ performance benefits.

I'll see if I can make this happen.
 
James Kanze

Well, I'm only doing it so I can pass the data from a process to a
thread I just launched.

If the two threads are in the same process, you don't need
serialization, since they share common memory. (You will need
to ensure synchronized access, however.)
So it's all sorta in the same name space. It's
just my libraries are in C++ and I was hoping to serialize the data
natively in C++ so that when I copy it into my obj-c objects, I'd be
most of the way done w/ performance benefits.

In fact, you're concerned with communicating between two
languages, rather than between two processes or machines.
That's a different kettle of fish entirely; depending on the
representations used, it can vary from trivial (from C++ to C)
to very complicated (C++ to Cobol, perhaps). Serialization is
one solution, of course, but it is rarely the simplest or the
most efficient. Basically, when going from language A to
language B:

-- If possible, use data with compatible formats. This is what
makes C++ to C work so well; C++ more or less requires a
large category of data types to have a format compatible
with C, so all you have to do is pass a pointer to it to the
C function.

-- Failing that, you'll have to convert the data somehow. The
conversions can be more or less complicated: when going from
C++ to Fortran, for example, most of the basic types (int,
float, etc.) will be compatible, so all you have to worry
about is the fact that all Fortran parameters are by
reference, that Fortran arrays are column major, rather
than row major (but that can possibly be handled just by
declaring them differently) and strings---Fortran's
character type is not generally compatible with either
std::string or char[]. If the target is Cobol, on the other
hand, you may end up having to convert double's to BCD, and
what have you.

-- Finally, of course, if you already have serialization
routines available in both languages, you can use them. Be
aware, however, that what this really means is converting
C++ data to a neutral, third format, and then converting
this format back to the format in the other language. And
that the neutral, third format conforms to a number of
constraints which generally make it less efficient to
convert to and from than other formats.

If the serialization is already present and handy in both
languages, or if it is or will be necessary anyway,
serialization is certainly an option to be considered.
Otherwise, however, it is by far the least efficient, both where
it always counts (your time and effort), and in terms of
performance.
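
A sketch of the first option (data with a compatible format), assuming the object can be flattened into a plain C struct. CursorSnapshot, Cursor and snapshot_sum are invented for the example. Since Objective-C is a superset of C, a declaration like CursorSnapshot can be shared with it through an ordinary header, and handing over a copy rather than a pointer to the live object means later changes on the C++ side do not affect the receiver.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <utility>

// Plain C-compatible snapshot of the data; no constructors, no std:: members.
extern "C" {
struct CursorSnapshot {
    int event;
    int offsetX;
    int offsetY;
    char animation[32];   // fixed-size, NUL-terminated copy of the name
};
}

static_assert(std::is_standard_layout<CursorSnapshot>::value,
              "snapshot must have a C-compatible layout");
static_assert(std::is_trivially_copyable<CursorSnapshot>::value,
              "snapshot must be copyable as raw bytes");

// Hypothetical C++ class that owns the real data.
class Cursor {
public:
    Cursor(int event, int offsetX, int offsetY, std::string animation)
        : event_(event), offsetX_(offsetX), offsetY_(offsetY),
          animation_(std::move(animation)) {}

    // Copies the current state into the plain struct.
    CursorSnapshot Snapshot() const {
        CursorSnapshot s = {};              // zero-filled, so the name stays NUL-terminated
        s.event = event_;
        s.offsetX = offsetX_;
        s.offsetY = offsetY_;
        std::size_t n = std::min(animation_.size(), sizeof s.animation - 1);
        std::memcpy(s.animation, animation_.data(), n);   // longer names are truncated
        return s;
    }

private:
    int event_;
    int offsetX_;
    int offsetY_;
    std::string animation_;
};

// Stand-in for the receiving side; it only ever sees the plain struct.
extern "C" int snapshot_sum(const CursorSnapshot* s) {
    return s->event + s->offsetX + s->offsetY;
}
```

The fixed-size name field is the price of staying C-compatible; variable-length data would need a pointer plus a length and an agreed owner.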
 
Kirit Sælensminde

Hi Kirit,

Thanks for the reply and noting I missed something important.

Actually I have a number assigned to the complete archive. However I don't
use it anywhere since I couldn't really decide if it was better to have a
version per object or just change the archive version when one or more
objects change and then use if( Archive.GetVersion() ) or something like
that.

However looking at your example I realize that it would indeed be better to
do it on a 'per object' basis.

Top posting is generally frowned upon here.

There is a trade-off with the version per object rather than version
per file. Clearly it is going to make the archive data larger, but
even in '95 when I wrote about serialisation in DDJ
<http://www.ddj.com/184409645?pgno=3> the overhead was worth it. These object
numbers are especially important when you're in development because
the object layouts are going to change rather frequently.

There is a way around this. You can arrange for the archive to be
versioned and for final release builds you also arrange for
Archive.Version<>() to return the archive version number which you
make sure is a higher number than the version number you'd got up to
for any object. For the next development round you then start each
object's numbering above this number again.

Archives saved with development builds will therefore be bigger than
those saved with release builds, but they will be readable by the
software version that saved them or any later version (there are some
complications about object classes that are no longer in use, but the
details aren't too hard to work out).

As for the DDJ article, the code isn't really worth anything now. It
was written when template programming meant using C macros - ouch! The
ideas should still be relevant though.


K
 
Ron AF Greve

Hi Kirit,

Kirit Sælensminde said:
There is a trade-off with the version per object rather than version
per file. Clearly it is going to make the archive data larger, but
even in '95 when I wrote about serialisation in DDJ
<http://www.ddj.com/184409645?pgno=3> the overhead was worth it. These object
numbers are especially important when you're in development because
the object layouts are going to change rather frequently.

Thanks for the article; it is good to see different approaches. In my
case, though, I like the superclass idea (I have a common superclass anyway
since I need it in the rest of the engine). As a note to the OP (in the
superclass approach anyway): pointers have to be tested for zero. For a zero
pointer, a zero is saved or loaded, but (obviously) the virtual Serialize
member must not be called on it.
There is a way around this. You can arrange for the archive to be
versioned and for final release builds you also arrange for
Archive.Version<>() to return the archive version number which you
make sure is a higher number than the version number you'd got up to
for any object. For the next development round you then start each
object's numbering above this number again.

Archives saved with development builds will therefore be bigger than
those saved with release builds, but they will be readable by the
software version that saved them or any later version (there are some
complications about object classes that are no longer in use, but the
details aren't too hard to work out).
OK, good idea, that would be the best of both worlds.
As for the DDJ article, the code isn't really worth anything now. It
was written when template programming meant using C macros - ouch! The
ideas should still be relevant though.


K


Regards, Ron AF Greve

http://www.InformationSuperHighway.eu
 
William

If the two threads are in the same process, you don't need
serialization, since they share common memory. (You will need
to ensure synchronized access, however.)

That's one possibility, but since the user may modify these objects
while my thread is processing the copies, it really makes sense to do
it this way.
In fact, you're concerned with communicating between two
languages, rather than between two processes or machines.
That's a different kettle of fish entirely; depending on the
representations used, it can vary from trivial (from C++ to C)
to very complicated (C++ to Cobol, perhaps). Serialization is
one solution, of course, but it is rarely the simplest or the
most efficient. Basically, when going from language A to
language B:

-- If possible, use data with compatible formats. This is what
makes C++ to C work so well; C++ more or less requires a
large category of data types to have a format compatible
with C, so all you have to do is pass a pointer to it to the
C function.

I'm going from C++ to obj-c, and the pointer would work except for the
reasons I state above.
-- Failing that, you'll have to convert the data somehow. The
conversions can be more or less complicated: when going from
C++ to Fortran, for example, most of the basic types (int,
float, etc.) will be compatible, so all you have to worry
about is the fact that all Fortran parameters are by
reference, that Fortran arrays are column major, rather
than row major (but that can possibly be handled just by
declaring them differently) and strings---Fortran's
character type is not generally compatible with either
std::string or char[]. If the target is Cobol, on the other
hand, you may end up having to convert double's to BCD, and
what have you.

Conversions should be easy from C++ to obj-c.
-- Finally, of course, if you already have serialization
routines available in both languages, you can use them. Be
aware, however, that what this really means is converting
C++ data to a neutral, third format, and then converting
this format back to the format in the other language. And
that the neutral, third format conforms to a number of
constraints which generally make it less efficient to
convert to and from than other formats.

If the serialization is already present and handy in both
languages, or if it is or will be necessary anyway,
serialization is certainly an option to be considered.
Otherwise, however, it is by far the least efficient, both where
it always counts (your time and effort), and in terms of
performance.

Understood.
 
