Serialization in C

Debajyoti Sarma · Jul 14, 2010

How to do serialization in C ? Question arises from serialization of
binary tree.

Malcolm McLean · Jul 14, 2010

How to do serialization in C ? Question arises from serialization of
binary tree.

Very hard to do well. Really serialization is so fundamental it should
be built into the language.

A tree can be serialised as a set of nested parentheses, if the
requirement is to save as text. Look up Newick tree format.

However there's no easy was to write a generalised serialise function
for a general tree containing arbitrary data.

bart.c · Jul 14, 2010

Malcolm said:
Very hard to do well. Really serialization is so fundamental it should
be built into the language.

So fundamental that I don't think I've ever had to do it. I'm not even sure
what it means (flattening?).

A tree can be serialised as a set of nested parentheses, if the
requirement is to save as text. Look up Newick tree format.

However there's no easy was to write a generalised serialise function
for a general tree containing arbitrary data.

Sounds like building it in (to C) would be tricky then. Especially when the
language doesn't yet have trees built-in.

Debajyoti Sarma · Jul 14, 2010

ok.
Can't we store number of structure variables in file?
I thing that will do .... by putting 2 separate tree traversal in the
file....we can reconstruct the tree.

Vincenzo Mercuri · Jul 14, 2010

Debajyoti Sarma ha scritto:

How to do serialization in C ? Question arises from serialization of
binary tree.

Hi,
a quick search on google gave me 3 main results:

http://tpl.sourceforge.net/

http://www.enlightenment.org/p.php?p=about/efl/eet&l=en

http://s11n.net/c11n/

but I can't say how they work or if they fully
support storage of nested structures.

Ben Bacarisse · Jul 14, 2010

Vincenzo Mercuri said:
Debajyoti Sarma ha scritto:

a quick search on google gave me 3 main results:

http://tpl.sourceforge.net/

but I can't say how they work or if they fully
support storage of nested structures.

Depends on what you mean by "nested". TPL does not handle any data
organisation that uses pointers so linked lists and tree have to be done
by hand. In effect, TPL will do the leaves, but that's about it.

Malcolm McLean · Jul 14, 2010

So fundamental that I don't think I've ever had to do it. I'm not even sure
what it means (flattening?).

It means a data structure has to be stored as an ordered sequence of
bytes. So "flattening" is a perfectly good term.

Some people never need to do this, beyond writing out structs to a
file. However if you do need it, it is absolutely crucial.

Some examples:
a binary file format requires that floating point numbers be embedded
in IEEE format, whilst the floating point unit on the host processor
isn't IEEE.
a parallel system requires structures to be passed between processors
in the form of messages.
a tree representing evolutionary relationships neeeds to be saved to
disk.

Vincenzo Mercuri · Jul 14, 2010

Ben Bacarisse ha scritto:

Depends on what you mean by "nested". TPL does not handle any data
organisation that uses pointers so linked lists and tree have to be done
by hand. In effect, TPL will do the leaves, but that's about it.

Yes with "nested" I meant nodes with sub-nodes, etc..
I've only taken a hasty look at the homepages,
I can't suggest them actually since I don't know them

Eric Sosman · Jul 14, 2010

ok.
Can't we store number of structure variables in file?
I thing that will do .... by putting 2 separate tree traversal in the
file....we can reconstruct the tree.

You can certainly write data to files, and read it back again.
It is up to you to choose a file format suited to your needs. You
mentioned (in some context that's been snipped away) that you're
interested in writing a binary tree to a file, which means you need
to find a way to embed a non-linear data structure in the file's
linear sequence. There are lots of ways to do this; see TAOCP 2.3.3
"Other Representations of Trees" for some ideas.

C will allow you to write a pointer to a file and read it back
again, but that's not useful unless you can guarantee that the thing
it points to is at the same memory location as in the original -- a
very difficult thing to arrange if you wrote the file in one program
and are trying to read it back in another, or even in another execution
of the same program. So if you choose one of the tree representations
that involves links, you'll probably want to replace the pointers by
something else for storage in the file, and then reconstitute actual
pointers from the "something else's" when you read it back again.
One way (there are many) would be to give each node an ID number,
possibly related to its position in the file: 1,2,3,... for the
first, second, third,... nodes, say. Then instead of writing the
pointer, you'd write the ID number of the pointed-to node, or 0 for
a NULL. When reading, you'd translate the ID numbers back to
pointers again for in-memory storage.

Willem · Jul 14, 2010

Eric Sosman wrote:
) C will allow you to write a pointer to a file and read it back
) again, but that's not useful unless you can guarantee that the thing
) it points to is at the same memory location as in the original -- a
) very difficult thing to arrange if you wrote the file in one program
) and are trying to read it back in another, or even in another execution
) of the same program. So if you choose one of the tree representations
) that involves links, you'll probably want to replace the pointers by
) something else for storage in the file, and then reconstitute actual
) pointers from the "something else's" when you read it back again.
) One way (there are many) would be to give each node an ID number,
) possibly related to its position in the file: 1,2,3,... for the
) first, second, third,... nodes, say. Then instead of writing the
) pointer, you'd write the ID number of the pointed-to node, or 0 for
) a NULL. When reading, you'd translate the ID numbers back to
) pointers again for in-memory storage.

Another trick is to allocate one large block of memory for the whole tree,
and use indexes instead of pointers for the links.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Ersek, Laszlo · Jul 14, 2010

However there's no easy was to write a generalised serialise function
for a general tree containing arbitrary data.

A coarse idea (not easy, indeed):

1) I'll suppose the original structure types all have an initial member
that leads to a static const struct of function pointers. That is, I'll
assume all structure types already support "do something with me"
operations that can be called back from a tree traversal.

struct base_ops;

struct base
{
/* whatever */
const struct base_ops *ops;
};

struct type1
{
struct base base;
/* whatever */
};

struct type2
{
struct base base;
/* whatever */
};

static struct base_ops base_ops
{
int (*print_nicely)(FILE *stream, const struct base *base);
};

Suppose we have a tree node like this:

struct rbt_node
{
void *data;
struct rbt_node *parent, *left, *right;
enum rbt_color color;
};

I'll assume that we already have mechanisms in place that allow us to
traverse a tree and print each linked-to object nicely.

2) Extend the base_ops struct with

int (*serialize)(FILE *stream, const struct base *base);

The generic walker function might be declared like this:

int
walk(const struct rbt_node *root,
int (*visitor)(const void *node_data, void *workspace));

and

int
my_visitor(const void *node_data, void *workspace)
{
const struct base *base;
FILE *f;

base = node_data;
f = workspace;

return (*base->ops->serialize)(f, base);
}

3) The "virtual" serialize() function should dump each base-derived
structure. The output should probably start with a fixed type identifier
(so that factory (deserialization) functions can work later), then the
members. A single serialize() function could invoke further serialization
functions.

4) Each single object should be dumped exactly once. Members that are
allocated within a structure ("composition") should be dumped in-place,
members that are linked-to (separate malloc(), with ownership
responsibility, "aggregation") should be recursively followed.
"Navigation" pointers shouldn't be followed at all, but they have to be
serialized nonetheless. Function pointers should be serialized as
"function identifier strings".

5) The serialized form of each object should contain the unique identifier
of that object. Navigation pointers should be serialized as the unique
identifiers of the linked-to objects. On most platforms, converting the
pointers and addresses to (void *) and printing them with %p should work.

6) Deserialization would happen in two phases. The first phase would
consist of a loop:

- read type identifier token, advance stream
- call type-specific factory function that parses the object and advances
the stream.

Constructed objects are added to (linked to by) a random-access
dictionary, addressible by the serialized unique object identifier (eg.
the originally valid %p strings).

Once all objects are deserialized, the second phase would occur (on top of
the now complete dictionary), revisiting each object exactly once,
updating all "navigation links" (pointer members without ownership
responsibility) via dictionary lookups.

Finally, the dictionary is thrown away.

Sorry if this is a mess...

lacos

Ersek, Laszlo · Jul 14, 2010

static struct base_ops base_ops
{
int (*print_nicely)(FILE *stream, const struct base *base);
};

Ahem, this is not C, but you know what I meant.

struct base_ops
{
int (*print_nicely)(FILE *stream, const struct base *base);
};

static const struct base_ops type1_ops = { &type1_print_nicely },
type2_ops = { &type2_print_nicely };

Sorry for the short circuit, I posted in a rush.

lacos

Paul N · Jul 14, 2010

How to do serialization in C ? Question arises from serialization of
binary tree.

The C++ FAQ at http://www.parashift.com/c++-faq-lite/ has a whole
chapter devoted to serialization. Obviously some of the information
won't be directly relevant, but it's probably a good start. And many
of C++'s features can be simulated in C without too much difficulty.

Gene · Jul 15, 2010

How to do serialization in C ? Question arises from serialization of
binary tree.

C is a canvas where you can paint any of the methods that a larger
language may have built in. Below is how you might roughly emulate
Ada.Serial_IO, for example (lots of error checking omitted). Whether
you choose XML, lisp S-expressions, line-oriented, or whatever
representation, the principle is the same.

To get a general purpose facility, it's probably simplest to include a
small integer type tag in each data structure. The tags index a table
of vtbl pointers, and the vtbls have the custom serializer and
deserializer for each type. The various schemes you see to embed vtbl
pointers in the structs themselves at a common location all seem to
have portability issues (not?).

#include <stdio.h>
#include <stdlib.h>

typedef int DATA;

typedef struct node_s {
struct node_s *lft, *rgt;
DATA val;
} NODE;

#define NULL_TAG 0
#define DATA_TAG 1

void serialize(FILE *f, NODE *tree)
{
static char null_tag[1] = { NULL_TAG };
static char data_tag[1] = { DATA_TAG };
if (tree) {
fwrite(data_tag, 1, 1, f);
fwrite(&tree->val, sizeof tree->val, 1, f);
serialize(f, tree->lft);
serialize(f, tree->rgt);
}
else
fwrite(null_tag, 1, 1, f);
}

NODE *deserialize(FILE *f)
{
char tag;
NODE *tree;

fread(&tag, 1, 1, f);
switch (tag) {
case NULL_TAG:
return 0;
case DATA_TAG:
tree = malloc(sizeof *tree);
fread(&tree->val, sizeof tree->val, 1, f);
tree->lft = deserialize(f);
tree->rgt = deserialize(f);
return tree;
}
}

void print(NODE *tree)
{
if (tree) {
printf("(");
print(tree->lft);
printf(" [%d] ", tree->val);
print(tree->rgt);
printf(")");
}
else
printf("-");
}

NODE *new(DATA val, NODE *lft, NODE *rgt)
{
NODE *r = malloc(sizeof *r);
r->val = val;
r->lft = lft;
r->rgt = rgt;
return r;
}

int main(void)
{
FILE *f;
NODE *tree =
new(1,
new(2,
new(3,
new(4,
new(5, 0, 0),
new(6, 0, 0)),
new(7,
new(8, 0, 0),
0)),
0),
new(9,
new(10, 0, 0),
0));

printf("before: ");
print(tree);
printf("\n");

f = fopen("tree.ser", "wb");
serialize(f, tree);
fclose(f);

f = fopen("tree.ser", "rb");
tree = deserialize(f);
fclose(f);

printf("after: ");
print(tree);
printf("\n");

return 0;
}

$ ./a.out
before: (((((- [5] -) [4] (- [6] -)) [3] ((- [8] -) [7] -)) [2] -) [1]
((- [10] -) [9] -))
after: (((((- [5] -) [4] (- [6] -)) [3] ((- [8] -) [7] -)) [2] -) [1]
((- [10] -) [9] -))
$

Cheers.

Gene · Jul 15, 2010

To get a general purpose facility, it's probably simplest to include a
small integer type tag in each data structure. The tags index a table
of vtbl pointers, and the vtbls have the custom serializer and
deserializer for each type. The various schemes you see to embed vtbl
pointers in the structs themselves at a common location all seem to
have portability issues (not?).

Sorry. Of course type tags have the same issues as vtbl pointers. Is
there a portable way to reference common content in different struct
types, or is a gi-normous union (with its inherent dead memory and
compilation issues) the only way?

Marc · Jul 15, 2010

Après mûre réflexion, Debajyoti Sarma a écrit :

How to do serialization in C ? Question arises from serialization of
binary tree.

Many ways, but rather boring when you have to manage a large number of
types. I often used for this purpose the XDR standard for external data
encoding because a tool named rpcgen automates the generation of the
serialization/deserialization code from the definition of types in a
C-like language (the C types definition are also generated
automatically from this definition).

XDR support complex data types linked via pointers provided that there
is no loop. Moreover, it is easy in many cases to support reference
loops by using intermediate integer indexes.
XDR is a very stable standard that emerged about 20 years ago. For
example, common protocols like NFS and NIS/NIS+ are based on it ...

Marc

Ersek, Laszlo · Jul 15, 2010

Sorry. Of course type tags have the same issues as vtbl pointers. Is
there a portable way to reference common content in different struct
types, or is a gi-normous union (with its inherent dead memory and
compilation issues) the only way?

C99 6.7.2.1 Structure and union specifiers, p13

----v----
[...] A pointer to a structure object, suitably converted, points to its
initial member (or if that member is a bit-field, then to the unit in
which it resides), and vice versa. [...]
----^----

About the same is written under C89 6.5.2.1.

lacos

Gene · Jul 15, 2010

Sorry. Of course type tags have the same issues as vtbl pointers. Is
there a portable way to reference common content in different struct
types, or is a gi-normous union (with its inherent dead memory and
compilation issues) the only way?

Click to expand...

C99 6.7.2.1 Structure and union specifiers, p13

----v----
[...] A pointer to a structure object, suitably converted, points to its
initial member (or if that member is a bit-field, then to the unit in
which it resides), and vice versa. [...]
----^----

About the same is written under C89 6.5.2.1.

lacos- Hide quoted text -

- Show quoted text -

The problem is the "suitably converted" business. If you don't know
the exact type of an object (which is after all the reason for wanting
to read the type tag or vptr), there is no way to convert the pointer
to that type! If we have:

typedef struct base_s {
int tag;
} BASE:

typedef struct point_s {
int tag;
double x, y, z,
} POINT;

typedef struct name_s {
int tag;
char name[10];
} NAME;

We would like to be able to pass round BASE* p; and say

if (p->tag == POINT_TAG) ... handle a point ...
else if (p->tag == NAME) ... handle a name.

as kind of hand-rolled polymorphism.

But this method is non-portable because we'd have to say ((POINT*)p)-

tag to be sure we're getting the tag of a point, but that's

nonsense.

How do get this effect?

Gene

Ersek, Laszlo · Jul 15, 2010

typedef struct base_s {
int tag;
} BASE:

typedef struct point_s {
int tag;
double x, y, z,
} POINT;

typedef struct name_s {
int tag;
char name[10];
} NAME;

We would like to be able to pass round BASE* p; and say

if (p->tag == POINT_TAG) ... handle a point ...
else if (p->tag == NAME) ... handle a name.

typedef struct base_s {
int tag;
} BASE;

typedef struct point_s {
BASE b;
double x, y, z,
} POINT;

typedef struct name_s {
BASE b;
char name[10];
} NAME;

if (POINT_TAG == p->tag) handle_point((POINT *)p);

Or, in your original example, forget BASE completely, and pass around
pointers-to-int ((int*)).

if (POINT_TAG == *p) handle_point((POINT *)p);

lacos

Gene · Jul 15, 2010

typedef struct base_s {
int tag;
} BASE:

Click to expand...

typedef struct point_s {
int tag;
double x, y, z,
} POINT;

Click to expand...

typedef struct name_s {
int tag;
char name[10];
} NAME;

Click to expand...

We would like to be able to pass round BASE* p; and say

Click to expand...

if (p->tag == POINT_TAG) ... handle a point ...
else if (p->tag == NAME) ... handle a name.

Click to expand...

typedef struct base_s {
int tag;

} BASE;

typedef struct point_s {
BASE b;
double x, y, z,

} POINT;

typedef struct name_s {
BASE b;
char name[10];

} NAME;

if (POINT_TAG == p->tag) handle_point((POINT *)p);

Or, in your original example, forget BASE completely, and pass around
pointers-to-int ((int*)).

if (POINT_TAG == *p) handle_point((POINT *)p);

lacos- Hide quoted text -

- Show quoted text -

Alas, no. Other articles in this group indicate, and they appear
correct, that this is not compliant because the p in p->tag is not
"suitably converted." If p does not point to a BASE (in my example),
p->tag has no meaning under the Standard. I've seen an example
similar to my BASE / POINT where a certain version of GCC actually
demonstrated bad behavior because a record containing a tag and a
double was aligned differently that one containing only a tag. I.e. in
this environment, casts between struct types could generate pointer
adjustment code. I recall it was a cross-compiler for an embedded
processor, so this is probably not a common occurrence. I have never
seen it in e.g. an x86 setting.

Different Serialization Technique In .NET	0	Sep 27, 2013
Serialization library, request for feedback	42	Dec 13, 2012
Encoding of primitives for binary serialization	17	Apr 9, 2009
Meme generator in c	1	Dec 23, 2022
Serialization of Graph-like C++ Structures	2	Jul 9, 2009
Which data serialization format?	9	Aug 8, 2011
Serialization in C++	2	Apr 9, 2010
serialization	8	Nov 5, 2009

Serialization in C

Debajyoti Sarma

Malcolm McLean

bart.c

Debajyoti Sarma

Vincenzo Mercuri

Ben Bacarisse

Malcolm McLean

Vincenzo Mercuri

Eric Sosman

Willem

Ersek, Laszlo

Ersek, Laszlo

Paul N

Gene

Gene

Marc

Ersek, Laszlo

Gene

Ersek, Laszlo

Gene

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads