TDB said:
Hello,
I'm creating an application in C which requires data structures
like trees and graphs to be stored in files and retrieved later
(simply, serialization of a data structure).
Are there any libraries available in C for this purpose?
well, there are a lot of libs for a lot of things, so a lot depends on what
you want to do.
if XML is acceptable, then either a SAX-style or a DOM-style API may be what
you are looking for.
libxml is a common choice here.
many projects may implement other representations as well. for example,
Lisp-style S-Expressions may well be a good starting point for a customized
(yet still "general") data processing/serialization system (this depends a
lot on what is being done, but these days I will generally lean more in favor
of XML than S-Expressions).
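as a rough illustration of the S-Expression idea, here is a minimal sketch that writes a binary tree as nested lists. the `tnode` struct and `sx_write` function are hypothetical, and node names are assumed to contain no spaces or parentheses (a real system would need quoting, plus a matching reader).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* hypothetical tree node, for illustration only */
struct tnode {
    const char *name;           /* assumed to contain no spaces or parens */
    struct tnode *left, *right;
};

/* write a node as (name left right), using "nil" for an empty child */
static void sx_write(FILE *fp, const struct tnode *n)
{
    assert(fp != NULL);
    if (n == NULL) {
        fputs("nil", fp);
        return;
    }
    fprintf(fp, "(%s ", n->name);
    sx_write(fp, n->left);
    fputc(' ', fp);
    sx_write(fp, n->right);
    fputc(')', fp);
}
```

for a root "b" with children "a" and "c", `sx_write(stdout, &root)` prints `(b (a nil nil) (c nil nil))`.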
also common is implementing a format based on line-oriented text files.
in this case, we can use fprintf, fgets, and sscanf (actually, I more usually
implement a 'split' function here) in order to serialize and reload the
data.
this approach is often simple and allows fast loading/saving, but is
less general than SAX or DOM (essentially, the structure of the data and the
structure of the file are tied together).
as such, this is good for dumping and reloading a bunch of context-specific
data (such as tables, 3D geometry, ...) but is not so good for more
free-form data (such as compiler AST trees, hypertext documents, ...).
for XML, this is fairly close to the SAX approach.
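a minimal sketch of the line-oriented approach, assuming a tree whose nodes live in an array and refer to their children by index (-1 for none). the one-line-per-node format ("node <name> <left> <right>") and the names `split`, `save_nodes`, and `load_nodes` are made up for illustration; `split` here is just a thin wrapper around strtok.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TOKS 8

/* hypothetical node record: children are indices into the table, -1 = none */
struct node {
    char name[32];
    int left, right;
};

/* split a line into whitespace-separated tokens (modifies buf in place) */
static int split(char *buf, char **toks, int max)
{
    int n = 0;
    char *s = strtok(buf, " \t\r\n");
    while (s != NULL && n < max) {
        toks[n++] = s;
        s = strtok(NULL, " \t\r\n");
    }
    return n;
}

/* one line per node: "node <name> <left> <right>" */
static void save_nodes(FILE *fp, const struct node *nodes, int count)
{
    assert(fp != NULL);
    for (int i = 0; i < count; i++)
        fprintf(fp, "node %s %d %d\n",
                nodes[i].name, nodes[i].left, nodes[i].right);
}

/* returns the number of nodes read; blank and unrecognized lines are skipped */
static int load_nodes(FILE *fp, struct node *nodes, int max)
{
    char line[256], *toks[MAX_TOKS];
    int count = 0;

    while (count < max && fgets(line, sizeof line, fp) != NULL) {
        int nt = split(line, toks, MAX_TOKS);
        if (nt == 4 && strcmp(toks[0], "node") == 0) {
            struct node *n = &nodes[count++];
            snprintf(n->name, sizeof n->name, "%s", toks[1]);
            n->left = atoi(toks[2]);
            n->right = atoi(toks[3]);
        }
    }
    return count;
}
```

since unrecognized lines are skipped, new line types can be added later without breaking older loaders, though the file layout is still tied directly to the node structure.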
another common approach is to regard the file as a serialized bytestream or
similar.
in these cases, we often have loaders that, for example, use fgetc to read
individual bytes (or sometimes FOURCCs and similar) and dispatch to the
appropriate handlers as needed (read the magic value for a node, then call the
function that handles reading nodes, ...).
this approach tends to lead to formats that are fairly dense and have
a good tradeoff between speed and flexibility (textual formats tend to be
much bigger and slower, and data-dumping formats tend to be brittle).
however, as a cost, these formats are not human-readable (unlike most
textual formats), and are typically much slower than raw dumping.
I suspect, however, that for widely used binary formats, variations of this
approach are the most common.
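a minimal sketch of the tag-dispatch style of bytestream loader, assuming one-byte tags (a real format might use FOURCCs instead) and a length-prefixed name per node. the tag values, the `tnode` struct, and the `bs_*` functions are hypothetical. the loader reads a tag with fgetc and calls the handler for that tag, recursing for child nodes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical one-byte tags */
enum { TAG_NIL = 0x00, TAG_NODE = 0x01 };

struct tnode {
    char *name;
    struct tnode *left, *right;
};

/* write: tag, then (for a node) name length byte, name bytes, children */
static void bs_write(FILE *fp, const struct tnode *n)
{
    if (n == NULL) {
        fputc(TAG_NIL, fp);
        return;
    }
    size_t len = strlen(n->name);
    assert(len <= 255);                 /* length is stored in one byte */
    fputc(TAG_NODE, fp);
    fputc((int)len, fp);
    fwrite(n->name, 1, len, fp);
    bs_write(fp, n->left);
    bs_write(fp, n->right);
}

static struct tnode *bs_read(FILE *fp);

/* handler for TAG_NODE; the tag byte has already been consumed */
static struct tnode *bs_read_node(FILE *fp)
{
    int len = fgetc(fp);
    if (len == EOF)
        return NULL;
    struct tnode *n = calloc(1, sizeof *n);
    if (n == NULL)
        return NULL;
    n->name = malloc((size_t)len + 1);
    if (n->name == NULL || fread(n->name, 1, (size_t)len, fp) != (size_t)len) {
        free(n->name);
        free(n);
        return NULL;
    }
    n->name[len] = '\0';
    n->left = bs_read(fp);
    n->right = bs_read(fp);
    return n;
}

/* read one tag byte and dispatch to the matching handler */
static struct tnode *bs_read(FILE *fp)
{
    switch (fgetc(fp)) {
    case TAG_NODE:
        return bs_read_node(fp);
    case TAG_NIL:
        return NULL;
    default:
        return NULL;  /* EOF or unknown tag; a real loader would report it */
    }
}

static void bs_free(struct tnode *n)
{
    if (n != NULL) {
        bs_free(n->left);
        bs_free(n->right);
        free(n->name);
        free(n);
    }
}
```

in this sketch an error and an empty child both come back as NULL; a real loader would keep them apart, and would usually also skip over tags it does not recognize instead of giving up.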
another common approach is data dumping:
fread/fwrite of in-memory structures is possible, but structures containing
pointers may require a lot of processing to serialize and reload. for this
approach, it is common practice to avoid any use of pointers within these
structures (instead, indices and offsets are used almost exclusively, and the
app will operate on the data more or less as it appears in the files).
for example, I have seen loaders that will fread a whole file structured
like this, and then create a context with:
- a pointer to the file's data;
- a set of specific pointers to specific parts of the file's data (such as the
  string table, node table, ...).
other times, specific tables are read and placed within different buffers.
when handling accesses, we will use indices or offsets to access the
specific members, for example:
j=ctx->node[i].left;
k=ctx->node[i].right;
s=ctx->strtab+ctx->node[i].name;
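a minimal sketch of the whole-file dumping approach with the index/offset access pattern described above. the file layout (a header, a table of nodes, then a string table), the magic value, and all names (`dump_node`, `dump_ctx`, `dump_save`, `dump_load`) are assumptions for illustration. the data is written in native byte order with no version field, so it has exactly the kind of brittleness discussed below.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DUMP_MAGIC 0x54524545u  /* arbitrary; the bytes spell "TREE" in ASCII */

/* hypothetical on-disk layout, in the writer's native byte order:
   [header][node table][string table] */
struct dump_header {
    uint32_t magic;
    uint32_t num_nodes;
    uint32_t strtab_size;
};

struct dump_node {
    int32_t left, right;   /* indices into the node table, -1 = none */
    uint32_t name;         /* byte offset into the string table */
};

/* context pointing into the single buffer that holds the whole file */
struct dump_ctx {
    void *data;            /* owned; release with free(ctx->data) */
    struct dump_header *hdr;
    struct dump_node *node;
    char *strtab;
};

static int dump_save(FILE *fp, const struct dump_node *nodes, uint32_t n,
                     const char *strtab, uint32_t strtab_size)
{
    struct dump_header h = { DUMP_MAGIC, n, strtab_size };
    assert(fp != NULL);
    return fwrite(&h, sizeof h, 1, fp) == 1
        && fwrite(nodes, sizeof *nodes, n, fp) == n
        && fwrite(strtab, 1, strtab_size, fp) == strtab_size;
}

/* fread the whole file into one buffer, then aim the context pointers at the
   tables inside it; returns 1 on success. node indices and name offsets are
   not range-checked here; a real loader should check them too */
static int dump_load(FILE *fp, struct dump_ctx *ctx)
{
    long size;
    struct dump_header *h;

    ctx->data = NULL;
    if (fseek(fp, 0, SEEK_END) != 0 || (size = ftell(fp)) < (long)sizeof *h)
        return 0;
    rewind(fp);
    ctx->data = malloc((size_t)size);
    if (ctx->data == NULL || fread(ctx->data, 1, (size_t)size, fp) != (size_t)size)
        goto fail;
    h = ctx->data;
    /* the two tables must exactly fill the rest of the file */
    if (h->magic != DUMP_MAGIC ||
        sizeof *h + (size_t)h->num_nodes * sizeof(struct dump_node)
            + h->strtab_size != (size_t)size)
        goto fail;
    ctx->hdr = h;
    ctx->node = (struct dump_node *)(h + 1);
    ctx->strtab = (char *)(ctx->node + h->num_nodes);
    return 1;
fail:
    free(ctx->data);
    ctx->data = NULL;
    return 0;
}
```

after a successful `dump_load`, accesses look like `ctx.node[i].left` and `ctx.strtab + ctx.node[i].name`, and the whole file is released with a single `free(ctx.data)`.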
other times, these files may be split up into some number of chunks or
blocks, which are loaded, processed, and/or stored, as needed. this
variation is common within things like database engines (for example, for
implementing structures like B-Trees).
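for the chunked variation, the basic building blocks are just reading and writing a fixed-size block by number. a minimal sketch, assuming a hypothetical 4096-byte block size (`DB_BLOCK_SIZE`) and a file opened for update in binary mode (e.g. "r+b"); a real database engine would add caching, free-space management, and so on on top of this.

```c
#include <assert.h>
#include <stdio.h>

#define DB_BLOCK_SIZE 4096  /* hypothetical fixed block size */

/* read block number idx into buf (DB_BLOCK_SIZE bytes); returns 1 on success */
static int read_block(FILE *fp, long idx, void *buf)
{
    assert(idx >= 0);
    /* seeking first also satisfies C's rule for switching from write to read */
    if (fseek(fp, idx * DB_BLOCK_SIZE, SEEK_SET) != 0)
        return 0;
    return fread(buf, DB_BLOCK_SIZE, 1, fp) == 1;
}

/* write buf over block number idx; returns 1 on success */
static int write_block(FILE *fp, long idx, const void *buf)
{
    assert(idx >= 0);
    if (fseek(fp, idx * DB_BLOCK_SIZE, SEEK_SET) != 0)
        return 0;
    return fwrite(buf, DB_BLOCK_SIZE, 1, fp) == 1;
}
```

a B-Tree node would then typically be one block, with child "pointers" stored as block numbers rather than memory addresses.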
the advantage of this approach is that file loading and saving can be made
to operate very quickly (for a save, we simply dump the whole file's
contents back to disk, or, in some cases, unmap the file from memory).
the disadvantage is that these kinds of file formats can become excessively
brittle (these are the kinds of files where you often find version numbers,
endianness and alignment check values, ..., embedded in the headers).
as a result, these formats are usually used within a specific version of a
specific app, and very rarely come into common use. very often, these formats
are not formally specified either (when the code or data changes, so
does the file format...).
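as an illustration of the header checks mentioned above, here is a minimal sketch of a header carrying a magic value, a version number, a byte-order mark, and a structure-size check. the field layout, constants, and status codes are all hypothetical. the byte-order mark is checked before the version, since a file written on a machine of the other endianness would also have a byte-swapped version number.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* hypothetical header for a raw-dump format */
struct file_header {
    char     magic[4];     /* "XDMP" */
    uint16_t version;      /* bumped whenever the layout changes */
    uint16_t byte_order;   /* HDR_BOM, written in native order */
    uint32_t node_size;    /* sizeof the node struct in the writer */
};

#define HDR_MAGIC   "XDMP"
#define HDR_VERSION 3
#define HDR_BOM     0x0102

enum hdr_status {
    HDR_OK, HDR_BAD_MAGIC, HDR_WRONG_ENDIAN, HDR_BAD_VERSION, HDR_BAD_LAYOUT
};

static void hdr_init(struct file_header *h, uint32_t node_size)
{
    assert(h != NULL);
    memcpy(h->magic, HDR_MAGIC, 4);
    h->version = HDR_VERSION;
    h->byte_order = HDR_BOM;
    h->node_size = node_size;
}

static enum hdr_status hdr_check(const struct file_header *h, uint32_t node_size)
{
    if (memcmp(h->magic, HDR_MAGIC, 4) != 0)
        return HDR_BAD_MAGIC;
    if (h->byte_order == 0x0201)        /* written with the other byte order */
        return HDR_WRONG_ENDIAN;
    if (h->byte_order != HDR_BOM)
        return HDR_BAD_LAYOUT;
    if (h->version != HDR_VERSION)
        return HDR_BAD_VERSION;
    if (h->node_size != node_size)      /* e.g. different padding/alignment */
        return HDR_BAD_LAYOUT;
    return HDR_OK;
}
```

a loader would fread the header, call `hdr_check` with the sizeof of its own node struct, and refuse (or convert) the file on anything other than `HDR_OK`.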
dunno if this helps any...