David RF said:
You're very kind
Despite binary file, here in ascii, I have a file named data.csv and
another named data.def
Ah, the code explained a lot! Neither of these files is a binary file
(you read single "type indicator" characters from one and use fscanf to
read and convert data from the other).
Your problem is not about packing or unpacking binary data but about
storing data whose type you don't know until run-time. What you are
doing below works fine (by the way, you are clearly a competent C
programmer) but you want to avoid wasting the space used by having a
union for each cell.
It is true that you might end up using some odd unpacking code but that
comes from the fact that you store the data in rows rather than columns.
I'd switch to having the columns store the data since the types are
associated with columns rather than rows. That way, all the data can be
properly aligned and the access will be simpler (and probably faster).
The trick for doing this is to use a union to get the alignment but then
lie when you allocate and access the column:
typedef union TData {
    char as_chr;
    int as_int;
    float as_flt;
    double as_dbl;
} Data;

enum {type_chr, type_int, type_flt, type_dbl};

typedef struct TCol {
    struct TTable *table;
    struct TCol *prev;
    struct TCol *next;
    int id;
    int type;
    Data data[]; /* Flexible array member */
} Col;
To allocate a column of type 'type' you use:
Col *col = malloc(sizeof *col + nrows * type_size[type]);
type_size is an array that maps type numbers to sizes:
size_t type_size[] = {
    [type_chr] = sizeof(char),  [type_int] = sizeof(int),
    [type_flt] = sizeof(float), [type_dbl] = sizeof(double),
    /* [x] = y is C99's designated initialiser */
};
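
Putting the allocation and the size table together, a minimal sketch might
look like the following. The col_new name, the nrows member, the type_count
enumerator and the overflow check are my own additions for illustration, and
the struct is trimmed to the members the sketch needs (the table, prev and
next pointers are left out).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef union TData {
    char as_chr;
    int as_int;
    float as_flt;
    double as_dbl;
} Data;

/* type_count is a hypothetical extra enumerator used only for bounds checks */
enum {type_chr, type_int, type_flt, type_dbl, type_count};

typedef struct TCol {
    int id;
    int type;
    size_t nrows;   /* hypothetical: remembers how many cells were allocated */
    Data data[];    /* the union is only here to force suitable alignment */
} Col;

static const size_t type_size[type_count] = {
    [type_chr] = sizeof(char),
    [type_int] = sizeof(int),
    [type_flt] = sizeof(float),
    [type_dbl] = sizeof(double),
};

/* Allocate a column holding nrows cells of the given type.
   Returns NULL if the size would overflow or malloc fails. */
Col *col_new(int id, int type, size_t nrows)
{
    assert(type >= 0 && type < type_count);
    if (nrows > (SIZE_MAX - sizeof(Col)) / type_size[type])
        return NULL;
    Col *col = malloc(sizeof *col + nrows * type_size[type]);
    if (col == NULL)
        return NULL;
    col->id = id;
    col->type = type;
    col->nrows = nrows;
    return col;
}
```

A char column of 1000 rows then costs 1000 bytes of cell storage rather than
1000 * sizeof(Data), which is the saving you were after.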
This does mean that you need to know how many rows there are and there
won't be an actual representation of a row. The first is easy: read
the file twice; once to count the rows and once to read the actual
data. The second may or may not be a problem -- it depends on how you use
this structure.
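
The counting pass could be sketched as below. I am assuming one record per
line, no header line, and no newlines embedded in quoted fields; count_rows
is a name I made up, and your real file may need a different rule.

```c
#include <assert.h>
#include <stdio.h>

/* Count the lines in fp, then rewind so the caller can do the second pass.
   A final line without a trailing newline still counts as a row.
   Returns (size_t)-1 on a read error. */
size_t count_rows(FILE *fp)
{
    assert(fp != NULL);
    size_t rows = 0;
    int c, last = '\n';
    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            rows++;
        last = c;
    }
    if (ferror(fp))
        return (size_t)-1;
    if (last != '\n')
        rows++;
    rewind(fp);
    return rows;
}
```

The result is what you would pass as nrows when allocating each column.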
To access this column data you need to lie about the data member:
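Data get_cell(size_t row, Col *col)
{
    switch (col->type) {
    case type_chr:
        return (Data){ .as_chr = ((char *)col->data)[row] };
    case type_int:
        return (Data){ .as_int = ((int *)col->data)[row] };
    case type_flt:
        return (Data){ .as_flt = ((float *)col->data)[row] };
    case type_dbl:
        return (Data){ .as_dbl = ((double *)col->data)[row] };
    }
    return (Data){ 0 }; /* unknown type: return a zeroed cell */
}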
(I am using C99's compound literals since you seem happy with using C99
features).
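
Storing a value is the mirror image of get_cell. Here is a rough sketch of a
matching set_cell (the name is mine, and the struct is again cut down to what
the example needs):

```c
#include <assert.h>
#include <stdlib.h>

typedef union TData {
    char as_chr;
    int as_int;
    float as_flt;
    double as_dbl;
} Data;

enum {type_chr, type_int, type_flt, type_dbl};

typedef struct TCol {
    int id;
    int type;
    Data data[];
} Col;

/* Write one cell, treating the data member as an array of the column's
   real element type rather than as an array of unions. */
void set_cell(size_t row, Col *col, Data value)
{
    switch (col->type) {
    case type_chr: ((char *)col->data)[row] = value.as_chr; break;
    case type_int: ((int *)col->data)[row] = value.as_int; break;
    case type_flt: ((float *)col->data)[row] = value.as_flt; break;
    case type_dbl: ((double *)col->data)[row] = value.as_dbl; break;
    default: assert(0 && "unknown column type");
    }
}
```

A call such as set_cell(2, col, (Data){ .as_dbl = 2.5 }) stores into the
third cell of a double column. Neither function checks row against the
column length, so you would want to keep the row count somewhere and test it.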
If you can't switch to storing columns, then just store and access the
element using memcpy as you are planning to do. You'll need to store
(in the Col structure) the byte offset of each column. The data member
can then just be a char (or unsigned char) array.
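
For the row-based alternative, the offsets and the memcpy access might be
sketched like this. layout_row, put_field and get_field are hypothetical
names, and I am assuming the fields are packed with no padding, which is
exactly why memcpy is needed instead of a cast.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* type_count is a hypothetical extra enumerator used only for bounds checks */
enum {type_chr, type_int, type_flt, type_dbl, type_count};

static const size_t type_size[type_count] = {
    [type_chr] = sizeof(char),
    [type_int] = sizeof(int),
    [type_flt] = sizeof(float),
    [type_dbl] = sizeof(double),
};

/* Compute the byte offset of each column within a packed row.
   Returns the total size of one row in bytes. */
size_t layout_row(const int *types, size_t ncols, size_t *offset)
{
    size_t pos = 0;
    for (size_t i = 0; i < ncols; i++) {
        assert(types[i] >= 0 && types[i] < type_count);
        offset[i] = pos;
        pos += type_size[types[i]];
    }
    return pos;
}

/* Copy a value into the row; safe even when the destination is misaligned. */
void put_field(unsigned char *row, size_t offset, int type, const void *src)
{
    memcpy(row + offset, src, type_size[type]);
}

/* Copy a value out of the row into a properly typed object. */
void get_field(const unsigned char *row, size_t offset, int type, void *dst)
{
    memcpy(dst, row + offset, type_size[type]);
}
```

The caller has to pass a pointer to an object of the right type for each
column, for example a double for a type_dbl column.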
<snip explanatory code>