Offset/alignement of structure members

A

Arto Huusko

Hello,

I'm wondering about the portability and correctness of the
following scenario.

I have a generic doubly linked list structure, and it contains
generic nodes that look like this:

struct node
{
struct node *head;
struct node *tail;
};

struct list
{
struct node *first;
struct node *last;
};

Now, I want to use the generic list and node structures so that

- The list and its nodes stay completely generic.
That means I can have generic functions like

add_head(struct list *, struct node *);

Any list can be traversed and examined and manipulated
with the generic functions, as long as the operations
are stay on the generic level.

- The nodes can contain application specific data. That is:
the node structure is extended by adding application specific
fields, which can only be accessed if the application knows
what kind of node it is using.

Application specific node would look like

struct my_app_node
{
struct node node;
int datafield1;
char *datafield2;
etc.
};

Now, an application could do:

struct list mylist;
struct my_app_node node1;
struct my_app_node node2;

init_list(&mylist);

add_head(&mylist, (struct node *)&node1);
add_head(&mylist, (struct node *)&node2);

So, is this correct ISO C? Is this portable?

I know that I could do also:

add_head(&mylist, &node1.node);


But, of course, both questions boil down to: can I trust
that offset of "node" inside my_app_node is 0? And also,
does the compiler let me get away with the cast?


I also do know that a certainly correct solution would be:

struct node
{
struct node *next;
struct node *prev;
void *data;
};

struct my_app_node
{
struct node node;
/* data fields follow */
};

struct my_app_node mynode;
mynode.node.data = &mynode;

But in this scenario I'm completely vasting one pointer.

(This all motivated by GCC 3.3's strict alignment stuff...)
 
D

dis

Hello,

I'm wondering about the portability and correctness of the
following scenario.

I have a generic doubly linked list structure, and it contains
generic nodes that look like this:

struct node
struct node *head;
struct node *tail;
};

struct list
{
struct node *first;
struct node *last;
};

Now, I want to use the generic list and node structures so that

- The list and its nodes stay completely generic.
That means I can have generic functions like

add_head(struct list *, struct node *);

Any list can be traversed and examined and manipulated
with the generic functions, as long as the operations
are stay on the generic level.

- The nodes can contain application specific data. That is:
the node structure is extended by adding application specific
fields, which can only be accessed if the application knows
what kind of node it is using.

Application specific node would look like

struct my_app_node
{
struct node node;
int datafield1;
char *datafield2;
etc.
};

Now, an application could do:

struct list mylist;
struct my_app_node node1;
struct my_app_node node2;

init_list(&mylist);

add_head(&mylist, (struct node *)&node1);
add_head(&mylist, (struct node *)&node2);

So, is this correct ISO C? Is this portable?

Yes, this code fragment is conforming.
I know that I could do also:

add_head(&mylist, &node1.node);


But, of course, both questions boil down to: can I trust
that offset of "node" inside my_app_node is 0? And also,
does the compiler let me get away with the cast?

Yes, offsetof(struct my_app_node, node) is guaranteed to yield a value of 0.
The cast is necessary as a pointer to a structure object points to its
initial member after suitable conversion.

[snip]
 
C

CBFalconer

Arto said:
I'm wondering about the portability and correctness of the
following scenario.

I have a generic doubly linked list structure, and it contains
generic nodes that look like this:

struct node
{
struct node *head;
struct node *tail;
};

struct list
{
struct node *first;
struct node *last;
};

Now, I want to use the generic list and node structures so that

- The list and its nodes stay completely generic.
That means I can have generic functions like

add_head(struct list *, struct node *);

You have to do two basic things. Have the data accessible by code
the application provides, because ONLY the application knows the
type of that data. Have the lists manipulated by code that
preserves those generic pointers, which have to be of type void*.
Thus your basic node will look something like:

struct node {
struct node *next;
struct node *prev;
void* dataptr;
}

This will NOT be in the published header file, which will only
contain:

typedef struct node *node;

keeping the definition totally hidden from the application. Now
your call will be something like:

add_head(node thelist, void *datum);

where thelist was created by the call:

node thelist = createlist();

and you will probably also want

void destroylist(node thelist, freedata destroydata);

where destroydata is a pointer to a routine receiving void* and
releasing the data it points to. This may be required for other
operations, and you might supply it once and for all via
createlist.

Now createlist can malloc a struct node, initialize the pointers
to null, or to point back to the head if desired, with a NULL
datum, and you can henceforce use that dummy node to hold the
first and last pointers.

An example is worth a good deal of rambling. You can see exactly
this sort of generic operation in hashlib.zip, especially noting
the example usages in wdfreq and markov. It can be found at:

<http://cbfalconer.home.att.net/download/>
 
E

Eric Sosman

CBFalconer said:
You have to do two basic things. Have the data accessible by code
the application provides, because ONLY the application knows the
type of that data. Have the lists manipulated by code that
preserves those generic pointers, which have to be of type void*.
Thus your basic node will look something like:

struct node {
struct node *next;
struct node *prev;
void* dataptr;
}
[...]

This approach works, and has some advantages. One is that
a given data item can exist simultaneously in an arbitrary
number of lists.

However, it also has a disadvantage: The linkage information
is split away from the "payload," and this gets you into issues
of allocating storage for the two independently, worrying about
all those tiny allocations of pared-down `struct node' objects,
and so on. All these issues are manageable, but they do require
some management -- and if you don't need the full flexibility
of CBF's method, the O.P.'s technique of embedding the links
in the same struct as the payload can be more convenient.

Back to the O.P.'s original question: In a situation like

struct node {
struct node *next;
struct node *prev;
};

struct data {
struct node links;
int payload[42];
double trouble[2];
};

.... is the `links' member of `struct data' guaranteed to be
at offset zero within the struct? Yes, says Section 6.7.2.1
paragraph 13: "A pointer to a structure object, suitably
converted, points to its initial member [...] and vice versa.
There may be unnamed padding within a structure object, but not
at its beginning." So you can safely feed a pointer to the
`links' member into your linked-list package, get that same
pointer back later on, and convert it into a `struct data*'
without trouble:

struct list mylist;
struct data mydata;
struct data *dptr;
...
add (&mylist, &mydata.links);
...
dptr = (struct data*) get (&mylist);

In fact, you can do a little bit more without too much
trouble. Suppose you want each `struct data' to exist in
exactly two lists (or in any fixed number of lists). You can
still keep the linkage information with the payload:

struct data {
struct node links1;
struct node links2;
int payload[42];
double trouble[2];
};

The `links1' element is at offset zero, and can be handled as
before. The `links2' element is not at offset zero, but it is
at a constant offset, namely `offsetof(struct data, links2)'.
You can insert the struct into a secondary list like

struct list list2;
struct data mydata;
...
add (&list2, &mydata.links2);

Later on when you want to retrieve it you must work a little
harder (the following is spelled out for clarity; in practice
you'd probably abbreviate things a bit):

struct list *lptr;
struct data *dptr;
...
lptr = get (&list2);
dptr = (struct data *)
( (char*)lptr - offsetof(struct data, links2) );

That is, you retrieve a pointer to a `links2' element (because
that's the sort of pointer you put into the list in the first
place), and then you "back up" by a known distance to find the
start of the `struct data' that contains it. Of course, you've
got to be careful not to mix `links1' and `links2' pointers in
the same list; this technique requires that you know what offset
to use.

For a small, fixed number of lists -- one is the commonest
case, sparse matrix representations sometimes use a "row" list
and a "column" list -- this technique can be recommended. It's
not good for a variable number of lists (because it's hard to
keep track of the proper offsets) or for a large number of lists
(because the bookkeeping gets unwieldy and the chance of error
grows); for those circumstances I'd prefer CBF's technique.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,994
Messages
2,570,223
Members
46,813
Latest member
lawrwtwinkle111

Latest Threads

Top