Well I think the README should be rewritten...
I don't think so. Especially because I only read the first paragraph or so
In your example, I think you tried to use an AVL tree to keep track of
objects of type 'struct object'.
Yes.
I'll snip most of what you wrote because I did understand that before.
I'll keep the parts which I find problematic.
struct object {
/* ... */
int key;
struct avltree_node node;
};
Then later you need to search for the previously inserted object whose
key is 4:
struct object dummy = { .key = 4 };
struct avltree_node *the_node;
the_node = avltree_lookup(&dummy, tree);
if (!the_node) {
/* not found ! */
...
}
/* found */
As you noticed avltree_lookup() returns a pointer to 'struct
avltree_node' (not struct object) that is a pointer that points to the
member 'node' which belongs to the previously inserted object.
Not really convenient at this point, but the library provides an helper
that you can use to retrieve the object address:
struct object *the_obj;
the_obj = avltree_container_of(the_node, struct object , node);
assert(the_obj->key == 4);
Therein lies my problem:
#define avltree_container_of(node, type, member) ({ \
const struct avltree_node *__mptr = (node); \
(type *)( (char *)__mptr - offsetof(type,member) );})
First, as a side note (because this is not my main concern), you state in
the README, "[...] the library still provides an implementation which is
fully portable (C99) [...]". The above statement-expression is a GNU C
extension, not C99.
Second (my main concern), I suspect that this breaks strict aliasing
rules. That is, after the avltree_container_of() function-like macro is
expanded and the (statement-)expression is evaluated, you end up with two
pointers, namely "the_node" and "the_obj". They point to storage that
overlaps, and none of them was computed from the other by following
pointers and taking addresses of members.
That is, you have two pointers to different types, the pointed-to areas
overlap, and you obtained one of the pointers by type-punning (when you
cast the pointer-to-char to pointer-to-type). Consequently, I suspect the
compiler is allowed to assume that the objects below those pointers are
distinct, which is in reality not the case.
C99 6.7.2.1 p13 says, "[...] A pointer to a structure object, suitably
converted, points to its initial member [...], and vice versa. [...]".
Similar language can be found in C89 6.5.2.1.
Since "node" is not the initial member of "struct object", you have to
resort to offsetof(), instead of converting back the pointer explicitly,
but this way you can't rely on the guarantee of the cited paragraph.
Third, sometimes you wish to refer to an object from multiple trees.
Suppose you have a set of cars, and you want to order them based on
different attributes (organize them into multiple red-black trees) at the
same time. You want to store each "car" only once, but you want to search
(and traverse) them based on peak torque, peak performance, price, fuel
consumption, and so on. So you'd need (at least) four trees.
struct car {
struct rbtree_node nm_tree_node,
kw_tree_node,
bucks_tree_node,
mpg_tree_node; /* mileage, to appease our imperial readers */
double nm,
kw;
long unsigned bucks;
double mpg;
};
We immediately hit the problem of "kw_tree_node", "bucks_tree_node",
"mpg_tree_node" being non-initial members (see my suspicion of the
offsetof() problem). Ignoring that, the comparison functions do work,
because they each trace back from the node address to the enclosing
"struct car" and access the corresponding key field.
However, suppose we want to maintain a dynamic list of attributes for each
car, and to add each car to a respective dynamic set of trees.
struct attr {
const char *name;
enum {
AT_D,
AT_LU
} type;
union {
double d;
long unsigned lu;
} value;
};
struct car2 {
size_t num_attrs;
/* car2 is referred to by "num_attrs" trees */
struct rbtree_node *nodes_in_referring_trees;
/* car2 has "num_attrs" attributes */
struct attr *attrs;
};
This can work, but then the comparison function (or anything else) cannot
use the offsetof() trick *at all*, since the tree nodes for a car2 object
are allocated completely separately from the car2 object itself (ie. with
different malloc() calls).
To enable the comparison function (or anything else) to get back to the
object, you'd have to extend the node type with a pointer-to-void member.
(Or to embed the node type in a bigger type as its initial member -- this
was the idea in my earlier followup:
In summary, in my humble opinion,
(1) the offsetof() trick is not portable, with or without the
statement-expression,
(2) if you want to add an object to a dynamically determined set of trees,
you have to do away with the offsetof() trick completely and maintain
backreferences. Consequences:
(2a) The code becomes portable (additionally without the
statement-expression),
(2b) The code (the client code, the comparison function etc) becomes so
unwieldy that you'd be better off with "regular" nodes that are not
embedded into the client structs but hold a pointer to them.
Whenever objects are added only to a fixed set (or limited number) of
trees, and the platform ensures the offsetof() trick, like it probably is
the case with the Linux kernel and gcc, then the idea does save a malloc()
(= a node per object per referring tree).
If I'm wrong, I'd like to be corrected very much.
Thanks,
lacos