J
jacob navia
Continuing the container library I have started the more complex
containers: trees.
I implemented over the holidays the AVL trees in a small module of
around 600 lines. Obviously the bare minimum: insert/delete and
find. There is an equal method to compare two trees.
This is a powerful container that can search a giga-object database with
just around 30 comparisons...
But it is still not the red/black trees used by the C++ crowd, that will
be done later.
This trees implement sets, i.e. all elements are unique, and there are
no keys, the element itself is the key and it is stored just behind
each node.
What bothers me is the enormous overhead of the 64 bit pointers:
typedef struct tagBinarySearchTreeNode {
char hidden;
char factor;
struct tagBinarySearchTreeNode *left;
struct tagBinarySearchTreeNode *right;
char data[];
} BinarySearchTreeNode;
Size: 24 bytes overhead
To store the "hidden" bit and the 3 bits needed for the factor
I am wasting 8 bytes since the pointers must be aligned at an
8 byte boundary... Besides I am wasting 16 bytes in those two
pointers when there will be very few applications that will use
trees of more than 4Giga objects... and in fact most applications
will fit in 65535 objects.
If we limit ourselves to 65535 objects in the tree, we could
allocate a table with 65535 nodes, and instead of using 8 byte pointers
we could use 16 bit unsigned shorts that would index into that
table. At the same time, we eliminate the alignment bytes.
We would have then:
typedef st+ruct tagBinarySearchTreeNode {
char hidden;
char factor;
unsigned short left;
unsigned short right;
char data[];
} BinarySearchTreeNode;
Size: 6 bytes overhead. An improvement of 300% !!
Limiting the number of objects to 4 Giga objects yields
still an improvement of 100% since our node would take
12 bytes.
It is in this flexibility to elimnate waste where C could
make a big contribution precisely.
containers: trees.
I implemented over the holidays the AVL trees in a small module of
around 600 lines. Obviously the bare minimum: insert/delete and
find. There is an equal method to compare two trees.
This is a powerful container that can search a giga-object database with
just around 30 comparisons...
But it is still not the red/black trees used by the C++ crowd, that will
be done later.
This trees implement sets, i.e. all elements are unique, and there are
no keys, the element itself is the key and it is stored just behind
each node.
What bothers me is the enormous overhead of the 64 bit pointers:
typedef struct tagBinarySearchTreeNode {
char hidden;
char factor;
struct tagBinarySearchTreeNode *left;
struct tagBinarySearchTreeNode *right;
char data[];
} BinarySearchTreeNode;
Size: 24 bytes overhead
To store the "hidden" bit and the 3 bits needed for the factor
I am wasting 8 bytes since the pointers must be aligned at an
8 byte boundary... Besides I am wasting 16 bytes in those two
pointers when there will be very few applications that will use
trees of more than 4Giga objects... and in fact most applications
will fit in 65535 objects.
If we limit ourselves to 65535 objects in the tree, we could
allocate a table with 65535 nodes, and instead of using 8 byte pointers
we could use 16 bit unsigned shorts that would index into that
table. At the same time, we eliminate the alignment bytes.
We would have then:
typedef st+ruct tagBinarySearchTreeNode {
char hidden;
char factor;
unsigned short left;
unsigned short right;
char data[];
} BinarySearchTreeNode;
Size: 6 bytes overhead. An improvement of 300% !!
Limiting the number of objects to 4 Giga objects yields
still an improvement of 100% since our node would take
12 bytes.
It is in this flexibility to elimnate waste where C could
make a big contribution precisely.