[ ... fully connected network for deallocation ]
The design exhibits great performance, except when presented with a user
program that persistently creates and destroys multiple threads. This can be
a fairly significant downside wrt creating general-purpose tools... ;^(
I wonder whether it isn't better to just avoid this problem. Rather than
always creating and destroying threads on demand, create a thread pool.
When the user wants a thread, they're really just allocating the use of
the thread from the pool (i.e. you give that thread the address where it
needs to start executing, and send it on its way). When they ask to
delete a thread, you just return it to the pool. If they ask for a
thread and the pool is empty, then you create a new thread. You could
then add a task for the thread pool to execute occasionally that trims
the thread pool if it stays too large for too long.
Obviously, this goes a bit beyond pure memory management, but not really
by a huge amount -- it's still definitely in the realm of resource
management.
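To make the idea concrete, here is a minimal sketch of such a pool. All the names (ThreadPool, start, trim, wait_idle) are made up for illustration and are not from the design discussed above. It assumes C++14, that "the address where it needs to start executing" can be modelled as a std::function, and that only one thread calls trim() at a time. The "too large for too long" policy is left to whoever calls trim().

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch: "creating" a thread leases a parked worker, and
// "destroying" it just parks the worker again once its entry point returns.
class ThreadPool {
    struct Worker {
        std::thread thread;
        std::function<void()> job;  // entry point handed over by start()
        bool quit = false;          // set by trim(), on parked workers only
    };

    std::mutex m_;
    std::condition_variable wake_;    // tells workers: new job, or quit
    std::condition_variable parked_;  // tells wait_idle(): a worker parked
    std::vector<std::unique_ptr<Worker>> workers_;
    std::vector<Worker*> idle_;
    std::size_t created_ = 0;

    void run(Worker* w) {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            wake_.wait(lock, [w] { return w->quit || w->job; });
            if (w->quit) return;
            std::function<void()> job = std::move(w->job);
            w->job = nullptr;
            lock.unlock();
            job();                // run user code without holding the lock
            lock.lock();
            idle_.push_back(w);   // "delete thread" == return it to the pool
            parked_.notify_all();
        }
    }

public:
    ThreadPool() = default;
    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;
    ~ThreadPool() { wait_idle(); trim(0); }

    // "Create a thread": reuse a parked worker, or spawn one if none is parked.
    void start(std::function<void()> entry) {
        assert(entry);
        {
            std::lock_guard<std::mutex> lock(m_);
            if (idle_.empty()) {
                workers_.push_back(std::make_unique<Worker>());
                Worker* w = workers_.back().get();
                w->job = std::move(entry);
                w->thread = std::thread(&ThreadPool::run, this, w);
                ++created_;
                return;  // the new thread checks for its job as soon as it runs
            }
            idle_.back()->job = std::move(entry);
            idle_.pop_back();
        }
        wake_.notify_all();
    }

    // Block until every worker is parked.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(m_);
        parked_.wait(lock, [this] { return idle_.size() == workers_.size(); });
    }

    // Retire parked workers beyond `keep`. Meant to be called occasionally,
    // from one thread at a time.
    void trim(std::size_t keep) {
        std::vector<Worker*> retired;
        {
            std::lock_guard<std::mutex> lock(m_);
            while (idle_.size() > keep) {
                idle_.back()->quit = true;
                retired.push_back(idle_.back());
                idle_.pop_back();
            }
        }
        wake_.notify_all();
        for (Worker* w : retired) w->thread.join();
        std::lock_guard<std::mutex> lock(m_);
        workers_.erase(std::remove_if(workers_.begin(), workers_.end(),
                           [](const std::unique_ptr<Worker>& w) { return w->quit; }),
                       workers_.end());
    }

    std::size_t threads_created() { std::lock_guard<std::mutex> l(m_); return created_; }
    std::size_t idle_count() { std::lock_guard<std::mutex> l(m_); return idle_.size(); }
};
```

With this arrangement, a program that "creates" and "destroys" threads in a tight loop only pays for a real thread creation when no parked worker is available.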
[ ... ]
You have exactly the right idea overall.
I'm assuming you mean that since nearly all the data is constant on a
per-allocator basis, you just create a single information block per
allocator, then each allocated block just carries a pointer to the
information block in its associated allocator.
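A rough sketch of that layout, in case it helps. The names (AllocatorInfo, BlockHeader, allocate_from, info_of, release) and the fields are my own assumptions, not taken from the code under discussion, and malloc stands in for whatever the real allocator does underneath.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical per-allocator constants, stored once per allocator.
struct AllocatorInfo {
    std::size_t block_size;
    const char* name;
};

// Per-block header: just one pointer back to the shared information block.
// The alignas pads the header so the user data after it stays suitably
// aligned for any type; a real allocator might pack this tighter.
struct alignas(std::max_align_t) BlockHeader {
    const AllocatorInfo* info;
};

// Allocate one block and stamp it with its allocator's information block.
void* allocate_from(const AllocatorInfo& info) {
    void* raw = std::malloc(sizeof(BlockHeader) + info.block_size);
    if (!raw) throw std::bad_alloc();
    BlockHeader* header = new (raw) BlockHeader{&info};
    return header + 1;  // user data starts right after the header
}

// Recover the allocator's information from a user pointer.
const AllocatorInfo& info_of(void* user_ptr) {
    assert(user_ptr);
    return *(static_cast<BlockHeader*>(user_ptr) - 1)->info;
}

// Free a block previously returned by allocate_from().
void release(void* user_ptr) {
    if (user_ptr) std::free(static_cast<BlockHeader*>(user_ptr) - 1);
}
```

Everything that is constant per allocator lives in AllocatorInfo, so the per-block cost is the header pointer (plus whatever padding alignment demands).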
If you want to badly enough, you can reduce the per-block overhead even
more than that though -- instead of storing a pointer, store only an
index into a vector of information blocks. With some care, you should
even be able to eliminate that -- for example, arrange for N bits of
every address an allocator hands out to be unique to that allocator, so
no other allocator ever produces an address with the same bit pattern.
In this case, retrieving the allocator for a specific block consists of
a shift and mask of the block's address. Better still, since there is no
per-block header left to corrupt, the lookup keeps working even if the
code using the block writes to addresses outside the allocated space.
Obviously this latter method applies better to 64-bit addressing, where
there are address bits to spare, but with a bit of care it could
undoubtedly be applied in quite a few 32-bit situations as well.
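Here is a small sketch of the address-bits approach, under some loud assumptions: four allocators, each owning a fixed 4 KiB region of one statically allocated arena, and a compiler that honours a 16 KiB alignas on a static array (GCC and Clang on the usual platforms do). All the names and sizes are invented for illustration, and the bump allocation is only there to produce addresses to look up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed layout: each allocator owns one 4 KiB region of a shared arena.
constexpr std::size_t kRegionShift = 12;
constexpr std::size_t kRegionSize  = std::size_t{1} << kRegionShift;
constexpr std::size_t kAllocators  = 4;                // must be a power of two
constexpr std::size_t kIndexMask   = kAllocators - 1;

// Hypothetical per-allocator information block.
struct AllocatorInfo {
    std::size_t block_size;
};
constexpr AllocatorInfo kInfo[kAllocators] = {{16}, {32}, {64}, {128}};

// The arena is aligned to its own size, so the address bits just above the
// region offset (bits 12..13 here) are exactly the owning allocator's index.
alignas(kRegionSize * kAllocators)
unsigned char g_arena[kRegionSize * kAllocators];
std::size_t g_used[kAllocators] = {};

// Trivial bump allocation inside the region owned by allocator `idx`.
void* allocate(std::size_t idx) {
    assert(idx < kAllocators);
    const std::size_t size = kInfo[idx].block_size;
    if (g_used[idx] + size > kRegionSize) return nullptr;  // region exhausted
    void* p = g_arena + idx * kRegionSize + g_used[idx];
    g_used[idx] += size;
    return p;
}

// No per-block header at all: the allocator falls out of the address.
std::size_t allocator_index(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) >> kRegionShift) & kIndexMask;
}

const AllocatorInfo& info_of(const void* p) {
    return kInfo[allocator_index(p)];
}
```

Since the lookup reads nothing from the block itself, scribbling over a block (or past its end) cannot change which allocator it maps to.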