C:\MinGW\bin\g++.exe -std=c++98 -pedantic-errors -Wall
-fexpensive-optimizations -O3 -ffloat-store -mcpu=pentiumpro temp.cpp -o
temp.exe
-mcpu is deprecated in g++ 3.4.2, so omitting that, the first four timings:
54.5 ms
2.7 ms
29.5 ms
2.3 ms
The compiler options cannot do very much to close that gap, because I
*cheated*: normally, if you want to allocate internal nodes for a linked
list, you would go through the allocator, but core::list has an internal
block allocator which stores the allocated blocks in this format:
struct foo
{
    foo* next; // link to the next delete[] call, for when we want to get rid of the memory
    bar* data; // information for delete[]
};
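As a rough sketch (assuming each foo record is itself heap-allocated; the
actual cleanup in core::list may differ in detail), releasing everything is
then just a walk over this chain:

void release_all(foo* head)
{
    while (head)
    {
        foo* next = head->next; // save the link before freeing the record
        delete[] head->data;    // release the block this record describes
        delete head;            // release the bookkeeping record itself
        head = next;
    }
}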
This contributes to a reduced memory footprint. I know this is platform
dependent, but let's look at how things work with Visual C++ .NET 2003 on
Windows. When we allocate memory using malloc or new (assuming neither is
overloaded), the block size for a single allocation is 8 bytes; this is the
minimum amount of memory that can be allocated. The overhead per allocated
block is 24 bytes. Our list node fits easily within 8 bytes, since it stores
only the xor of the next and prev pointers; in fact, on this platform even
separate next and prev pointers would fit into a single allocation block, so
either way one allocation per node consumes 32 bytes.
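To make the xor trick concrete, here is roughly the idea (a sketch, not
core::list's actual code; it assumes a platform where a pointer fits in a
std::size_t, as on 32-bit Windows):

#include <cstddef>

struct node
{
    std::size_t link; // xor of the addresses of the prev and next nodes
    // (payload omitted for brevity)
};

// given where we came from and where we are, recover the next node;
// walking in the other direction works the same way by symmetry
node* advance(node* prev, node* cur)
{
    return reinterpret_cast<node*>(
        cur->link ^ reinterpret_cast<std::size_t>(prev));
}

At the ends of the list prev is simply a null pointer, so the xor
degenerates to the plain address of the neighbouring node.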
Since we allocate in lots of at least 16 blocks, the 24 bytes of overhead is
distributed evenly over each node, so the memory footprint is the mandatory
4 bytes for the xor plus 1.5 bytes of amortized allocation overhead: 5.5
bytes of memory per node, roughly a 6-to-1 saving in memory footprint for
the linked-list management. Additionally, the nodes are initially placed at
sequential memory locations, which is very cache friendly (it is also very
cache friendly that iteration only has to touch 1/6 of the unique memory
addresses!)
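The block allocation amounts to roughly this (the lot size and names are
illustrative, not core::list's actual code):

struct node { std::size_t link; }; // 4 bytes on this 32-bit platform

// one heap allocation serves a whole lot of nodes, so the 24-byte
// overhead amortizes to 24/16 = 1.5 bytes per node, and the nodes
// come out at sequential, cache-friendly addresses
node* allocate_lot()
{
    return new node[16];
}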
In this scenario, saving merely FOUR BYTES (one pointer) cuts the footprint
from 9.5 to 5.5 bytes per node, a saving of over 40% in memory footprint!
If we did not have the built-in block allocation of nodes, the impact of
having two separate, unique pointers would be negligible. Now that we know
this, we also realize that a custom allocator plugged in from the outside
would be only about HALF AS EFFICIENT as the one we have integrated into the
list class itself (compare the memory footprints for the two cases, the xor
and the next/prev strategy).
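To spell out the arithmetic behind those claims (all with the 32-bit figures
from above):

    xor link, lots of 16:         4 + 24/16 = 5.5 bytes per node
    next/prev, lots of 16:        8 + 24/16 = 9.5 bytes per node
    next/prev, one new per node:  8 + 24    = 32  bytes per node

A next/prev list fed by an external pool allocator is essentially the middle
row, which is why the integrated xor list comes out nearly twice as space
efficient.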
I just got this idea one day and figured it was worth trying after a little
consideration. Making improvements to vector is not very easy, since my code
seems to be consistently 1% or so slower (I am using specialization to
detect whether the type is POD, et cetera, to give a little extra boost, see
the sketch after the p.s.; I don't do placement new or anything like that, I
kept it very simple). I also notice that the std::map implementation is very
good, at least as good as an RB-tree, and clearly not an AVL tree (if some
std::map implementations do use an AVL tree, all the more power to them; I
generally found the RB tree to be better than the AVL tree, but I haven't
really looked into that very much, just compared performances a little bit
with very basic tree implementations). p.s. I am fully aware that std::map
doesn't imply a tree, but that is the implementation that comes to my mind
first when reading the performance contract the interface makes with the
client. I could be totally wrong of course, so I'm just thinking "aloud";
the thoughts are not very coherent anymore, excuse me, I think I will sleep
now.
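p.p.s. By "specialization to detect if type is POD" I mean roughly the
sketch below; C++98 has no built-in type traits, so the names are
illustrative and the user has to register their own POD types:

#include <cstddef>
#include <cstring>

template <typename T>
struct is_pod { static const bool value = false; };

// the user specializes the trait for types known to be POD
template <> struct is_pod<int>    { static const bool value = true; };
template <> struct is_pod<double> { static const bool value = true; };

template <bool> struct bool_tag {};

// POD elements can be copied with a single memcpy...
template <typename T>
void copy_elems(const T* src, T* dst, std::size_t n, bool_tag<true>)
{
    std::memcpy(dst, src, n * sizeof(T));
}

// ...non-POD elements are copied one by one
template <typename T>
void copy_elems(const T* src, T* dst, std::size_t n, bool_tag<false>)
{
    for (std::size_t i = 0; i != n; ++i)
        dst[i] = src[i];
}

// dispatched as: copy_elems(src, dst, n, bool_tag<is_pod<T>::value>());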