size_t

assaarpa

IRIX 6.5.21 / Octane R10000 250 MHz / g++ 3.4.0:

std::list
push_back() 149.844 ms, sum: -1474736480
iteration 24.733 ms, sum: -1474736480

core::list
push_back() 35.464 ms, sum: -1474736480
iteration 20.326 ms, sum: -1474736480

std::vector
push_back() 16.166 ms, sum: -1474736480
iteration 2.652 ms, sum: -1474736480

core::vector
push_back() 20.698 ms, sum: -1474736480
iteration 2.188 ms, sum: -1474736480

That said, the test is very primitive: it measures only insertion and
iteration performance, although those are very commonly used operations. The
test pattern doesn't reflect real-world use very well (only push_back and
then iterate, what, 20,000 times in a row? Or was it 200,000 times? :) So I
acknowledge that the test isn't a very good one, but those are the results,
whatever the quality of the test. :)
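
For reference, here is a minimal sketch of the kind of harness described
above (not the actual test code, which was not posted; the element count and
values are guesses):

#include <cstdio>
#include <ctime>
#include <list>
#include <vector>

// Sketch of the push_back + iterate test. Element count and payload
// values are assumptions, not the original figures.
template <typename Container>
void benchmark(const char* name)
{
    const int N = 200000;
    Container c;

    std::clock_t t0 = std::clock();
    for (int i = 0; i < N; ++i)
        c.push_back(i * 37);            // arbitrary payload to checksum
    std::clock_t t1 = std::clock();

    long sum = 0;
    for (typename Container::iterator it = c.begin(); it != c.end(); ++it)
        sum += *it;                     // forces a full traversal
    std::clock_t t2 = std::clock();

    std::printf("%s\n", name);
    std::printf("push_back() %.3f ms, sum: %ld\n",
                (t1 - t0) * 1000.0 / CLOCKS_PER_SEC, sum);
    std::printf("iteration %.3f ms, sum: %ld\n",
                (t2 - t1) * 1000.0 / CLOCKS_PER_SEC, sum);
}

int main()
{
    benchmark< std::list<int> >("std::list");
    benchmark< std::vector<int> >("std::vector");
}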
 
Peter Koch Larsen

"assaarpa" <[email protected]> skrev i en meddelelse
[snip]
I had the same sentiments: the test is pretty trivial, and indeed it doesn't
try splice. Neither did I implement splice or the other features that were
introduced in C++2003 (I could be wrong about the version; all the same,
these methods are not implemented :) they are commented out as a "TODO"
section in the header.

[snip]

The standard container is balanced in order to make all operations
reasonably fast. Thus I would like to have seen more tests (although splice
is perhaps not one of the operations I would care about).
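
For context, splice moves nodes between lists in constant time without
copying the elements; a minimal illustration (not from the thread):

#include <cstdio>
#include <list>

int main()
{
    std::list<int> a, b;
    for (int i = 0; i < 5; ++i)
    {
        a.push_back(i);          // a: 0 1 2 3 4
        b.push_back(10 + i);     // b: 10 11 12 13 14
    }

    a.splice(a.end(), b);        // O(1): relink b's nodes onto a

    // b is now empty; a holds 0..4 followed by 10..14.
    for (std::list<int>::iterator it = a.begin(); it != a.end(); ++it)
        std::printf("%d ", *it);
    std::printf("\n");
}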
Here's output from Windows XP Professional SP1 / Athlon64 3000+ / Visual C++
.NET 2003:

[snip]

Far more interesting is the project settings. Are you running debug code?
What optimizations are involved? Running in debug mode could easily penalise
std::list more than assaarpa::list.

/Peter
 
assaarpa

Far more interesting is the project settings. Are you running debug code?
What optimizations are involved? Running in debug mode could easily penalise
std::list more than assaarpa::list.

I never profile debug builds. For each compiler the settings are the
defaults, with only a few settings tuned to favour speed (/O3, /Ox, or
similar, depending on which compiler was used) and very few furry dice or
other spices thrown in.

Here are the Bill Gates settings:

/Ox /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /FD /EHsc /ML /GS /FAs
/Fa"Release/" /Fo"Release/" /Fd"Release/vc70.pdb" /W3 /nologo /c /Wp64 /Zi
/TP

(That conflicts with what I wrote above because it is copy-pasted from the
Command Line output window of the solution settings; the only change I made
was to switch /O2 to /Ox after creating a fresh solution.)

g++ settings:
-g -Wall -I$(incbase) -O3 $(libs)

So in practice only -O3 directly affects the code generator; if there are
any specific settings you would like me to try, please post them.
 
Ioannis Vranos

assaarpa said:
I never profile debug builds. For each compiler the settings are the
defaults, with only a few settings tuned to favour speed (/O3, /Ox, or
similar, depending on which compiler was used) and very few furry dice or
other spices thrown in.

Here are the Bill Gates settings:

/Ox /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /FD /EHsc /ML /GS /FAs
/Fa"Release/" /Fo"Release/" /Fd"Release/vc70.pdb" /W3 /nologo /c /Wp64 /Zi
/TP


/O2 is the best for speed. Also use /Og /Oi /Ot /Oy /GA.

So in summary use these:

C:\c>cl /O2 /Og /Oi /Ot /Oy /GA /EHsc temp.cpp

And add /clr if you want the code to be IL.



(That conflicts with what I wrote above because it is copy-pasted from the
Command Line output window of the solution settings; the only change I made
was to switch /O2 to /Ox after creating a fresh solution.)

g++ settings:
-g -Wall -I$(incbase) -O3 $(libs)

So in practice only -O3 directly affects the code generator; if there are
any specific settings you would like me to try, please post them.

This is what I use:

C:\MinGW\bin\g++.exe -std=c++98 -pedantic-errors -Wall
-fexpensive-optimizations -O3 -ffloat-store -mcpu=pentiumpro temp.cpp -o
temp.exe
 
assaarpa

C:\MinGW\bin\g++.exe -std=c++98 -pedantic-errors -Wall
-fexpensive-optimizations -O3 -ffloat-store -mcpu=pentiumpro temp.cpp -o
temp.exe

-mcpu is deprecated on g++ 3.4.2; omitting it, the first four timings are:

54.5 ms
2.7 ms
29.5 ms
2.3 ms

The compiler options cannot do very much to close that gap, because I
*cheated*: normally, if you want to allocate internal nodes for a linked
list, you would use the allocator, but core::list has a built-in block
allocator which stores the allocated blocks in this format:

struct foo
{
    foo* next; // link to the next delete[] call when we want to get rid of the memory :)
    bar* data; // information for delete[]
};
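
As a rough sketch of how such an allocator might work (names and details are
my own, not the posted implementation; 16 nodes per block matches the lot
size mentioned below):

// Sketch only: nodes are carved out of arrays, and each array is
// remembered on a chain so it can be handed back to delete[] later.
struct node { unsigned long link; int value; };  // link = next XOR prev

struct block
{
    block* next;   // chain of blocks, walked at destruction time
    node* data;    // the array handed back to delete[]
};

class block_allocator
{
    block* blocks;     // head of the block chain
    node* free_node;   // next unused node in the current block
    int remaining;     // unused nodes left in the current block
public:
    block_allocator() : blocks(0), free_node(0), remaining(0) {}

    node* allocate()
    {
        if (remaining == 0)
        {
            block* b = new block;
            b->data = new node[16];   // one allocation serves 16 nodes
            b->next = blocks;
            blocks = b;
            free_node = b->data;
            remaining = 16;
        }
        --remaining;
        return free_node++;
    }

    ~block_allocator()
    {
        while (blocks)                // one delete[] per 16 nodes
        {
            block* b = blocks;
            blocks = b->next;
            delete[] b->data;
            delete b;
        }
    }
};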

This contributes to a reduced memory footprint. I know this is platform
dependent, but let's look at how things work with Visual C++ .NET 2003 on
Windows. When we allocate memory using malloc or new (assuming they are not
overloaded), the block size for a single memory allocation is 8 bytes; this
is the minimum amount of memory that can be allocated. The overhead per
allocated block is 24 bytes. Our iterator node fits easily within 8 bytes,
since it only stores the XOR of the next and prev pointers; actually both
next and prev would fit into a single allocation block on this platform,
totaling 32 bytes consumed per node.

Since we allocate in lots of at least 16 blocks, the 24 bytes of overhead is
distributed evenly over the nodes, so the memory footprint is the mandatory
4 bytes for the XOR plus 1.5 bytes of allocation overhead: we spend 5.5
bytes of memory per node, which is roughly a 6-to-1 saving in memory
footprint for the linked list management. Additionally, the nodes are
initially placed in sequential memory locations, which is very cache
friendly (it is also very cache friendly that iteration only has to touch
1/6 of the unique memory addresses!)

In this scenario saving merely FOUR BYTES (one pointer) means a 50% saving
in memory footprint: 5.5 vs. 9.5 bytes per node! If we did not have the
built-in block allocation of nodes, the impact of having two separate
pointers would be negligible. Now that we know this, we also realize that a
custom allocator would be only HALF AS EFFICIENT as one integrated into the
list class itself (compare the memory footprints for both cases, the XOR
and the next/prev strategy).
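
For clarity, here is the classic XOR trick in isolation (a sketch, not
core::list's actual code; unsigned long is assumed to be pointer-sized, as
on the 32-bit platform above):

#include <cstdio>

// Each node stores (next XOR prev) in one pointer-sized field;
// iteration carries the previous node's address to recover the next.
struct xnode
{
    unsigned long link;   // next XOR prev, stored as an integer
    int value;
};

inline xnode* step(xnode* prev, xnode* cur)
{
    // XOR with the address we came from yields the other neighbour.
    return reinterpret_cast<xnode*>(cur->link ^
                                    reinterpret_cast<unsigned long>(prev));
}

int main()
{
    xnode n[3] = { { 0, 1 }, { 0, 2 }, { 0, 3 } };
    // Build the links; the list's ends XOR with null (0).
    n[0].link = 0 ^ reinterpret_cast<unsigned long>(&n[1]);
    n[1].link = reinterpret_cast<unsigned long>(&n[0]) ^
                reinterpret_cast<unsigned long>(&n[2]);
    n[2].link = reinterpret_cast<unsigned long>(&n[1]) ^ 0;

    int sum = 0;
    xnode* prev = 0;
    xnode* cur = &n[0];
    while (cur)
    {
        sum += cur->value;
        xnode* next = step(prev, cur);
        prev = cur;
        cur = next;
    }
    std::printf("sum: %d\n", sum);   // prints 6
}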

I just got this idea one day and figured it was worth trying after a little
consideration. Making improvements to vector is not very easy, since my code
seems to be consistently 1% or so slower (I am using specialization to
detect whether the type is POD, et cetera, to give a little extra boost; I
don't do placement new or anything like that, I kept it very simple). I also
notice that the std::map implementation is very good, at least as good as an
RB tree, clearly not an AVL tree (if some std::map implementations do use an
AVL tree, all the more power to them; I generally found the RB tree to be
better than the AVL tree, but I haven't really looked into it very much,
just compared performance a little with very basic tree implementations).
P.S. I am fully aware that std::map doesn't imply a tree, but that is the
implementation that first comes to mind when reading the performance
contract the interface makes with the client. I could be totally wrong of
course, so I'm just thinking "aloud"; the thoughts are not very coherent
anymore, excuse me, I think I will sleep now.
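
For what it's worth, the POD-detection trick can be sketched like this in
C++98, which had no is_pod trait (all names here are illustrative, not the
posted code):

#include <cstddef>
#include <cstring>

// Hand-rolled trait: false by default, specialized per POD type the
// vector should fast-path.
template <typename T> struct is_pod { enum { value = false }; };
template <> struct is_pod<int> { enum { value = true }; };
template <> struct is_pod<double> { enum { value = true }; };
// ...one specialization per POD type of interest

// Dispatch on POD-ness at compile time: memcpy for PODs, an
// element-by-element copy loop otherwise.
template <bool IsPod> struct copier
{
    template <typename T>
    static void copy(T* dst, const T* src, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i) dst[i] = src[i]; // safe path
    }
};

template <> struct copier<true>
{
    template <typename T>
    static void copy(T* dst, const T* src, std::size_t n)
    {
        std::memcpy(dst, src, n * sizeof(T)); // raw copy for PODs
    }
};

template <typename T>
void copy_elements(T* dst, const T* src, std::size_t n)
{
    copier< is_pod<T>::value != 0 >::copy(dst, src, n);
}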
 
