Bo Persson
James Kanze wrote:
:
:: :
: [...]
:: I did some tests on my system and had dismal results also, but
:: in release things are a bit better, but not much.
:
:: Using your algorithm the array based was 6 seconds, the vector
:: based was 29 seconds.
:
:: Using vectors with the array based algorithm
:: arradd(&av[0], &av[0], &bv[0], 4);
:: is 16 seconds.
:
:: I tried all the optimizations I could find but couldn't get an
:: improvement over these values.
:
: Which compilers, and which options?
:
: I gave the complete version he posted a quick try on the systems
: I have at hand (a PC with VC++ 2005 and g++ 3.3.3---the CygWin
: port, but I don't think that affects CPU bound programs much---,
: an Intel PC with Linux and g++ 4.1.0, and a Sun Sparc with g++
: 4.1.0 and Sun CC 5.8---Sun CC comes with two different library
: implementations). The results are interesting, to say the
: least, and do indicate just how much the answer is
: implementation dependent. I multiplied the loop count by 10 to
: get measurable times. My compiler options were:
:
: VC++: cl /vmg /GR /EHs /D_SECURE_SCL=0 /O2
: g++: g++ -O3
: Sun CC: CC -O4 and CC -library=stlport4 -O4
:
: The results are interesting, to say the least:
:
:                  array   vector
: Windows PC:
:   VC++              22       22
:   g++               30       38
:
: Linux PC:
:   g++               33       42
:
: Sun Sparc:
:   g++               73       77
:   Sun standard      60      172
:   Sun stlport       60      165
:
I just have to give another data point, showing that we *always* have
to measure to know for sure.

Again, VC++ 2005, but with an alternate standard library
implementation:

Test 1: 1000 million 128-bit additions using C arrays...
Test 1 complete.
CPU time elapsed: 23 seconds
Wall time elapsed: 24 seconds
Final sum: 23106FE7 44D0A28A 64EF219D 9803CB01

Test 2: 1000 million 128-bit additions using vector...
Test 2 complete.
CPU time elapsed: 14 seconds
Wall time elapsed: 14 seconds
Final sum: 23106FE7 44D0A28A 64EF219D 9803CB01

For some reason, the compiler here decides that it is proper to fully
inline the vecadd function, saving a call and the parameter passing.
On the other hand, it decides to call arradd() out-of-line while
pushing and popping parameters.

A negative abstraction penalty?
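
For anyone joining the thread late, here is roughly what the two functions being compared might look like. This is only a sketch, not the code that was actually posted and timed: the bodies, the 32-bit word type, and the least-significant-word-first order are my assumptions. Only the names arradd and vecadd and the call shape arradd(&av[0], &av[0], &bv[0], 4) come from the thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reconstruction; the benchmarked code is not shown in this post.
// Adds two n-word integers stored least significant word first (an assumption)
// and propagates the carry from one 32-bit word to the next.
// dst may be the same array as a or b, since each word is read before it is written.
void arradd(std::uint32_t* dst, const std::uint32_t* a,
            const std::uint32_t* b, std::size_t n)
{
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::uint64_t sum = static_cast<std::uint64_t>(a[i]) + b[i] + carry;
        dst[i] = static_cast<std::uint32_t>(sum);  // keep the low 32 bits
        carry = sum >> 32;                         // 0 or 1
    }
}

// The same loop written against std::vector and operator[].
// Assumes a and b hold at least dst.size() words.
void vecadd(std::vector<std::uint32_t>& dst,
            const std::vector<std::uint32_t>& a,
            const std::vector<std::uint32_t>& b)
{
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < dst.size(); ++i) {
        std::uint64_t sum = static_cast<std::uint64_t>(a[i]) + b[i] + carry;
        dst[i] = static_cast<std::uint32_t>(sum);
        carry = sum >> 32;
    }
}
```

With bodies this small, whether the call itself gets inlined can easily dominate the run time, which would be consistent with the numbers above. The loops are identical, so any difference has to come from how a given compiler and library treat the call and the element access, not from the arithmetic.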
Bo Persson