g++ loop unrolling performance

Per Nordlöw · Aug 31, 2004

Hi all

I am using the boost::array template class to generalize my handcrafted
vector specializations for dimensions 2 (class vec2), 3 (class vec3) etc.

As performance is of the greatest importance, I have written an initial
benchmark that tests how well g++ can unroll loops whose number of
iterations can be determined at compile time or upon entry to the loop.
The gcc switch "-funroll-loops" should do just that. The test program
calculates the dot product of two four-dimensional arrays of int 10 million
times and looks as follows:

#include <cstddef>
#include <iostream>

#include "../array.hh"
#include "../Timer.hh"

using boost::array;
using std::cout;
using std::endl;

// Dot product written as a loop over all N elements.
template <typename T, std::size_t N>
inline T general_dot(const array<T, N> & a, const array<T, N> & b)
{
    T c = 0;
    for (std::size_t i = 0; i < N; i++)
    {
        c += a[i] * b[i];
    }
    return c;
}

// Dot product written out by hand for N = 4.
template <typename T>
inline T special_dot(const array<T, 4> & a, const array<T, 4> & b)
{
    return (a[0] * b[0] +
            a[1] * b[1] +
            a[2] * b[2] +
            a[3] * b[3]);
}

int main(int argc, char * argv[])
{
    typedef array<int, 4> T;

    T a(3);

    cout << "a: " << a << endl;

    a[0] = 11;
    a[1] = 13;
    a[2] = 17;
    a[3] = 19;

    cout << "a: " << a << endl;

    T b = a;

    Timer t;

    const unsigned int nloops = 10000000;

    // Time the loop-based version.
    unsigned int sum = 0;
    t.reset();
    for (unsigned int i = 0; i < nloops; i++)
    {
        sum += general_dot(a, b);
    }
    t.read();
    cout << "general: " << t << endl;

    // Time the hand-unrolled version.
    unsigned int tum = 0;
    t.reset();
    for (unsigned int i = 0; i < nloops; i++)
    {
        tum += special_dot(a, b);
    }
    t.read();
    cout << "special: " << t << endl;

    if (sum == tum)
    {
        cout << "Checksums are equal. OK" << endl;
    }
    else
    {
        cout << "Checksums are not equal. NOT OK" << endl;
    }

    return 0;
}

The calculation is performed with a general and a specialized version of
the dot product: general_dot() and special_dot() respectively.
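
The program above depends on the local headers "../array.hh" and "../Timer.hh", which are not shown. For anyone who wants to reproduce the comparison, here is a minimal self-contained sketch that assumes std::array and std::chrono as stand-ins for them. The names BenchResult, time_dot and checksums_agree are made up for this illustration. The inputs are read through a volatile array so that the compiler cannot compute the loop-invariant dot product once and hoist it out of the timing loop. Whether such hoisting plays any part in the numbers reported here is not established; the sketch only removes that possibility from the measurement.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>

// Loop-based dot product (std::array assumed in place of boost::array).
template <typename T, std::size_t N>
inline T general_dot(const std::array<T, N>& a, const std::array<T, N>& b)
{
    T c = 0;
    for (std::size_t i = 0; i < N; i++)
    {
        c += a[i] * b[i];
    }
    return c;
}

// Hand-unrolled dot product for N = 4.
template <typename T>
inline T special_dot(const std::array<T, 4>& a, const std::array<T, 4>& b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Hypothetical result record: checksum plus elapsed wall-clock seconds.
struct BenchResult
{
    unsigned int checksum;
    double seconds;
};

// Hypothetical timing helper. The volatile source array forces the inputs
// to be re-read on every iteration, so the product cannot be hoisted.
template <typename Dot>
BenchResult time_dot(Dot dot, unsigned int nloops)
{
    volatile int seed[4] = {11, 13, 17, 19};
    unsigned int sum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (unsigned int i = 0; i < nloops; i++)
    {
        const std::array<int, 4> a = {{seed[0], seed[1], seed[2], seed[3]}};
        sum += dot(a, a);
    }
    const auto stop = std::chrono::steady_clock::now();
    return {sum, std::chrono::duration<double>(stop - start).count()};
}

// Runs both versions and reports whether their checksums agree,
// mirroring the sum == tum test in the original program.
inline bool checksums_agree(unsigned int nloops)
{
    typedef std::array<int, 4> A;
    const BenchResult g = time_dot(
        [](const A& a, const A& b) { return general_dot(a, b); }, nloops);
    const BenchResult s = time_dot(
        [](const A& a, const A& b) { return special_dot(a, b); }, nloops);
    assert(g.seconds >= 0.0 && s.seconds >= 0.0);
    return g.checksum == s.checksum;
}
```

Calling checksums_agree(10000000) repeats the comparison with the loop count used above; printing the seconds field of each BenchResult gives the two timings.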

However, the performance of general_dot() is terrible compared to
special_dot(): it is around 35 times slower when I compile with gcc-3.3.2
using the switches "-O3 -funroll-all-loops".
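
One way to get a general dot product that does not depend on the optimizer's loop unrolling at all is to let template instantiation expand the loop. The following is a rough sketch only, not something taken from boost: std::array is assumed in place of boost::array, and the names DotStep, unrolled_dot and check_unrolled_dot are invented for the example. Whether it closes the gap on gcc-3.3.2 would have to be measured.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: handles element I-1 and recurses on the remaining
// I-1 elements, so the "loop" is expanded by template instantiation.
template <typename T, std::size_t N, std::size_t I>
struct DotStep
{
    static T eval(const std::array<T, N>& a, const std::array<T, N>& b)
    {
        return a[I - 1] * b[I - 1] + DotStep<T, N, I - 1>::eval(a, b);
    }
};

// Recursion terminator: no elements left, so nothing is added.
template <typename T, std::size_t N>
struct DotStep<T, N, 0>
{
    static T eval(const std::array<T, N>&, const std::array<T, N>&)
    {
        return T(0);
    }
};

// General dot product for any N, expanded at compile time.
template <typename T, std::size_t N>
inline T unrolled_dot(const std::array<T, N>& a, const std::array<T, N>& b)
{
    return DotStep<T, N, N>::eval(a, b);
}

// Small self-check with the values used in the test program:
// 11*11 + 13*13 + 17*17 + 19*19 = 940.
inline void check_unrolled_dot()
{
    const std::array<int, 4> a = {{11, 13, 17, 19}};
    assert(unrolled_dot(a, a) == 940);
}
```

For N = 4 the instantiation chain computes the same four products as special_dot(), so the generic interface is kept while the expansion no longer depends on "-funroll-loops".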

Is gcc really that lame or have I forgotten something?

Many thanks in advance,

Per Nordlöw
Swedish Defence Research Agency
Linköping
Sweden

Jack Klein · Sep 1, 2004

> Hi all
>
> I am using the boost::array template class to generalize my handcrafted
> vector specializations for dimensions 2 (class vec2), 3 (class vec3) etc.
>
> As performance is of the greatest importance, I have written an initial
> benchmark that tests how well g++ can unroll loops whose number of
> iterations can be determined at compile time or upon entry to the loop.
> The gcc switch "-funroll-loops" should do just that. The test program
> calculates the dot product of two four-dimensional arrays of int 10 million
> times and looks as follows:
> [snip]
>
> The calculation is performed with a general and a specialized version of
> the dot product: general_dot() and special_dot() respectively.
>
> However, the performance of general_dot() is terrible compared to
> special_dot(): it is around 35 times slower when I compile with gcc-3.3.2
> using the switches "-O3 -funroll-all-loops".
>
> Is gcc really that lame or have I forgotten something?

Questions about gcc and specific options should be addressed to one of
the groups. The C++ language does not define
optimization options at all, not to mention those of specific
compilers.
