Michele Guidolin
Hello everybody,
I'm benchmarking a red-black Gauss-Seidel algorithm on 2-dimensional
grids of different sizes and element types, and I get some strange results
when I change the computation from double to float.
Here are the timings for different grid SIZEs and types:
SIZE     128     256     512
float    2.20s   2.76s   7.86s
double   2.30s   2.47s   2.59s
As you can see, from a grid size of 256 nodes onward the float version is
slower than the double one, and at 512 its run time increases drastically.
What could be the problem? Could it be the cache? Shouldn't the float
computation always be faster than the double one?
I hope to receive an answer as soon as possible.
Thanks
Michele Guidolin.
P.S.
Here is some more information about the test.
The code I'm testing is below. The double version is identical, except
that the constants are 0.25 instead of 0.25f.
------------- CODE -------------
float u[SIZE][SIZE];
float rhs[SIZE][SIZE];
inline void gs_relax(int i,int j)
{
    u[i][j] = ( rhs[i][j] +
                0.0f * u[i][j] +
                0.25f* u[i+1][j]+
                0.25f* u[i-1][j]+
                0.25f* u[i][j+1]+
                0.25f* u[i][j-1]);
}
void gs_step_fusion()
{
    int i,j;
    /* update the red points: */
    for(j=1; j<SIZE-1; j=j+2)
    {
        gs_relax(1,j);
    }
    for(i=2; i<SIZE-1; i++)
    {
        for(j=1+(i+1)%2; j<SIZE-1; j=j+2)
        {
            gs_relax(i,j);
            gs_relax(i-1,j);
        }
    }
    for(j=1; j<SIZE-1; j=j+2)
    {
        gs_relax(SIZE-2,j);
    }
}
---------------CODE--------------
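For anyone who wants to reproduce the comparison, here is a minimal sketch of a timing harness around the routine above. It is not the exact program behind the numbers in the table: the `real` typedef, the `USE_DOUBLE` switch, `init_grid`, `time_sweeps`, the initial data, and the use of `clock()` are all assumptions made for illustration.

```c
#include <time.h>

#ifndef SIZE
#define SIZE 128              /* assumed default; the post uses 128, 256, 512 */
#endif

#ifdef USE_DOUBLE             /* hypothetical switch: build with -DUSE_DOUBLE */
typedef double real;
#else
typedef float real;
#endif

real u[SIZE][SIZE];
real rhs[SIZE][SIZE];

/* Same 5-point update as in the post, written for either element type. */
static inline void gs_relax(int i, int j)
{
    u[i][j] = rhs[i][j]
            + (real)0.0  * u[i][j]
            + (real)0.25 * u[i + 1][j]
            + (real)0.25 * u[i - 1][j]
            + (real)0.25 * u[i][j + 1]
            + (real)0.25 * u[i][j - 1];
}

/* Fused sweep, loop structure copied from the post. */
void gs_step_fusion(void)
{
    int i, j;
    for (j = 1; j < SIZE - 1; j += 2)
        gs_relax(1, j);
    for (i = 2; i < SIZE - 1; i++) {
        for (j = 1 + (i + 1) % 2; j < SIZE - 1; j += 2) {
            gs_relax(i, j);
            gs_relax(i - 1, j);
        }
    }
    for (j = 1; j < SIZE - 1; j += 2)
        gs_relax(SIZE - 2, j);
}

/* Assumed initial data: every u entry set to u0, every rhs entry to f. */
void init_grid(real u0, real f)
{
    int i, j;
    for (i = 0; i < SIZE; i++)
        for (j = 0; j < SIZE; j++) {
            u[i][j] = u0;
            rhs[i][j] = f;
        }
}

/* CPU seconds spent on `sweeps` calls to gs_step_fusion(). */
double time_sweeps(int sweeps)
{
    int s;
    clock_t start = clock();
    for (s = 0; s < sweeps; s++)
        gs_step_fusion();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A `main` would call `init_grid(...)` once and then print the result of `time_sweeps(n)` for some sweep count `n`. Building once with and once without `-DUSE_DOUBLE`, and with `-DSIZE=128`, `-DSIZE=256`, and `-DSIZE=512`, gives the six cases of the table. The sweep count and the initial values are free parameters here, so absolute times will not match the ones quoted.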
I'm testing this code on this machine:
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping : 1
cpu MHz : 3192.311
cache size : 1024 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 3
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor ds_cpl cid
bogomips : 6324.22
with Hyper-Threading enabled, on Linux 2.6.8.
The compiler is gcc 3.4.4 and the flags are:
CFLAGS = -g -O2 -funroll-loops -msse2 -march=pentium4 -Wall
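On the cache question, a rough sketch of the working-set arithmetic may help. It assumes that only the two arrays `u` and `rhs` matter and that the 1024 KB reported by /proc/cpuinfo is the size of the last-level cache. The helper name `working_set_bytes` is hypothetical.

```c
#include <stddef.h>

/* Bytes occupied by the two size x size arrays (u and rhs),
   given the size in bytes of one element (4 for float, 8 for double). */
size_t working_set_bytes(size_t size, size_t elem_bytes)
{
    return 2 * size * size * elem_bytes;
}
```

With 4-byte floats and 8-byte doubles this gives:

SIZE     128       256        512
float    128 KiB   512 KiB    2048 KiB
double   256 KiB   1024 KiB   4096 KiB

By this estimate the double arrays are always twice as large as the float ones, and at SIZE 512 both versions exceed 1024 KB. If cache capacity alone were responsible, the double version would be expected to slow down at least as much as the float one, which is not what the timings show.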