Sean G. McLaughlin
Niz said: Although as others have said, the Intel compiler is king of the hill
for producing fast Intel x86 and x86_64 code.
And this tidbit makes Gentoo Linux just that much funnier.
Walter Roberson said: In past posts, people have said that Comeau's compiler with
the Dinkumware libraries is true C99. Certainly dinkumware.com
advertises their library as being fully conforming to standard C99.
jacob navia said: So Gcc is not likely to be good on any platform as the people who
develop for other platforms will specialise in that and beat GCC?
CRAP
GCC implements GNU-c and adds extensions for SOME parts of C99
Obviously you are right with "Some parts of C99" but those "some parts"
are almost 99% of the job...
I disagree. Microsoft has made no effort at all to implement C99. The only
parts they did were // comments and accepting "long long". I am not
aware of any other parts of C99 that they implement.
It is more advanced in its implementation of C99 than lcc-win.
In gcc 4.1.2, with -O3, on a 686,
the whole function is eliminated and inlined,
leading to:
$ time ./a.out
times=1000000000 loops=20 dtime = 0
real 0m0.001s
user 0m0.000s
sys 0m0.003s
$
In this case, there is no difference in generated code when
-march=i686 -msse2 are added to the -O3 flag.
I guess you'll have to invent a better benchmark.
AvK
I cannot believe such a blatant difference will go unnoticed for long.
guys:
i have zeroed in and created a simple test program. This program just
has floating point addition and integer addition. it does 20 loops x
1 million times. in the release version of visual c it takes 0 time. gcc -O3
takes 6 secs on my machine.
this cannot be rocket science; there seems to be some fundamental
deficiency in gcc. i will treat this as a bug. This should have
serious implications for linux platforms.
here is the code; test it for yourself
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static double loop (long times)
{
    long i=0;
    double a=0;
    for (i=1; i<times; i++)
    {
        double x1 = i-1;
        double x2 = i;
        double y = 0;
        long n=0;
        y = x1+x2;
        n = i+ i -1;
        y=x1*x2*y;
        a=y;
    }
    return a;
}

int main (int argc, char **argv)
{
    unsigned long times = 0;
    long i=0;
    time_t t=0;
    time_t t1=0;
    double dt=0;
    long lcnt=20;
    double a=0;
    times = (long) (1e9);
    /*
    if(argc > 1)
    {
        times = atoi (argv[1]);
        times *= 1e6;
    }
    if(argc > 2)
        lcnt = atoi (argv[2]);
    if(lcnt < 20)
        lcnt = 20;
    */
    time (&t);
    for (i=0; i<20; i++)
    {
        a = loop (times);
        /* you need this for visual c to show any elapsed time
        printf ("\n %lg \n", a);
        */
    }
    /* the post is cut off at this point; the lines below are a
       reconstruction based on the output format quoted in the replies */
    time (&t1);
    dt = difftime (t1, t);
    printf ("times=%lu loops=%ld dtime = %g\n", times, lcnt, dt);
    return 0;
}
[email protected] said: I recently compiled a numerically intensive C project under cygwin
gcc 3.4.4 and Microsoft Visual C. ...
... the most surprising thing was that Visual C optimized had 2x
the performance of gcc optimized.
Is anybody else seeing the same thing? If this is true, the Microsoft C
compiler is in a different league altogether.
I guess you are comparing a current Microsoft compiler (with OpenMP
Peter said: Why the surprise? GNU's gcc is intended to be a fast compiler.
It was not designed to produce ultra-fast executables.
Barry said: In my very limited experience, optimizers tend to focus on statements
as opposed to declarations. By placing these declarations in the loop
and performing non-trivial computations in the initialization, you
have added an unnecessary "extra level of confusion" to your primary
objective. What happens if you define x1 etc at function scope and
use simple assignment statements here? The initialization for n and y
is superfluous.
If you just stand back and look at that time-waster statement, you
will see that x1 is discarded after the loop and never used, and
that i is set to times. Therefore the optimizer can simply
generate:
i = times;
for the whole loop,
ok, that was the behavior in visual c. i was using cygwin; i will test
it out on my ubuntu hardy box. thanx
2. I now compared my original program again against visual c (release, which uses
the ms -O2 option) and gcc with -O3 -mtune=core2 -march=core2 -msse4.
Visual c is faster by 2.5x!!
3. I tried a switch in my program which deploys a different floating
point algorithm. This algorithm is dominated by floating point
additions as opposed to multiplications in the 'standard' program.
The vc performance does not change. The gcc performance deteriorates
and it is now 3.5x slower than visual c.
4. I will try to create a simpler test program to represent the above
behavior. My belief is the difference has something to do with the
floating point
NT4 ran on a DEC Alpha as well. Also don't forget PowerPC in their XBOX!