slow complex<double>s

Greg Buchholz

/*
While writing a C++ version of the Mandelbrot benchmark over at the
"The Great Computer Language Shootout"...


http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=all

...I've come across the issue that complex<double>s seem quite slow
unless compiled with -ffast-math. Of course doing that results in
incorrect answers because of rounding issues. The speed difference for
the program below is between 5x-8x depending on the version of g++. It
is also about 5 times slower than the corresponding gcc version at...

http://shootout.alioth.debian.org/gp4/benchmark.php?test=mandelbrot&lang=gcc&id=2

...I'd be interested in learning the reason for the speed difference.
Sure, the C version is slightly more optimized, but I was thinking that
the C++ code should only be 20-50% slower, not 750% slower like I get
with g++-4.1.0pre021006 (g++ 3.4.2 is a factor of 5 slower when
compiling with "-O3" vs. "-O3 -ffast-math"). Does it have something to
do with temporaries not being optimized away, or somesuch? A
limitation of the x87 instruction set? Is it inherent in the way the
C++ Standard requires complex<double> values to be calculated? My bad coding
style? Limitations imposed by g++?

Curious,

Greg Buchholz
*/

// Takes an integer argument "n" on the command line and generates a
// PBM bitmap of the Mandelbrot set on stdout.
// see also: ( http://sleepingsquirrel.org/cpp/mandelbrot.cpp.html )

#include <cstdlib>   // for atoi
#include <iostream>
#include <complex>

int main(int argc, char **argv)
{
    char bit_num = 0, byte_acc = 0;
    const int iter = 50;
    const double limit_sqr = 2.0 * 2.0;

    std::ios_base::sync_with_stdio(false);
    int n = std::atoi(argv[1]);

    std::cout << "P4\n" << n << " " << n << std::endl;

    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x)
        {
            std::complex<double> Z(0.0, 0.0);
            std::complex<double> C(2*(double)x/n - 1.5, 2*(double)y/n - 1.0);

            for (int i = 0; i < iter and norm(Z) <= limit_sqr; ++i)
                Z = Z*Z + C;

            byte_acc = (byte_acc << 1) | ((norm(Z) > limit_sqr) ? 0x00 : 0x01);

            if (++bit_num == 8) { std::cout << byte_acc; bit_num = byte_acc = 0; }
            else if (x == n-1)  { byte_acc <<= (8 - n%8);
                                  std::cout << byte_acc;
                                  bit_num = byte_acc = 0; }
        }
}
 
Jerry Coffin

Greg Buchholz says...

[ ... ]
...I'd be interested in learning the reason for the speed difference.
Sure, the C version is slightly more optimized, but I was thinking that
the C++ code should only be 20-50% slower, not 750% slower like I get
with g++-4.1.0pre021006 (g++ 3.4.2 is a factor of 5 slower when
compiling with "-O3" vs. "-O3 -ffast-math"). Does it have something to
do with temporaries not being optimized away, or somesuch? A
limitation of the x87 instruction set? Is it inherent in the way the
C++ Standard requires complex<double> values to be calculated? My bad coding
style? Limitations imposed by g++?

[ ... ]
for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) Z = Z*Z + C;

Hmm...try this:

for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) {
    Z *= Z; Z += C; }

No guarantee, but I think it's worth a shot.
 
Greg Buchholz

Jerry said:
Hmm...try this:

for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) {
    Z *= Z; Z += C; }

No guarantee, but I think it's worth a shot.

Tried it. No speed improvement on gcc-3.4.2 or gcc-4.1.0pre021006.

Greg Buchholz
 
Fei Liu

Greg said:
Tried it. No speed improvement on gcc-3.4.2 or gcc-4.1.0pre021006.

Greg Buchholz

Profile your code and find out what's causing the slowdown...
 
peter koch

Greg said:
[snip]
...I'd be interested in learning the reason for the speed difference.
Sure, the C version is slightly more optimized, but I was thinking that
the C++ code should only be 20-50% slower, not 750% slower like I get
with g++-4.1.0pre021006 (g++ 3.4.2 is a factor of 5 slower when
compiling with "-O3" vs. "-O3 -ffast-math"). Does it have something to
do with temporaries not being optimized away, or somesuch? A
limitation of the x87 instruction set? Is it inherent in the way the
C++ Standard requires complex<double> values to be calculated? My bad coding
style? Limitations imposed by g++?
[snip]

for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i)
    Z = Z*Z + C;

byte_acc = (byte_acc << 1) | ((norm(Z) > limit_sqr) ? 0x00 : 0x01);
[snip]

Could it be that a large amount of time is spent in calculating "norm"? I
doubt that the C version does so - and it should not be necessary. Some
simple test should fit the bill (or at least avoid calling norm all the
time).

/Peter
 
Jerry Coffin

Greg Buchholz says...
Tried it. No speed improvement on gcc-3.4.2 or gcc-4.1.0pre021006.

Well, that takes out the possibility that seemed most
obvious to me. Depending on your bent, your next step
would be either a profiler or examining the code the
compiler's producing. For large chunks of code, the
former works well, but for small amounts that you want to
examine in maximum detail the latter can be useful as
well.
 
Greg Buchholz

Greg said:
...I've come across the issue that complex<double>s seem quite slow
unless compiled with -ffast-math. Of course doing that results in
incorrect answers because of rounding issues. The speed difference for
the program below is between 5x-8x depending on the version of g++.

Looks like the problem can be solved by manually inlining the
definition of "norm"...

// manually inlining "norm" results in a 5x-7x speedup on g++
for (int i=0; i<iter and
        (Z.real()*Z.real() + Z.imag()*Z.imag()) <= limit_sqr; ++i)
    Z = Z*Z + C;

...For some reason g++ must not have been able to inline it (or does so
only after common subexpression elimination or somesuch).

Greg Buchholz
 
Bill Shortall

Greg Buchholz said:
[snip]

------------------------------------------------
Hi Greg,
I have had similar problems when
using the <complex> library for
Microsoft VC6. It ran at about half the
expected speed. After looking through
the header file I saw that the C++
structure was somewhat involved, with
a base class and several derived classes.
I ended up writing my own very simple
complex class, looking like

namespace std
{
    template <class Tc>
    class ppcomplex
    {
    public:
        Tc re;
        Tc im;

        ppcomplex() { re = 0; im = 0; }
        ppcomplex(const Tc& r, const Tc& i) : re(r), im(i) {}
        ppcomplex(const Tc& r) : re(r), im((Tc)0) {}

        Tc real() const { return re; }
        Tc imag() const { return im; }

        Tc real(const Tc& x) { return (re = x); }
        Tc imag(const Tc& x) { return (im = x); }

        // the usual copy and assignment operators
        ppcomplex(const ppcomplex<Tc>& z)
        { this->re = z.re; this->im = z.im; }
        ppcomplex<Tc>& operator=(const ppcomplex<Tc>& y)
        {
            if (this != &y)
            { this->re = y.re; this->im = y.im; }
            return *this;
        }
        ppcomplex<Tc>& operator=(const Tc& r)
        { this->re = r; this->im = (Tc)0; return *this; }

        // etc. --- etc. --- more member operators here

        // updating by a real constant
        ppcomplex<Tc>& operator+=(const Tc& y)
        { re += y; return *this; }

        // more stuff

    }; // end of class ppcomplex
} // end of namespace std

This ran twice as fast! So maybe you have the same problem, i.e. your
complex class is just too complicated?

Regards....Bill
 
Marcus Kwok

Bill Shortall said:
namespace std
{
template <class Tc>
class ppcomplex
{

You are not allowed to introduce your own names into namespace std. IIRC,
you are only allowed to add specializations of the standard templates,
and only when specializing on user-defined types.
 
Bill Shortall

Marcus Kwok said:
You are not allowed to introduce your own names to namespace std. IIRC,
you are only allowed to add specializations of the standard template
classes, when specializing on user-defined classes.

OK -- change ppcomplex to complex,
call its header file <complex>, and
remove the old one
 
