speed vc++ vs. c++builder

PDQBach · Apr 18, 2004

Hello,

i'm a visual c++ and borland c++builder newbie.
i have written a simple mandelbrot algorithm and compiled it with both
vc++ (mfc) and cbuilder (vcl) (same code besides the drawing part).
the vc++ version is twice as fast in release mode! in debug mode it's
as fast as cbuilder. it seems i can't get cbuilder to compile a real
release version. when i check "Project options:compiler:release" it
even gets slower than debug! i have played around a bit with the
advanced compiler options without any result. i also dropped the
drawing part, supposing that it causes the slowdown somehow. the cbuilder
version is not faster than the same code on delphi 7 (maybe the same
problem). what can i do? i can't believe cbuilder (and delphi) to be
that much slower than vc++. i think it's just a problem of finding the
right compiler options.

thank you.

code:

for (y=0;y<ymax;y++)
{
    for (x=0;x<xmax;x++)
    {
        /* map the pixel to a point c = (cox, coy) in the complex plane */
        cox=x*xscale+leftside;
        coy=y*yscale+top;
        zx=0;
        zy=0;
        colorcounter=0;
        betrq=0;   /* squared magnitude of z */
        zaehler=0;
        while (colorcounter<maxiter && betrq<bailout)
        {
            tempx=zx*zx-zy*zy+cox;
            zy=2*zx*zy+coy;
            zx=tempx;
            colorcounter=colorcounter+1;
            betrq=zx*zx+zy*zy;
        }

        if (betrq<bailout) /*draw black pixel at x,y*/;
        else /*draw white pixel at x,y*/;
    }
}

system:
vc++: visual studio 6.0
c++builder 6 enterprise
windows xp home sp1
intel pentium m (centrino) 1400mhz
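
For anyone who wants to reproduce the comparison without the drawing code, here is a minimal sketch of the inner loop above as a standalone function. The function name `escape_iterations` and its parameter list are illustrative and not part of the original program; the local variable names follow the post.

```cpp
#include <cassert>

// Iterates z = z*z + c starting from z = 0. Returns the number of
// iterations performed before |z|^2 reached `bailout`, or `maxiter`
// if the point did not escape within the iteration budget.
int escape_iterations(double cox, double coy, int maxiter, double bailout)
{
    assert(maxiter >= 0 && bailout > 0.0);
    double zx = 0.0, zy = 0.0;
    double betrq = 0.0;          // squared magnitude of z
    int colorcounter = 0;
    while (colorcounter < maxiter && betrq < bailout)
    {
        const double tempx = zx * zx - zy * zy + cox;
        zy = 2.0 * zx * zy + coy;
        zx = tempx;
        ++colorcounter;
        betrq = zx * zx + zy * zy;
    }
    return colorcounter;
}
```

A point counts as inside the set (up to `maxiter`) when the return value equals `maxiter`. When timing this, summing the return values over the whole grid and printing the sum gives the optimizer a reason to keep the work.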

Victor Bazarov · Apr 18, 2004

PDQBach said:
i'm a visual c++ and borland c++builder newbie.
i have written a simple mandelbrot algorithm and compiled it with both
vc++ (mfc) and cbuilder (vcl) (same code besides the drawing part).
the vc++ version is twice as fast in release mode! in debug mode it's
as fast as cbuilder. it seems i can't get cbuilder to compile a real
release version. when i check "Project options:compiler:release" it
even gets slower than debug! i have played around a bit with the
advanced compiler options without any result. i also dropped the
drawing part, supposing that it causes the slowdown somehow. the cbuilder
version is not faster than the same code on delphi 7 (maybe the same
problem). what can i do? i can't believe cbuilder (and delphi) to be
that much slower than vc++. i think it's just a problem of finding the
right compiler options.

Right. And to solve it you need to post to the C++ Builder newsgroup
instead of C++ language one. You don't have a _language_ problem.
Your problem, as you so precisely determined, is in finding the right
compiler options. Try borland.public.cppbuilder.* hierarchy.

Victor

Jerry Coffin · Apr 19, 2004

Hello,

i'm a visual c++ and borland c++builder newbie.
i have written a simple mandelbrot algorithm and compiled it with both
vc++ (mfc) and cbuilder (vcl) (same code besides the drawing part).
the vc++ version is twice as fast in release mode! in debug mode it's
as fast as cbuilder.

It's off-topic, but this is fairly typical -- as a general rule,
Borland compilers optimize relatively poorly. With some work, you can
probably improve it somewhat, but chances are it'll remain somewhat
slower anyway.
Later,
Jerry.

PDQBach · Apr 19, 2004

Sorry for being off-topic. thanks for the answer.

> but chances are it'll remain somewhat slower anyway.
i wouldn't call twice as fast "somewhat". i prefer bcb for the ease of
use but such differences are unacceptable. i get the same results with
the following:

#include "stdafx.h"
#include <time.h>
#include <conio.h>
#include <iostream>
using namespace std;

void main() {
    clock_t beg;
    double j;
    beg = clock();

    for (double i=0; i<200000000; ++i) j = i*1000000;
    int dif = clock()-beg;   // elapsed clock ticks
    cout << dif << endl;
    getch();
}

i think there's nothing to optimize? how can the bcb optimizer screw up
such simple things? (sorry for continuing an off-topic thread)

Christopher Benson-Manica · Apr 19, 2004

PDQBach spoke thus:

(followups set)

Sorry for being off-topic. thanks for the answer.

I've crossposted this to what I believe to be the appropriate Borland
group, because as noted it's offtopic for clc++, and I as a Borland
user have some interest in the answer. You can read the Borland group
using newsgroups.borland.com as your news server, if the one you're
using now doesn't carry the Borland groups.

(nothing trimmed, although some comments added)

but chances are it'll remain somewhat slower anyway.
[compared to VC++]

i wouldn't call twice as fast "somewhat". i prefer bcb for the ease of
use but such differences are unacceptable. i get the same results with
the following:

#include "stdafx.h"
#include <time.h>
#include <conio.h>
#include <iostream>
using namespace std;

void main() {

^^^^
This should be int for a standard C++ program, although I don't know
what bcc thinks about it.

clock_t beg;
double j;
beg = clock();

for (double i=0; i<200000000; ++i) j = i*1000000;
int dif = clock()-beg;
cout << dif << endl;
getch();

i think there's nothing to optimize? how can the bcb optimizer screw up
such simple things? (sorry for continuing an off-topic thread)

(I don't believe OP specified his BCB version - I'd specifically be
interested in a BCB 4.0 answer.)

Kevin Goodsell · Apr 19, 2004

[Back to comp.lang.c++ for this comment.]

void main() {
^^^^
This should be int for a standard C++ program, although I don't know
what bcc thinks about it.

If it thinks anything other than "wrong", then it's not
standard-compliant. Unlike C, C++ does not permit alternative
implementation-defined forms for main() where the return type is not
int. In other words, implementation-defined forms are allowed, but they
must return int.

-Kevin

Christopher Benson-Manica · Apr 19, 2004

In comp.lang.c++ Kevin Goodsell said:
If it thinks anything other than "wrong", then it's not
standard-compliant. Unlike C, C++ does not permit alternative
implementation-defined forms for main() where the return type is not
int. In other words, implementation-defined forms are allowed, but they
must return int.

So the parameters but not the return type are up for grabs?

Not to dwell (again) on bcc32, but some of our code has

void CMAIN( int argc, char **argv );

as the main function, through some (dubious?) magic that I don't
necessarily understand; hence my uncertainty regarding this issue.

Kevin Goodsell · Apr 19, 2004

Christopher said:
So the parameters but not the return type are up for grabs?

That seems to be the case. Although, I don't think a diagnostic is
required for an incorrect main() return type. I should have mentioned
that before.

Not to dwell (again) on bcc32, but some of our code has

void CMAIN( int argc, char **argv );

as the main function, through some (dubious?) magic that I don't
necessarily understand; hence my uncertainty regarding this issue.

Scary. Here's hoping that elsewhere in your code, you have something
like this:

int main(int argc, char **argv)
{
    CMAIN(argc, argv);

    if (some_error_state())
    {
        return EXIT_FAILURE;
    }
    else
    {
        return EXIT_SUCCESS;
    }
}

-Kevin

Old Wolf · Apr 19, 2004

Sorry for being off-topic. thanks for the answer.

i wouldn't call twice as fast "somewhat". i prefer bcb for the ease of
use but such differences are unacceptable. i get the same results with
the following:

#include "stdafx.h"
#include <time.h>
#include <conio.h>
#include <iostream>
using namespace std;

void main() {
clock_t beg;
double j;
beg = clock();

for (double i=0; i<200000000; ++i) j = i*1000000;
int dif = clock()-beg;
cout << dif << endl;
getch();

i think there's nothing to optimize? how can the bcb optimizer screw up
such simple things? (sorry for continuing an off-topic thread)

Let me rewrite your program, removing non-standard code and
removing lines which have no effect (something that an optimiser
would do):

#include <iostream>
#include <ctime>
int main()
{
    clock_t beg = clock();
    int dif = clock() - beg; /* i'm not sure this is 100% safe */
    std::cout << dif << std::endl;
    return 0;
}

So I suppose your program is just seeing what the resolution of
the clock function is.
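
To see what that resolution actually is on a given system, a small sketch along these lines can help. It assumes `std::clock()` is available and advances while the program spins; the function name `clock_step_seconds` is illustrative.

```cpp
#include <cassert>
#include <ctime>

// Busy-waits until std::clock() reports a new value and returns the size
// of that step in seconds. This estimates the smallest non-zero interval
// a clock()-based benchmark can report.
double clock_step_seconds()
{
    const std::clock_t start = std::clock();
    assert(start != std::clock_t(-1));   // clock() returns -1 when unavailable
    std::clock_t next = start;
    while (next == start)
        next = std::clock();
    return double(next - start) / CLOCKS_PER_SEC;
}
```

Any measured time that is not many multiples of this step is mostly noise, which is one reason the benchmark loop needs to run for a second or more.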

Siemel Naran · Apr 20, 2004

PDQBach said:
void main() {
clock_t beg;
double j;
beg = clock();

for (double i=0; i<200000000; ++i) j = i*1000000;
int dif = clock()-beg;
cout << dif << endl;
getch();

i think there's nothing to optimize? how can the bcb optimizer screw up
such simple things? (sorry for continuing an off-topic thread)

Variable j is not used, so the for loop can be optimized away. Maybe MSVC
does this optimization? Check out the assembly. Try this variation too
where 'j' is used:

cout << j << ' ' << dif << endl;

In addition, I think Borland uses STLPort iostreams, which may not be as
optimized as MSVC iostreams.

Jerry Coffin · Apr 20, 2004

PDQBach wrote:
[ a minor rewrite of your code gives: ]

#include <iostream>
#include <ctime>

using namespace std;

int main()
{
    double j;
    clock_t beg = clock();
    for (double i=0; i<200000000; ++i)
        j = i*1000000;
    double dif = double(clock() - beg)/CLOCKS_PER_SEC;
    std::cout << dif << std::endl;
    return 0;
}

I believe this should be a bit more portable and consistent.

In any case, looking at the assembly language output for the loop, we
can see the difference pretty easily. Here's what VC++ produces:

$L7396:
fadd QWORD PTR __real@8@3fff8000000000000000
fcom QWORD PTR __real@8@401abebc200000000000
fnstsw ax
test ah, 1
jne SHORT $L7396
fstp ST(0)

But here's what BCC 5.5 produces:

jmp short @4
@3:
fld qword ptr [esp+8]
fmul dword ptr [@5]
fstp st(0)
@6:
fld dword ptr [@5+4]
fadd qword ptr [esp+8]
fstp qword ptr [esp+8]
@4:
fld qword ptr [esp+8]
fcomp dword ptr [@5+8]
fnstsw ax
sahf
jb short @3

Now, even if you don't read Intel assembly language very well, you can
pretty easily see that the VC++ version does NOT include any FMUL
instruction -- i.e. it's not really doing a floating point
multiplication at all. To make a long story short, its output is
basically equivalent to:

for (double i=0; i<limit; ++i)
;

and that's it. The Borland version does pretty much what you asked
for: it does the multiplication inside of the loop, thus slowing
things down substantially.

This sort of thing tends to have a much smaller effect on real code
than on synthetic benchmarks like this; as a rule, you won't do 200
million multiplications unless you actually have some use for the
results they produce. When/if you use the results, VC++ will probably
have to do the multiplications as well, and slow down substantially.

i think there's nothing to optimize? how can the bcb optimizer screw up
such simple things? (sorry for continuing an off-topic thread)

As you can see above, there's really quite a bit that's open to
optimization here -- but it probably wouldn't be with real code that
was otherwise similar.

Just for example, here's the same basic code, but modified to use
(i.e. print out) the result generated inside the loop:

#include <iostream>
#include <ctime>

using namespace std;

int main()
{
    double j;
    double total = 0.0;
    const double N = 200000000;
    clock_t beg = clock();
    for (double i=1; i<N; ++i)
        total += 1.0/i;
    double dif = double(clock() - beg)/CLOCKS_PER_SEC;
    std::cout << "result: " << total << std::endl;
    std::cout << dif << std::endl;
    return 0;
}

With this, the difference I get is much smaller -- 2.734 seconds for
VC++ and 3.109 seconds for BCC 5.5.

That pretty much fits my earlier prediction: Borland does produce
slower output, but not by such a huge margin as to render it unusable.
They've evened out a lot because now they're at least doing the same
problem. The difference is that VC++ explicitly keeps most of what
it's working with in floating point registers, while BC++ loads a
value from memory, operates on it, and then stores the result back to
memory each iteration. The cache keeps this from being excruciatingly
slow, but even L1 cache is still slower than using a register
directly.

In case you care, the result this prints out is the Nth harmonic
number. Harmonic numbers are related to a number of interesting
questions. One is the card-stacking problem: how far a stack of cards
can overhang the edge of a table without falling. If you assign each
card a length of 2, then N iterations will tell you the maximum
overhang for N cards. With N=200000000, we get an overhang of almost
10 complete cards in length -- but assuming the cards are about the
normal thickness, the stack would be the tallest thing on earth, by
quite a large margin -- you'd need pretty old, thin cards to get it
down to quadruple the height of Mt. Everest!
Later,
Jerry.
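
The overhang figure can be checked without running 200 million iterations. The sketch below uses the standard approximation H_n ≈ ln(n) + γ + 1/(2n), where γ is the Euler-Mascheroni constant. The function names are illustrative; the card length of 2 follows the post, so the overhang in card lengths is H_n / 2.

```cpp
#include <cassert>
#include <cmath>

// Exact partial sum 1/1 + 1/2 + ... + 1/n (practical for small n).
double harmonic_sum(long n)
{
    assert(n >= 0);
    double total = 0.0;
    for (long i = 1; i <= n; ++i)
        total += 1.0 / static_cast<double>(i);
    return total;
}

// Asymptotic estimate H_n ~ ln(n) + gamma + 1/(2n).
double harmonic_estimate(double n)
{
    const double gamma = 0.5772156649015329;   // Euler-Mascheroni constant
    return std::log(n) + gamma + 1.0 / (2.0 * n);
}

// Overhang measured in whole card lengths, each card having length 2.
double overhang_in_cards(double n)
{
    return harmonic_estimate(n) / 2.0;
}
```

For n = 200000000 the estimate is about 19.69, so `overhang_in_cards` gives roughly 9.85, which agrees with "almost 10 complete cards". The loop in the post stops at i < N, so it sums one term fewer, a difference far too small to matter here.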

PDQBach · Apr 20, 2004

Let me rewrite your program, removing non-standard code and
removing lines which have no effect (something that an optimiser
would do):

for (double i=0; i<200000000; ++i) j = i*1000000;
why did you remove this? it has an effect because j is changing each
time and for this reason vc++ doesn't remove it either.
i just wanted to compare the multiplication speed of vc++ and bcb (to
find out the reason why the above-mentioned program is so slow on bcb).

Michiel Salters · Apr 20, 2004

Kevin Goodsell said:
Scary. Here's hoping that elsewhere in your code, you have something
like this:

int main(int argc, char **argv)
{
CMAIN(argc, argv);

....

or some evil macro that expands to foo() { }; int main

I haven't seen cases where void main was an advantage, although
different argument lists make sense (e.g. the envp extension).

Regards,
Michiel Salters

Siemel Naran · Apr 20, 2004

Siemel Naran said:
Variable j is not used, so the for loop can be optimized away. Maybe MSVC
does this optimization? Check out the assembly. Try this variation too
where 'j' is used:

cout << j << ' ' << dif << endl;

Was thinking, the above may not be correct. Sure 'j' is now used in side
effects. But a super-smart compiler will see that the bounds of the for
loop are known at compile time, and the body of the for loop is builtin
math. So it may evaluate the expression at compile time, and replace

cout << j << ' ' << dif << endl;

with

cout << 1.086e88 or whatever it is << ' ' << dif << endl;

So replace

for (double i=0; i<200000000; ++i) j = i*1000000;

with

std::ifstream file("file.txt"); // contains "200000000"
int N;
file >> N;
for (double i=0; i<N; ++i) j = i*1000000;

Back in 2000, I think the KAI C++ compiler did these kinds of optimizations,
even if the body of the for loop invoked standard functions like std::sin
and std::strlen and the function arguments could be known at compile
time.

In addition, I think Borland uses STLPort iostreams, which may not be as
optimized as MSVC iostreams.

Try using printf.
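
Put together, the suggestion might look like the sketch below: the bound is read at run time, the result is used, and `printf` replaces iostreams for the output. The function names are illustrative, and a string stream stands in for the file so the example is self-contained.

```cpp
#include <cassert>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>

// Reads the loop bound from `in` at run time, runs the loop from the
// thread, and returns the last product. Returns -1.0 if no bound
// could be read.
double run_with_runtime_bound(std::istream& in)
{
    int N = 0;
    if (!(in >> N))
        return -1.0;
    double j = 0.0;
    for (double i = 0; i < N; ++i)
        j = i * 1000000;
    // Only the final iteration determines j.
    assert(N <= 0 || j == (N - 1) * 1000000.0);
    return j;
}

// Stand-in for the file: parses the bound from a string instead.
double run_from_text(const std::string& text)
{
    std::istringstream in(text);
    return run_with_runtime_bound(in);
}

// Prints the result with printf rather than iostreams.
void report(const std::string& text)
{
    std::printf("%g\n", run_from_text(text));
}
```

A run-time bound only rules out evaluating the whole loop at compile time. As the assertion shows, the result still depends on the last iteration alone, so a compiler remains free to skip the earlier ones.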

PDQBach · Apr 20, 2004

however, in the above-mentioned (incomplete, sorry) mandelbrot algorithm,
there should be real floating point operations. i finished another
program to quickly color the inside of the mset according to the
periodicity of the point and have the same results. vc++ is again
twice as fast. in release mode bcb still is slower than in debug mode.
btw does anyone know an easy guessing algorithm for the mandelbrot set?
i could make my new program much faster (at the moment with vc++ it's 3
times faster than ultrafractal (guessing turned off).) the trick is to
adapt the maximum iteration (easily up to 100000) if no period was
found at lower iteration and then continue iteration from the last z.
i'm not sure whether a guessing algorithm could be implemented with this
trick.
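
The "continue from the last z" idea can be sketched as a resumable iteration whose state is kept per point. The periodicity test here compares z against a reference value refreshed at doubling intervals, which is one common approach and not necessarily the one used in the program described above. All names (`OrbitState`, `Outcome`, `iterate`) and the tolerance `eps` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>

enum class Outcome { Escaped, Periodic, Undecided };

// Iteration state kept between calls, so raising the iteration budget
// continues from the last z instead of starting over.
struct OrbitState
{
    double zx = 0.0, zy = 0.0;       // current z
    double refx = 0.0, refy = 0.0;   // saved z used for the period check
    long iter = 0;                   // iterations done so far
    long next_save = 1;              // iteration at which to refresh the reference
};

// Iterates until escape, until z returns to the reference value
// (a period was found), or until `maxiter` total iterations are done.
Outcome iterate(OrbitState& s, double cox, double coy, long maxiter,
                double bailout = 4.0, double eps = 1e-12)
{
    assert(bailout > 0.0 && eps > 0.0);
    while (s.iter < maxiter)
    {
        const double tempx = s.zx * s.zx - s.zy * s.zy + cox;
        s.zy = 2.0 * s.zx * s.zy + coy;
        s.zx = tempx;
        ++s.iter;
        if (s.zx * s.zx + s.zy * s.zy >= bailout)
            return Outcome::Escaped;
        if (std::fabs(s.zx - s.refx) < eps && std::fabs(s.zy - s.refy) < eps)
            return Outcome::Periodic;
        if (s.iter == s.next_save)
        {
            s.refx = s.zx;           // refresh the reference point
            s.refy = s.zy;
            s.next_save *= 2;        // and wait twice as long next time
        }
    }
    return Outcome::Undecided;
}
```

A caller would run every pixel with a small `maxiter` first, then call `iterate` again with a larger budget only for the points that came back `Undecided`, passing the same `OrbitState`.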

Bruce · Apr 21, 2004

In comp.lang.c++

i just wanted to compare the multiplication speed of vc++ and bcb (to
find out the reason why the above-mentioned program is so slow on bcb).

Then disable optimizations and be done with it.

Old Wolf · Apr 22, 2004

for (double i=0; i<200000000; ++i) j = i*1000000;
why did you remove this? it has an effect because j is changing each
time and for this reason vc++ doesn't remove it either.

'j' changing does not count as an effect. Your code is like:
j = 1 * 1000000;
j = 2 * 1000000;
j = 3 * 1000000;
and any sane optimiser would remove all of these statements except the
final one. Then, 'j' is not used later on in the program either, so the
optimiser would remove it entirely.
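
The first half of that argument can be checked directly. In this sketch (function names are illustrative), the loop version and the "final statement only" version return the same value, which is why an optimiser is allowed to replace one with the other.

```cpp
#include <cassert>

// The loop as written in the thread: every iteration overwrites j.
double last_product_loop(long count)
{
    double j = 0.0;
    for (double i = 0; i < count; ++i)
        j = i * 1000000;
    return j;
}

// What remains once the overwritten stores are removed: only the final
// assignment (i = count - 1) can influence the result.
double last_product_direct(long count)
{
    assert(count >= 0);
    return count > 0 ? (count - 1) * 1000000.0 : 0.0;
}
```

If the caller then ignores the returned value, as the original benchmark ignores `j`, nothing observable is left and the whole computation can go.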

i just wanted to compare the multiplication speed of vc++ and bcb (to
find out the reason why the above-mentioned program is so slow on bcb).

You should look at the assembly generated by each for that loop.
That is the only reliable way to check that you are comparing apples
with apples. It will also tell you what each compiler does differently.
I also suggest you read the manuals for your compiler options. Likely
options include 80686 code generation, and fast floating point.

For example, VC could be using a register for 'j' and BCC might be using
memory, which would certainly account for the discrepancy. Also, if BCC
is in 386 mode (the default) then it might not be using the latest CPU
multiplication instructions available.

PDQBach · Apr 22, 2004

and any sane optimiser would remove all of these statements except the
final one.

well, in my case vc++ seems to do something (but doesn't remove the
useless statements). just try it, if you own both vc++ and bcb 6. but
there are indeed situations where vc++ cuts out statements and bcb
doesn't. however, i decided to use vc++ now, because most of my
applications are time-critical. but, besides speed, vc++ 6 is quite a
mess compared to bcb (from the beginner's standpoint...). as a
non-professional programmer i would like a language that, in most parts,
is as simple as basic and in other, time-critical parts, allows one to be
more complicated and faster. something like a hybrid language.
