simple code performance question


Elias Salomão Helou Neto

(Strange, I didn't have any problem with his wording. But I'm
not sure any of us are native speakers here, although I did grow
up in America.)

Well, shame on me. My reading was awful and the wording was ok. Sorry
about that.
By definition, the difference between a constructor and an
assignment operator is that the assignment operator has an old
value, which must be taken into consideration. Sometimes,
taking it into consideration can actually improve performance;
e.g. if you are using deep copy, and the source string is
smaller than the destination. Other times, it can slow things down:
any time it's not big enough, for example, so the old buffer must
be freed and re-allocated, or if you're using copy on write.

The fact is that destructors will have to deal with the old value
anyway when iterating over a loop.
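To make the constructor-versus-assignment trade-off concrete, here is a minimal sketch of a hypothetical deep-copying class (all names are invented for illustration): assignment can reuse the old allocation when it is large enough, while a construction/destruction cycle always pays for a fresh allocation and a release.

#include <cstring>
#include <cstddef>

// Hypothetical deep-copying buffer, for illustration only.
class Buffer {
    char*       data_;
    std::size_t size_;   // bytes in use
    std::size_t cap_;    // bytes allocated
public:
    Buffer(const char* s, std::size_t n)        // construction: always allocates
        : data_(new char[n]), size_(n), cap_(n)
    { std::memcpy(data_, s, n); }

    Buffer(const Buffer& rhs)                   // copy construction: allocates too
        : data_(new char[rhs.size_]), size_(rhs.size_), cap_(rhs.size_)
    { std::memcpy(data_, rhs.data_, size_); }

    Buffer& operator=(const Buffer& rhs)        // assignment: must consider the old value
    {
        if (this != &rhs) {
            if (rhs.size_ > cap_) {             // old buffer too small: free, re-allocate
                delete[] data_;
                data_ = 0;                      // stay sane if new throws
                data_ = new char[rhs.size_];
                cap_  = rhs.size_;
            }
            std::memcpy(data_, rhs.data_, rhs.size_);  // big enough: reuse the old buffer
            size_ = rhs.size_;
        }
        return *this;
    }

    ~Buffer() { delete[] data_; }               // destruction releases the old value anyway
};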
Depending on what you're doing with std::string, copy on write
can represent a significant performance gain, or a small (or
maybe not so small, in the case of multi-threaded code) loss.
The actual measurements were made some time ago, with g++
2.95.2, which didn't support threading at all, and had a small,
simple and very, very robust implementation of basic_string. My
own experiments suggest that they probably made the right trade
off using copy on write in this case, at least for my code.
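For readers who haven't seen the technique: a greatly simplified sketch of the copy-on-write idea (all names invented; no thread safety, which is exactly where the multi-threaded loss comes from, since a real implementation has to update the reference count atomically on every copy).

#include <cstring>
#include <cstddef>

// Greatly simplified copy-on-write string, for illustration only.
class CowString {
    struct Rep {
        char* text;
        int   refs;    // a threaded build needs atomic updates here: that is the COW tax
    };
    Rep* rep_;

    static Rep* makeRep(const char* s)
    {
        Rep* r  = new Rep;
        r->text = new char[std::strlen(s) + 1];
        std::strcpy(r->text, s);
        r->refs = 1;
        return r;
    }
    void release() { if (--rep_->refs == 0) { delete[] rep_->text; delete rep_; } }

public:
    explicit CowString(const char* s) : rep_(makeRep(s)) {}
    CowString(const CowString& other) : rep_(other.rep_) { ++rep_->refs; }  // copy: O(1)

    CowString& operator=(const CowString& other)
    {
        ++other.rep_->refs;          // increment first, so self-assignment is safe
        release();
        rep_ = other.rep_;
        return *this;
    }

    char& operator[](std::size_t i)  // write access: now we pay for the deep copy
    {
        if (rep_->refs > 1) {
            Rep* own = makeRep(rep_->text);
            --rep_->refs;
            rep_ = own;
        }
        return rep_->text[i];
    }

    ~CowString() { release(); }
};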


That's what g++ does. It still means that you have to do
something with the old data.

Well, the fact that assignment is replaced by a full construction/
destruction cycle within the loop doesn't quite remedy that. What does
the destructor do with the data? There is no way around it unless we
do not want to release memory.
And I ask you how much experience you've actually had compiling
and optimizing C++, in order to make such a blanket statement.

I work designing and implementing heavy algorithms for image
reconstruction in tomography (I also prove they converge). And I have
been using C++ for a while, for Matlab was becoming too slow. One
thing I have learned is that if your class carries lots of data and
you don't want to go through implementing reference counted copy on
write, copy constructors are better avoided, especially the
implicit ones, but this is not the point.
The standard contains a couple of special rules, just to permit
optimization of temporaries when constructing. They don't
necessarily apply when copying.

I guess you mean "when assigning", right? This is just the reason why
I say that with str( someFunction() ) the compiler removes
temporaries more easily than in str = someFunction(). But if named objects
can't be optimized away (it would be handy if you could point me to
the "couple of special rules" in the standard), we can't even always
apply RVO when the function may do something very complex with the
object. How could I create the string of my last posting within a
return statement? Not to mention more complicated cases.
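For reference, the rules alluded to are, I believe, the copy-elision permissions of [class.copy] (12.8 in C++03), which allow the compiler to construct a returned temporary directly in the caller's object. A sketch of the distinction, with hypothetical function names:

#include <string>

// RVO form: the returned temporary can be constructed directly in the
// caller's object; no named local is ever copied.
std::string makeGreeting()
{
    return std::string( "hello" );
}

// NRVO form: a named local is built up and then returned. The standard
// *permits* eliding this copy too, but does not require it, and complex
// control flow can defeat it.
std::string makeReport( int n )
{
    std::string s;
    for ( int i = 0; i < n; ++i )
        s += "line\n";
    return s;       // the copy may or may not be elided
}

int main()
{
    std::string a( makeGreeting() );   // construction: elision applies
    std::string b;
    b = makeReport( 3 );               // assignment: a temporary is still
                                       // created and then assigned from
    return 0;
}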
It's expecting automatically that one idiom will be faster than
the other, without actually having measured the specific case in
question, which is dumb. It's choosing a particular idiom on
the grounds that it will be faster without having measured.

Of course it is, but people here seem to religiously prefer T
t( someFunction() ) over someFunction( t ) even within tight loops and
for absolutely every class. So, are they dumb?
The problem is that you still don't understand. To start with,
your version doesn't use RVO, because there is no return value.

That was a typo. RVO should not be there.
And anyone with any real experience will automatically disagree
when you claim differences without actually having measured, and
will disagree that the measurements you made for one case apply
to the next.

But you measured, right? I was trying to understand the reasons for
something that seemed unlikely to happen, at least from my point of
view.

In fact, I am still not convinced, as I have never actually seen an
example in which "repeated construction/destruction cycles" could not
be replaced with performance gain by "creating outside the loop and
passing the object as reference". There may be one; I look forward to
seeing it, but I never have. Perhaps, with your knowledge of std::string
internals, you could craft one for us. I would appreciate that.

You claim to have an ancient piece of code where your idiom used to
perform better than "creating once and repeatedly assigning", but this
is not what I want.

Let us make things clear. I am not here claiming that this is always
the way to go. One must, for sure, try several solutions. However,
trying every possibility is not usually feasible, and one should try
those which are more likely to work well. This is why it is worth
knowing why your code behaved like that, whether it could have been
improved, and so on.

Elias Salomão Helou Neto.
 

James Kanze

[...]
Of course it is, but people here seem to religiously prefer T
t( someFunction() ) over someFunction( t ) even within tight
loops and for absolutely every class. So, are they dumb?

It's dumb to choose an idiom on the basis of supposed performance
issues, without having first determined that there is a need,
and then measured to ensure that changing the idiom improves
things. In the past, I've actually changed one or two functions
to use your idiom. Because the profiler said I had to.
Depending on the class, the function, etc., it can make a
difference.
But you measured, right?

I measured one particular case. I measured intentionally in
answer to a posting which was making a similar claim to yours;
that declaring the variable outside the loop was faster.

In the past, I've profiled once or twice when the code was too
slow. In at least one case, using your technique made a
significant improvement. In others, it made no change, or even
made things worse. Choosing the solution "up front" because it
"will be faster" is counter productive; most of the time, it
doesn't matter, and when it does, you don't know up front which
will be faster. (Obviously, there are exceptions. If you're
constantly writing similar applications, using the same classes,
and you've already had to fix two or three in the same way,
well, experience is there for us to learn from. But you still
have to be aware that any changes in the implementation of
anything could invalidate your experience, and be prepared to
remeasure---even in the simple case of a compiler upgrade.)
I was trying to understand the reasons for something that
seemed unlikely to happen, at least from my point of view.

Off hand, I don't know the reasons. I knew them at the time,
but I've forgotten them. I do know that the C++ object model is
not always trivial, and that even in the case of simple code in
a simple language, it's not usually possible to predict where
the bottlenecks will be in advance. (Again, with some
exceptions: if you know in advance that you'll have to process
several hundred thousand elements, or more, it seems a safe
guess that an O(n!) won't cut it, and that even an O(n^2) will
probably be a bottleneck. But I'm very sceptical of people who
claim different k's for the same big O, without measuring.)
In fact, I am still not convinced, as I have never actually
seen an example in which "repeated construction/destruction
cycles" could not be replaced with performance gain by
"creating outside the loop and passing the object as
reference". There may be, I look forward to see, but have
never seen. Perhaps, with your knowledge of std::string
internals, you could craft one for us. I would appreciate
that.

It would depend on the exact implementation of basic_string, and
it's been some time since I last looked into it.
You claim to have an ancient piece of code where your idiom
used to perform better than "creating once and repeatedly
assigning", but this is not what I want.
Let us make things clear. I am not here claiming that this is
always the way to go. One must, for sure, try several
solutions. However, trying every possibility is not usually
feasible, and one should try those which are more likely to
work well. This is why it is worth knowing why your code
behaved like that, whether it could have been improved, and so on.

One must only start trying when there is a problem. In most
domains, that happens very rarely---most applications don't deal with
extremely large sets of data in memory. For those that do, you
can't really make any assumptions before profiling. Except
those concerning big-O---if the data set is large enough, an
O(n^2) implementation will be measurably slower than an O(n)
one, regardless of any other implementation details. But even
then... it takes a fair amount of data for the difference
between O(n) and O(n lg n) to become significant. And you'd be
surprised at how fast just copying can be on some machines. Off
hand, I'd expect returning something like an
std::vector<double>( 1000000 ) to be expensive, but in many
cases, the difference is significantly less than the time it
will take to generate the data anyhow.
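The cheap way to settle such questions is to time them. A minimal sketch of how one might measure (std::clock() keeps it within C++03; the producer function and the counts are illustrative, not from any posted benchmark):

#include <vector>
#include <ctime>
#include <cstdio>
#include <cstddef>

// Hypothetical producer: generates and returns a large vector by value.
std::vector<double> produce()
{
    std::vector<double> v( 1000000 );
    for ( std::size_t i = 0; i < v.size(); ++i )
        v[i] = static_cast<double>( i );        // "generating the data"
    return v;
}

int main()
{
    std::clock_t start = std::clock();
    for ( int i = 0; i < 100; ++i )
        std::vector<double> v( produce() );     // construct/destroy each iteration
    std::clock_t stop = std::clock();
    std::printf( "%.2f s\n", double( stop - start ) / CLOCKS_PER_SEC );
    return 0;
}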
 

Elias Salomão Helou Neto

[...]
Of course it is, but people here seem to religiously prefer T
t( someFunction() ) over someFunction( t ) even within tight
loops and for absolutely every class. So, are they dumb?

It's dumb to choose an idiom on the basis of supposed performance
issues, without having first determined that there is a need,
and then measured to ensure that changing the idiom improves
things. In the past, I've actually changed one or two functions
to use your idiom. Because the profiler said I had to.
Depending on the class, the function, etc., it can make a
difference.

Alright, you have made your point. However, as you said, unless the
profiler says the opposite, an idiom is mostly a matter of choice
(code readability is important, but whether some piece of code is more
readable than another is a matter of how used you are to each idiom).
It may be untrue in a few cases, but my own experience says that it
is worth sticking to the choice I have made.
I measured one particular case. I measured intentionally in
answer to a posting which was making a similar claim to yours;
that declaring the variable outside the loop was faster.

While the mentioned posting may have had a similar wording in its
claim, it was far from being the same. I am not only avoiding the
construction/destruction cycle, but also avoiding returning by value.
It is not only declaring outside the loop, but I see you've already
got it.

Still, my examples have shown that declaring outside and repeatedly
assigning provided a performance boost over declaring within the loop,
at least in that extremely simple case using std::string. If I had the
time I would investigate that further, for the difference is much
larger than it should be (likely because destruction releases memory
that will have to be reallocated, while assignment, in this case where
the string never grows, does not).

It is worth repeating, I am not saying that declaring outside should
be preferred over copy-construction/destruction. In fact, my feeling
is that they should be nearly the same!
In the past, I've profiled once or twice when the code was too
slow. In at least one case, using your technique made a
significant improvement. In others, it made no change, or even
made things worse. Choosing the solution "up front" because it
"will be faster" is counter productive; most of the time, it
doesn't matter, and when it does, you don't know up front which
will be faster. (Obviously, there are exceptions. If you're
constantly writing similar applications, using the same classes,
and you've already had to fix two or three in the same way,
well, experience is there for us to learn from. But you still
have to be aware that any changes in the implementation of
anything could invalidate your experience, and be prepared to
remeasure---even in the simple case of a compiler upgrade.)

I guess we should say that there is no "simple code performance
question" after all. But it is also counter productive to choose a
style based solely on code readability, being this a secondary issue
in some domains, AND to believe that this idiom cannot degrade
performance (this is obviously not your case, but there are many who
do think this way).
Off hand, I don't know the reasons. I knew them at the time,
but I've forgotten them. I do know that the C++ object model is
not always trivial, and that even in the case of simple code in
a simple language, it's not usually possible to predict where
the bottlenecks will be in advance. (Again, with some
exceptions: if you know in advance that you'll have to process
several hundred thousand elements, or more, it seems a safe
guess that an O(n!) won't cut it, and that even an O(n^2) will
probably be a bottleneck. But I'm very sceptical of people who
claim different k's for the same big O, without measuring.)

Our point here does not even regard operation count; it is more about
implementation details for an already chosen algorithm.
It would depend on the exact implementation of basic_string, and
it's been some time since I last looked into it.
Hum...


One must only start trying when there is a problem. In most
domains, that happens very rarely---most applications don't deal with
extremely large sets of data in memory. For those that do, you
can't really make any assumptions before profiling. Except
those concerning big-O---if the data set is large enough, an
O(n^2) implementation will be measurably slower than an O(n)
one, regardless of any other implementation details. But even
then... it takes a fair amount of data for the difference
between O(n) and O(n lg n) to become significant. And you'd be
surprised at how fast just copying can be on some machines. Off
hand, I'd expect returning something like an
std::vector<double>( 1000000 ) to be expensive, but in many
cases, the difference is significantly less than the time it
will take to generate the data anyhow.

Well, at least in my problem domain, implementation details do matter.
The algorithms are out there (even those you invent, you will likely
want to make them public through a paper); once you have
chosen one, either based on its convergence rate, reconstruction
quality, numerical stability or any other reasons, you have to
actually provide a sound implementation. This, for sure, means heavy
experimentation, but it also precludes any unnecessary copying of the
data, whether it is 100 or 1000000 elements; even before profiling I
am quite sure I should avoid that.

In many cases, however, there is no need for extreme performance and
such attention to details would make the code unnecessarily hard to
read and lead to an extra time budget, which is not always available.
This is, by the way, the reason for many to advocate Java, Python,
etc. over C++ for such applications.

I have chosen C++ because, even if sometimes sacrificing code
readability and having to pay lots of attention to the details, code
written in C++ can be made as fast as "pure" C code, but with much
more expressive power.

All said, I guess we agree on most points, and the main one is:

"There is no simple code performance question, so always ask to the
profiler."

Though experience will be handy when looking for solutions.

Elias Salomão Helou Neto
 

Ioannis Gyftos

According to you, Program 1 should run faster, right? But it is just
the opposite. Compiling both with no optimization (the default) using
gcc 4.1.2 20070502, Program 1 takes around 21 seconds to run against
around 15 seconds for Program 2. Now, let us turn optimization up to
its highest level and see what happens. With the -O3 flag used when
compiling, Program 1's execution time falls to around 19 seconds,
while Program 2 goes down to an amazing 12 seconds! Can you explain
that to me?

I ran the same test as you did and I can confirm your result. I didn't
bother running without optimization in any case.

However, since std::string::append did strike me as a bit unorthodox
(appending is the first thing you thought of?), I took the liberty of running
the same tests with:
str = "supercalifragili...";
instead of appending. With optimization -O3 on gcc 4.2.1, program 1
ran at 26.5 sec. Program 2 ran at 28.0 sec. If I remove the
"str.clear();" line in Program 2, I get 26.4sec.

(Yeah slow PC)

Can you explain that to me? :)

Hint hint: Program 2 returns a reference (to an increasingly large
std::string), so the two programs are not fair imo.

On another similar note. Which one would you _prefer_?

for (std::list<...>::iterator i = myList.begin();
     i != myList.end(); ++i)
{}

versus

std::list<...>::iterator i = myList.begin();
std::list<...>::iterator j = myList.end();
for ( ; i != j ; ++i)
{}
 

Victor Bazarov

Ioannis said:
[..]
On another similar note. Which one would you _prefer_?

for (std::list<...>::iterator i = myList.begin();
     i != myList.end(); ++i)
{}

versus

std::list<...>::iterator i = myList.begin();
std::list<...>::iterator j = myList.end();
for ( ; i != j ; ++i)
{}

Actually, I probably would prefer the former (for a list, of
course), since there is no difference, AFAIK (unless profiling
tells me otherwise).

V
 

Joe Greer

On another similar note. Which one would you _prefer_?

for (std::list<...>::iterator i = myList.begin();
     i != myList.end(); ++i)
{}

versus

std::list<...>::iterator i = myList.begin();
std::list<...>::iterator j = myList.end();
for ( ; i != j ; ++i)
{}

I've used:

for (std::list<...>::iterator i = myList.begin(), iEnd = myList.end();
     i != iEnd; ++i)
{}

in the past. Gives the best of both worlds. However, one should recognize that
this is an optimisation that will make very little difference in most cases, but
somehow calling a function the minimum number of times necessary satisfies an inner
urge of mine. :)

joe
 

Victor Bazarov

Joe said:
[..] somehow calling a function the minimum
number of times necessary satisfies an inner urge of mine. :)

:)

It's an inherent mistrust between a human and a machine. In most
cases the compiler should replace the function call with the value
it returns (hopefully just a null pointer). If you trust your
compiler, "e = lst.end(); i < e;" and "; i < lst.end();" should be
the same to you.

V
 

James Kanze

Joe said:
[..] somehow calling a function the minimum
number of times necessary satisfies an inner urge of mine. :)

It's an inherent mistrust between a human and a machine. In most
cases the compiler should replace the function call with the value
it returns (hopefully just a null pointer). If you trust your
compiler, "e = lst.end(); i < e;" and "; i < lst.end();" should be
the same to you.

Interestingly enough, the benchmarks I've run suggest that they
aren't. Moving the call to end() out of the loop does speed it
up, at least with g++ on Sun Sparc.

Of course, the difference is very, very small, so unless you're
doing almost nothing in the loop, there's no point in
programming to it. I normally use the comparison it != c.end()
everywhere, and it's yet to cause a real performance problem.
 

Joe Greer

Joe said:
[..] somehow calling a function the minimum
number of times necessary satisfies an inner urge of mine. :)

:)

It's an inherent mistrust between a human and a machine. In most
cases the compiler should replace the function call with the value
it returns (hopefully just a null pointer). If you trust your
compiler, "e = lst.end(); i < e;" and "; i < lst.end();" should be
the same to you.

Sadly, it's experience that causes this mistrust. :) Since the form I use
is pretty much idiomatic to me and is no more expensive, I tend to use it.

joe
 

Bo Persson

Elias Salomão Helou Neto wrote:
:
: Well, again you forget that your idiom has an implied destruction of
: the object at every loop iteration, resulting in the need to deal
: with exactly the same problem! How could that be different?
:
: I will give you an example. Take the following two simple programs:
:
: //Program 1:
: #include <string>
:
: std::string myFunction()
: {
:     std::string str;
:     for ( unsigned i( 0 ); i < 1000; ++i )
:         str.append( "supercalifragilisomethingidonotremebmberandd"
:                     "donotwantotsearchintheinternet" );
:
:     return( str );
: }
:
: int main()
: {
:     for( unsigned i( 0 ); i < 100000; ++i )
:         std::string str( myFunction() );
:
:     return( 0 );
: }
:
: //Program 2:
: #include <string>
:
: void myFunction( std::string& str )
: {
:     str.clear();
:     for ( unsigned i( 0 ); i < 1000; ++i )
:         str.append( "supercalifragilisomethingidonotremebmberandd"
:                     "donotwantotsearchintheinternet" );
: }
:
: int main()
: {
:     std::string str;
:     for( unsigned i( 0 ); i < 100000; ++i )
:         myFunction( str );
:
:     return( 0 );
: }
:
: According to you, Program 1 should run faster, right? But it is just
: the opposite. Compiling both with no optimization (the default)
: using gcc 4.1.2 20070502, Program 1 takes around 21 seconds to run
: against around 15 seconds for Program 2. Now, let us turn
: optimization up to its highest level and see what happens. With the
: -O3 flag used when compiling, Program 1's execution time falls to
: around 19 seconds, while Program 2 goes down to an amazing 12
: seconds! Can you explain that to me?

Yes, you are benchmarking the memory allocation for std::string.

On my machine, using another compiler, I get:

Program 1: 22.5 s
Program 2: 3.3 s

Then I notice that Program 2 reuses the same internal string buffer
for all calls, saving calls to the string growth code for the last
99,999 calls. To even the score a bit, I add a "str.reserve(100000)"
to myFunction.

Program 1B: 3.5 s
Program 2B: 3.4 s
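The modified function itself was not shown; presumably it looks like this (the 100000 figure is Bo's, sized for 1000 appends of the 74-character literal; Program 2B gets the same reserve() call in its by-reference version):

#include <string>

std::string myFunction()
{
    std::string str;
    str.reserve( 100000 );   // room for all the appends up front, so the
                             // growth code never runs inside the loop
    for ( unsigned i( 0 ); i < 1000; ++i )
        str.append( "supercalifragilisomethingidonotremebmberandd"
                    "donotwantotsearchintheinternet" );

    return( str );
}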


: It's time for another listing:
:
: //Program 3:
: #include <string>
:
: std::string myFunction()
: {
:     std::string str;
:     for ( unsigned i( 0 ); i < 1000; ++i )
:         str.append( "supercalifragilisomethingidonotremebmberandd"
:                     "donotwantotsearchintheinternet" );
:
:     return( str );
: }
:
: int main()
: {
:     std::string str;
:     for( unsigned i( 0 ); i < 100000; ++i )
:         str = myFunction();
:
:     return( 0 );
: }
:
: Program 3 takes a little more than 17 seconds to run without
: optimization turned on, explain it to me, please. When optimized, it
: will take around 15 seconds to run.

On my machine it takes 24 s unmodified.

Adding the same "str.reserve(100000)" to myFunction.

Program 3B: 5.6 s

Rewriting main, making it equivalent to Program 1:

int main()
{
    for( unsigned i( 0 ); i < 100000; ++i )
        std::string str = myFunction();

    return( 0 );
}

Program 3C: 3.5 s


The last case shows that, in this test, constructing a new string on
each iteration is faster than assigning a new value to an existing
string.


Bo Persson
 

Elias Salomão Helou Neto

I ran the same test as you did and I can confirm your result. I didn't
bother running without optimization in any case.

However, since std::string::append did strike me as a bit unorthodox
(appending is the first thing you thought of?), I took the liberty of running

Yes, because I wanted a huge string, not only to mimic assignment.
Without delving into obscure std::string methods, how do you create a
huge std::string without appending content to it?
the same tests with:
str = "supercalifragili...";
instead of appending. With optimization -O3 on gcc 4.2.1, program 1
ran at 26.5 sec. Program 2 ran at 28.0 sec. If I remove the
"str.clear();" line in Program 2, I get 26.4sec.
(Yeah slow PC)

Can you explain that to me? :)

Well, of course clearing the string was not necessary in your case.
The explanation in my example was memory management, as you can easily
deduce from the 3.7 seconds of system time required for Program 1 to
run to completion, compared to 0.3 for Program 2. Under Linux, the
time command gives you this info. Now, in your case both versions had
the same execution time, so there should be no preferred way, and I
will stick to mine. Also, everything being the same, what do you want
me to explain?
Hint hint: Program 2 returns a reference (to an increasingly large
std::string), so the two programs are not fair imo.

Neither the program nor the void myFunction returns any reference; I
don't see what you mean.
On another similar note. Which one would you _prefer_?

for (std::list<...>::iterator i = myList.begin();
     i != myList.end(); ++i)
{}

versus

std::list<...>::iterator i = myList.begin();
std::list<...>::iterator j = myList.end();
for ( ; i != j ; ++i)
{}

The former. As Bazarov has already noticed, both should be equivalent
unless the compiler does any calculation to arrive at myList.end(),
which is unlikely.

On the other hand, it could be interesting to avoid things like:

for ( std::vector<...>::size_type i( 0 ); i < myVector.size(); ++i );

because the size() member is likely to end up being inline expanded to
myVector.end() - myVector.begin(), so saving it in a variable could be
a good idea in performance sensitive applications. In such cases I do
write:

std::vector<...>::size_type size( myVector.size() );
for ( std::vector<...>::size_type i( 0 ); i < size; ++i );

Notice that the second _cannot_ be much slower than the first (it
will be at most one integer creation slower), but can potentially
avoid thousands of recalculations. When you come to a loop that only
executes one subtraction within each iteration (and it does happen a
lot to me), the former will be doubling the execution time, unless
optimized by the compiler (which would result in code similar to
mine), but such optimization may not be an easy task for it.

Of course using iterators here would be much more elegant, but some
algorithms are best implemented with indexes. It is much easier to
provide indexes when working with matrices representing images than to
work with iterators. By the way, why is there no std::matrix? It makes
me unhappy.
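For instance, here is a minimal sketch of the usual workaround (the class name and row-major layout are my invention, not a standard facility): a matrix over std::vector, traversed by index with the bounds hoisted as above:

#include <vector>
#include <cstddef>

// Hypothetical row-major matrix over std::vector, for illustration only.
template <typename T>
class Matrix {
    std::vector<T> data_;
    std::size_t    rows_, cols_;
public:
    Matrix( std::size_t r, std::size_t c )
        : data_( r * c ), rows_( r ), cols_( c ) {}

    T&       operator()( std::size_t i, std::size_t j )       { return data_[i * cols_ + j]; }
    const T& operator()( std::size_t i, std::size_t j ) const { return data_[i * cols_ + j]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
};

// Index-based traversal with the bounds hoisted out of the loops.
template <typename T>
T sumPixels( const Matrix<T>& img )
{
    T sum = T();
    const std::size_t m = img.rows(), n = img.cols();
    for ( std::size_t i = 0; i < m; ++i )
        for ( std::size_t j = 0; j < n; ++j )
            sum += img( i, j );
    return sum;
}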

Perhaps I am neurotic about performance, but I do need to be. That's
why I chose C++ over Python. If one is to be writing sloppy C++ code,
why not move to Python?

Elias Salomão Helou Neto
 

Elias Salomão Helou Neto

Elias Salomão Helou Neto wrote:
:
: Well, again you forget that your idiom has an implied destruction of
: the object at every loop iteration, resulting in the need to deal
: with exactly the same problem! How could that be different?
:
: I will give you an example. Take the following two simple programs:
:
: //Program 1:
: #include <string>
:
: std::string myFunction()
: {
:     std::string str;
:     for ( unsigned i( 0 ); i < 1000; ++i )
:         str.append( "supercalifragilisomethingidonotremebmberandd"
:                     "donotwantotsearchintheinternet" );
:
:     return( str );
: }
:
: int main()
: {
:     for( unsigned i( 0 ); i < 100000; ++i )
:         std::string str( myFunction() );
:
:     return( 0 );
: }
:
: //Program 2:
: #include <string>
:
: void myFunction( std::string& str )
: {
:     str.clear();
:     for ( unsigned i( 0 ); i < 1000; ++i )
:         str.append( "supercalifragilisomethingidonotremebmberandd"
:                     "donotwantotsearchintheinternet" );
: }
:
: int main()
: {
:     std::string str;
:     for( unsigned i( 0 ); i < 100000; ++i )
:         myFunction( str );
:
:     return( 0 );
: }
:
: According to you, Program 1 should run faster, right? But it is just
: the opposite. Compiling both with no optimization (the default)
: using gcc 4.1.2 20070502, Program 1 takes around 21 seconds to run
: against around 15 seconds for Program 2. Now, let us turn
: optimization up to its highest level and see what happens. With the
: -O3 flag used when compiling, Program 1's execution time falls to
: around 19 seconds, while Program 2 goes down to an amazing 12
: seconds! Can you explain that to me?

Yes, you are benchmarking the memory allocation for std::string.

Well, it is in fact easier to deal with memory allocation once than
to do it in every loop iteration. But, as I said, my example is
contrived.
On my machine, using another compiler, I get:

Program 1: 22.5 s
Program 2: 3.3 s

Then I notice that Program 2 reuses the same internal string buffer
for all calls, saving calls to the string growth code for the last
99,999 calls.

It happens all the time with this idiom.
To even the score a bit, I add a "str.reserve(100000)"
to myFunction.

Program 1B: 3.5 s
Program 2B: 3.4 s

Assuming also that reserving much more memory than needed is not a
problem, yes, it should work, but 2 is still (marginally) faster; it
would be fairer to say "as fast as". No one has yet shown an opposite
example, i.e., one where passing an object as reference will
degrade performance (although some claim that it is possible, and I do
believe them).

I can imagine extremely contrived examples involving somewhat absurd
classes, but never when the class to which the object belongs allows
efficient manipulation of the data. If std::string did not allow such
manipulations it would be useless, since char[] already existed in C.
In fact, if a class does not provide means to manipulate its data
other than through constructors, why should it exist at all if we
could have done well with a C struct? This seems to apply even more to classes
whose instances are supposed to hold large amounts of data.
: It's time for another listing:
:
: //Program 3:
: #include <string>
:
: std::string myFunction()
: {
:     std::string str;
:     for ( unsigned i( 0 ); i < 1000; ++i )
:         str.append( "supercalifragilisomethingidonotremebmberandd"
:                     "donotwantotsearchintheinternet" );
:
:     return( str );
: }
:
: int main()
: {
:     std::string str;
:     for( unsigned i( 0 ); i < 100000; ++i )
:         str = myFunction();
:
:     return( 0 );
: }
:
: Program 3 takes a little more than 17 seconds to run without
: optimization turned on, explain it to me, please. When optimized, it
: will take around 15 seconds to run.

On my machine it takes 24 s unmodified.
Adding the same "str.reserve(100000)" to myFunction.
Program 3B: 5.6 s

I guess there is no copy on write in your compiler's std::string
implementation, so that assignment to a temporary will actually move
data around (whether this is a good design decision or not, I do not
know), but this would not be needed with your idiom because the
standard allows the copy constructor to be optimized away (I am
willing to bet that if you forbid optimization both will be
equivalent). Compiled
with gcc, all of your versions run equally fast on my machine
(actually equally slow when compared to your machine) whether
optimized or not. Now I really want to know which compiler you are
using.
Rewriting main, making it equivalent to Program 1:

int main()
{
    for( unsigned i( 0 ); i < 100000; ++i )
        std::string str = myFunction();

    return( 0 );
}

Program 3C: 3.5 s

The last case shows that, in this test, constructing a new string on
each iteration is faster than assigning a new value to an existing
string.

This is just the same as std::string str( myFunction() ). We did not
even need this case to reach the conclusion, but the dramatic effect
is interesting. Are you a lawyer? Just kidding...

Well, it is for your compiler, but what I would really love to know is
why your idiom is so extolled that no one can realize that passing
the string as a reference (within tight loops, of course) is much less
likely to suffer from performance penalties.

Also, try comparing 1B against 3B forbidding optimization to see what
a non-optimizing compiler may be doing with your idiom. Please, do it
or say which compiler you are using. I am curious.

I argue that, when not optimized, 1B should be equivalent to 3B in
every realistic implementation of std::string. With optimization, 1B
should perform better on some implementations. But for really good
implementations (recent versions of gcc), both should be nearly the
same even with optimization turned on. The conclusion is that 1B has
a better chance of succeeding, so it should be preferred over 3B. But
we can go further and say that 2B is much more likely to beat both in
most cases.

Elias Salomão Helou Neto
 

Bo Persson

Elias Salomão Helou Neto wrote:
:: Elias Salomão Helou Neto wrote:
:::
::: According to you, Program 1 should run faster, right? But it is
::: just the opposite. Compiling both with no optimization (the
::: default) using gcc 4.1.2 20070502, Program 1 takes around 21
::: seconds to run against around 15 seconds for Program 2. Now, let
::: us turn optimization up to its highest level and see what happens.
::: With the -O3 flag used when compiling, Program 1's execution time
::: falls to around 19 seconds, while Program 2 goes down to an
::: amazing 12 seconds! Can you explain that to me?
::
:: Yes, you are benchmarking the memory allocation for std::string.
:
: Well, it is in fact easier to deal with memory allocation once than
: to do it in every loop iteration. But, as I said, my example is
: contrived.
:
:: On my machine, using another compiler, I get:
::
:: Program 1: 22.5 s
:: Program 2: 3.3 s
::
:: Then I notice that Program 2 reuses the same internal string buffer
:: for all calls, saving calls to the string growth code for the last
:: 99,999 calls.
:
: It happens all the time with this idiom.

The benefit is exaggerated by the fact that the string is the same
size for every call. Otherwise there would be reallocations here too.

:
:: To even the score a bit, I add a "str.reserve(100000)"
:: to myFunction.
::
:: Program 1B: 3.5 s
:: Program 2B: 3.4 s
:
: Assuming also that reserving much more memory than needed is not a
: problem, yes, it should work,

It's not *much* more memory than needed, I just allocate enough to
hold 1000 appends of about 100 characters each. (74, is it, if you
count?)

: but 2 is still (marginally) faster; it
: would be fairer to say "as fast as". No one has yet shown an
: opposite example, i.e., one where passing an object as
: reference will degrade performance (although some claim that it is
: possible, and I do believe them).

Being 0.1 s faster per 100,000 iterations is very marginally faster in
my book. :)

:
::: It's time for another listing:
:::
::: //Program 3:
::: #include <string>
:::
::: std::string myFunction()
::: {
:::     std::string str;
:::     for ( unsigned i( 0 ); i < 1000; ++i )
:::         str.append( "supercalifragilisomethingidonotremebmberandd"
:::                     "donotwantotsearchintheinternet" );
:::
:::     return( str );
::: }
:::
::: int main()
::: {
:::     std::string str;
:::     for( unsigned i( 0 ); i < 100000; ++i )
:::         str = myFunction();
:::
:::     return( 0 );
::: }
:::
::: Program 3 takes a little more than 17 seconds to run without
::: optimization turned on, explain it to me, please. When optimized,
::: it will take around 15 seconds to run.
::
:: On my machine it takes 24 s unmodified.
:: Adding the same "str.reserve(100000)" to myFunction.
:: Program 3B: 5.6 s
:
: I guess there is no copy on write in your compiler's std::string

Right.

: implementation, so that assignment to a temporary will actually move
: data around (whether this is a good design decision or not, I do not
: know), but this would not be needed with your idiom because the
: standard allows the copy constructor to be optimized away (I am
: willing to bet that if you forbid optimization both will be
: equivalent).

I don't find it very interesting to compare the speed of unoptimized
compiles. If I want the code to be fast, I use a good compiler with
appropriate settings. If I don't care about (or need) the speed, it doesn't
really matter.

: Compiled with gcc, all of your versions run equally fast on my
: machine (actually equally slow when compared to your machine)
: whether optimized or not. Now I really want to know which compiler
: you are using.

It's the other free compiler, Visual C++ 2005 Express (using an
alternate version of the standard library).

:
: Well, it is for your compiler, but what I would really love to know
: is why your idiom is so extolled that no one can realize that
: passing the string as a reference (within tight loops, of course)
: is much less likely to suffer from performance penalties.

The argument was the other way around, that constructing a string
inside the loop was not killing performance.

:
: Also, try comparing 1B against 3B forbidding optimization to see
: what a non-optimizing compiler may be doing with your idiom.
: Please, do it or say which compiler you are using. I am curious.

Ok, without optimization (debug build) we get

Program 1B: 95 s
Program 2B: 87 s
Program 3B: 96 s

From earlier experiments I believe that the main effect here is from
disabled inlining. Actually having to call a lot of accessor functions
out-of-line seems to cost between 10 and 100 times as much in my
code.


Bo Persson
 
