Q: Return value optimization thru several function calls?

D

Denis Remezov

Jakob said:
Hi!

I am using VC++ 7.1 and have a question about return value optimization.
Consider the following code:

#include <list>
#include <string>

struct test
{
std::list <std::string> m_strings;
};

test func1 (int r)
{
if (r == 0)
return test ();

test t = func1 (r - 1);
t.m_strings.push_back ("lala");
return t;
}

int main ()
{
test t = func1 (3);
}

As Jeff said, in the above example the compiler is allowed to optimise
away unnecessary locals (in func1()), temporaries and the corresponding
copy constructor calls. If func1() were defined like this:

test func1 (int r)
{
test t;
t.m_strings.push_back ("lala");
return t;
}

with the same main(), I would definitely expect just one construction and
destruction from a decent compiler. The recursion, however, may complicate
things a bit. You can add a couple of ctors to test (default and copy) and
a dtor that cout a message and see.

Consider this too:
int main()
{
test t;
//do something to t ...

t = func1 (3);
}

Not much luck in this case: t will be constructed first, then at least one
object of class test will be created as a result of func1(), then operator =
will be called.

When I care about performance, I usually define func1() as
void func1 (int r, test& t) {...}
If the functional notation were important, I would consider using proxies
(and accomodate the "actual" objects acordingly), but I would not return
really big objects from a function.

Denis
 
J

Jakob Bieling

Hi!

I am using VC++ 7.1 and have a question about return value optimization.
Consider the following code:

#include <list>
#include <string>

struct test
{
std::list <std::string> m_strings;
};

test func1 (int r)
{
if (r == 0)
return test ();

test t = func1 (r - 1);
t.m_strings.push_back ("lala");
return t;
}

int main ()
{
test t = func1 (3);
}

Now, stepping thru the code, one would assume that 'test' objects are
copy constructed and destroyed often. My question is, would a decent
compiler, like VC++ 7.1, be able to optimize this in such a way that only a
single 'test' object is used throughout the code? To make this post more
on-topic, would the Standard even allow such an optimization, as it is a
pretty great change to the original code ..

Thanks in advance!
 
J

Jeff Schwab

Jakob said:
Hi!

I am using VC++ 7.1 and have a question about return value optimization.
Consider the following code:

#include <list>
#include <string>

struct test
{
std::list <std::string> m_strings;
};

test func1 (int r)
{
if (r == 0)
return test ();

test t = func1 (r - 1);
t.m_strings.push_back ("lala");
return t;
}

int main ()
{
test t = func1 (3);
}

Now, stepping thru the code, one would assume that 'test' objects are
copy constructed and destroyed often. My question is, would a decent
compiler, like VC++ 7.1, be able to optimize this in such a way that only a
single 'test' object is used throughout the code? To make this post more
on-topic, would the Standard even allow such an optimization, as it is a
pretty great change to the original code ..

The standard allows the optimization (6.3.3).
 
D

Denis Remezov

Jakob said:
Well, it is not important, but I do care about design and the above
seems like a rather ugly solution;
[snip]

It depends. In some respect, the syntax is inferior, I agree; func1() now
also needs to be careful to handle or dispose of the existing contents
of t properly (that could be a bad thing). Potential advantages are
(performance issues aside):
0) Class test does not need to be constructible in the given scope.
1) It is further guaranteed and made explicit that no construction
or copying of objects of class test will take place.

It is not that the object is big, but it contains a linked list of
non-POD objects,
[snip]

Sorry I didn't make it clear, but I meant to include that into the
definition of a "big object".
I assume the proxy would take care of doing a fast swap rather than a
deep-copy of the objects, correct?

Something of the kind. I was thinking about proxy objects sharing pointers
to the "real" objects and reference-counting them, deferring the
actual creation and copying until such point in time that a write-operation
is requested.

Denis
 
J

Jakob Bieling

The standard allows the optimization (6.3.3).

In my copy, 6.3 is "Compound statement or block" and only has a single
paragraph. IOW, there is no 6.3.3?

regards
 
J

Jakob Bieling

When I care about performance, I usually define func1() as
void func1 (int r, test& t) {...}
If the functional notation were important,

Well, it is not important, but I do care about design and the above
seems like a rather ugly solution; tho it would serve the purpose pretty
well.
I would consider using proxies
(and accomodate the "actual" objects acordingly), but I would not return
really big objects from a function.

It is not that the object is big, but it contains a linked list of
non-POD objects, so creating copies of it where it is not necessary, seems
like a bad idea.

I assume the proxy would take care of doing a fast swap rather than a
deep-copy of the objects, correct?

Thanks!
 
V

Victor Bazarov

Jakob Bieling said:
In my copy, 6.3 is "Compound statement or block" and only has a single
paragraph. IOW, there is no 6.3.3?

Jeff probably meant 6.6.3 "The return statement".

V
 
P

Petec

Jakob said:
In my copy, 6.3 is "Compound statement or block" and only has a
single paragraph. IOW, there is no 6.3.3?

regards

It's not in mine (2003 revision) either...

- Pete
 
J

Jakob Bieling

Victor Bazarov said:
Jeff probably meant 6.6.3 "The return statement".


Thanks .. there is a reference to 12.2 which explicitly allows the
optimization of constructing the return value directly in place of the
variable that is used to hold the return value, thus eliminating the
temporary copy.

Guess I will go for Dennis' proxy-class approach, since it does not look
like a decent compiler will do the optimization I am after.

Thanks for the comments!
 
D

David Harmon

On Sun, 18 Apr 2004 19:38:26 +0200 in comp.lang.c++, "Jakob Bieling"
Guess I will go for Dennis' proxy-class approach, since it does not look
like a decent compiler will do the optimization I am after.

A decent compiler will certainly do that optimization!
Now the question remains, whether there is one yet that does? Perhaps
not.
 
J

Jakob Bieling

David Harmon said:
On Sun, 18 Apr 2004 19:38:26 +0200 in comp.lang.c++, "Jakob Bieling"


A decent compiler will certainly do that optimization!
Now the question remains, whether there is one yet that does? Perhaps
not.


;)
 
D

David Harmon

On Sun, 18 Apr 2004 18:30:47 +0200 in comp.lang.c++, "Jakob Bieling"
Well, it is not important, but I do care about design and the above
seems like a rather ugly solution; tho it would serve the purpose pretty
well.

I am sure your example is simplified and that there would be other
concerns in a complex application. Nevertheless, for the purpose of the
simple example I would consider constructing a single object and using a
member function with the implied 'this' to make it perhaps a bit less
ugly than the explicit reference argument. In my view, this gets closer
to the implementation originally desired, albeit at the cost of a member
function that you might prefer as a non-menber.

#include <iostream>
using std::cerr;
#include <list>
#include <string>

struct test {
std::list<std::string> m_strings;
test & func1(int r);

// instrumentation:
test() { cerr << "Construct!\n"; }
test(const test & x)
: m_strings(x.m_strings)
{ cerr << "Copy construct " << m_strings.size() << "!\n"; }
~test() { cerr << "Destruct " << m_strings.size() << "!\n"; }
//
};

test & test::func1(int r)
{
if (r == 0)
return *this;

func1(r - 1);
m_strings.push_back("lala");
return *this;
}

int main()
{
test t;
t.func1(3);
}
 
T

tom_usenet

Hi!

I am using VC++ 7.1 and have a question about return value optimization.
Consider the following code:

#include <list>
#include <string>

struct test
{
std::list <std::string> m_strings;
};

test func1 (int r)
{
if (r == 0)
return test ();

The line above disables the optimization for most compilers.
Generally, for NRVO to work, all return statements must return the
same object (although this isn't a requirement of 12.8/15, just common
practice).
test t = func1 (r - 1);
t.m_strings.push_back ("lala");
return t;

So the above probably won't be optimized.
}

int main ()
{
test t = func1 (3);
}

Now, stepping thru the code, one would assume that 'test' objects are
copy constructed and destroyed often. My question is, would a decent
compiler, like VC++ 7.1, be able to optimize this in such a way that only a
single 'test' object is used throughout the code? To make this post more
on-topic, would the Standard even allow such an optimization, as it is a
pretty great change to the original code ..

Both GCC and VC7.1 failed to optimize the original code (I added
instrumentation). Here is a version that on GCC 3.3 prints:

Default
Destroy

which is what you were after. VC++ 7.1 doesn't implement NRVO, so that
prints:

Default
Copy
Destroy
Copy
Destroy
Copy
Destroy
Copy
Destroy
Copy
Destroy
Copy
Destroy
Copy
Destroy
Copy
Destroy
Destroy

Oh dear! Here's the code:

#include <list>
#include <string>
#include <iostream>
using namespace std;

struct test
{
test()
{
cout << "Default\n";
}

test(test const& t)
:m_strings(t.m_strings)
{
cout << "Copy\n";
}

~test()
{
cout << "Destroy\n";
}

test& operator=(test const& t)
{
m_strings = t.m_strings;
cout << "Assign\n";
return *this;
}

std::list <std::string> m_strings;
};

test func1 (int r)
{
test t(r == 0? test() : func1 (r - 1));
if (r != 0)
t.m_strings.push_back ("lala");
return t;
}

int main ()
{
test t = func1 (3);
}

And what's the moral of this? Well, to enable NRVO, make sure you:

1. Ensure all return statements return the same local object.
2. Avoid assignments.
3. Use GCC.

I assume MS will implement NRVO eventually...

Tom
 
E

E. Robert Tisdale

tom_usenet said:
Oh dear! Here's the code:
cat main.cc
#include <list>
#include <string>
#include <iostream>

struct test {
test(void) {
std::cout << "Default" << std::endl;
}

test(test const& t):
m_strings(t.m_strings) {
std::cout << "Copy" << std::endl;
}

~test(void) {
std::cout << "Destroy" << std::endl;
}

test& operator=(test const& t) {
m_strings = t.m_strings;
std::cout << "Assign" << std::endl;
return *this;
}

std::list <std::string> m_strings;
};

test func1(int r) {
test t(r == 0? test(): func1(r - 1));
if (r != 0)
t.m_strings.push_back("lala");
return t;
}

int main (int argc, char* argv[]) {
test t = func1(3);
return 0;
}
g++ -Wall -ansi -pedantic -O2 -o main main.cc
./main
Default
Destroy
 
E

E. Robert Tisdale

David said:
I am sure your example is simplified and that
there would be other concerns in a complex application.
Nevertheless, for the purpose of the simple example
I would consider constructing a single object
and using a member function with the implied 'this'

This is *very* bad advice.
to make it perhaps a bit less ugly
than the explicit reference argument. In my view, this gets closer
to the implementation originally desired, albeit at the cost of a member
function that you might prefer as a non-menber.

#include <iostream>
// using std::cerr;
#include <list>
#include <string>

struct test {

using std::cerr;
std::list<std::string> m_strings;
test & func1(int r);

// instrumentation:
test() { cerr << "Construct!\n"; }
test(const test & x):
m_strings(x.m_strings) {
cerr << "Copy construct " << m_strings.size() << "!\n"; }
~test() { cerr << "Destruct " << m_strings.size() << "!\n"; }
//
};

test& test::func1(int r) {

if (0 == r)
return *this;

func1(r - 1);
m_strings.push_back("lala");
return *this;
}

int main() {
test t;
t.func1(3);

return 0;
 
E

E. Robert Tisdale

David said:
A decent compiler will certainly do that optimization!
Now the question remains, whether there is one yet that does?
Perhaps not.

My GNU C++ compiler:
g++ --version
g++ (GCC) 3.2 20020903 (Red Hat Linux 8.0 3.2-7)

implements the NRVO. So do several other quality C++ compilers.
If your compiler does not implement the NVRO,
you should be shopping for a better compiler.
 
S

Siemel Naran

tom_usenet said:
The line above disables the optimization for most compilers.
Generally, for NRVO to work, all return statements must return the
same object (although this isn't a requirement of 12.8/15, just common
practice).

Compilers should allow NVRO if the different return statements return
objects declared after the previous return statement.


Maybe explicit braces will help, but I'm not sure:

test func1 (int r)
{
if (r == 0)
return test ();

/*else*/ {
test t = func1 (r - 1);
t.m_strings.push_back ("lala");
return t;
}
}


Or even mutual recursion

test func1 (int r)
{
return r == 0 ? test () : private_func1_assumes_r_greater_than_zero(r);
}

test private_func1_assumes_r_greater_than_zero (int r)
{
test t = func1 (r - 1);
t.m_strings.push_back ("lala");
return t;
}
 
D

David Harmon

I am sure your example is simplified and that
there would be other concerns in a complex application.
Nevertheless, for the purpose of the simple example
I would consider constructing a single object
and using a member function with the implied 'this'

This is *very* bad advice.[/QUOTE]

Then what is so bad about it? It works as intended. I see you made
some useless and irrelevant changes to my suggested code, but nothing of
substance. So how would you code it to squeeze the desired result out
of a compiler that does not include NRVO?
 
T

tom_usenet

Compilers should allow NVRO if the different return statements return
objects declared after the previous return statement.

They don't in general though. NRVO is generally implemented by passing
a hidden parameter for a pointer to some storage in which the return
value should be constructed, but compilers generally only bother if
there is only one object to return, since in the general case it is
very hard to do otherwise. e.g.

test f()
{
test t(5);
if (today == TUESDAY)
return test(10);
else
return t;
}

The compiler doesn't know whether to construct t in the return storage
space until runtime, so it doesn't bother, and just copies u or v into
the return value without optimizing it.
Maybe explicit braces will help, but I'm not sure:

test func1 (int r)
{
if (r == 0)
return test ();

/*else*/ {
test t = func1 (r - 1);
t.m_strings.push_back ("lala");
return t;
}
}

That doesn't do it - the braces make no difference.

Or even mutual recursion

test func1 (int r)
{
return r == 0 ? test () : private_func1_assumes_r_greater_than_zero(r);
}

test private_func1_assumes_r_greater_than_zero (int r)
{
test t = func1 (r - 1);
t.m_strings.push_back ("lala");
return t;
}

That does do it, since it follows the rules I posted in my previous
post (if a named local is returned, all returns return the same one).
Still, simpler is the version I posted:

test func1 (int r)
{
test t(r == 0? test() : func1 (r - 1));
if (r != 0)
t.m_strings.push_back ("lala");
return t;
}

also does the job.

Tom
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,236
Members
46,822
Latest member
israfaceZa

Latest Threads

Top