I love it when people comment about things they know nothing
about. We ran the measurements with a number of different
compilers: VC++, g++, Intel and Sun CC. In all cases, passing
the pre-constructed vector to the function was significantly
faster than returning a vector.
When you do it manually instead of RVO you have an extra
default construction and an extra swap, but it
shouldn't really matter for efficiency.
The compiler can't always use RVO. Our two versions were:
    std::vector<double> v(...);
    for (... lots of iterations ... )
    {
        // calculate some values based on the current contents
        // of v.
        v = some_function(... the calculated values ...);
        // ...
    }
as opposed to
    std::vector<double> v(...);
    for (... lots of iterations ... )
    {
        // calculate some values based on the current contents
        // of v.
        some_function(&v, ... the calculated values ...);
        // ...
    }
(As far as I can tell, this is a more or less standard procedure
in numerical analysis. Although in some cases, you might have
two vectors, one with the old values, and one in which you put
the new, swapping them each time you go through the loop.)
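
For anyone who wants to try this themselves, here is a minimal compilable sketch of both shapes, using the two-vector swap variant for the second one. It is not our actual code: step_by_value, step_into, iterate_by_value and iterate_with_swap are hypothetical names, and the update rule (0.5 * x + 1.0) is an arbitrary placeholder for the real calculation. The point it illustrates is that the by-value version typically builds a fresh vector on every pass, while the out-parameter version can keep reusing the same two buffers once they are large enough.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the function that returns a vector.
// A new result vector is constructed on every call.
std::vector<double> step_by_value(const std::vector<double>& old_values)
{
    std::vector<double> result(old_values.size());
    for (std::size_t i = 0; i < old_values.size(); ++i)
        result[i] = 0.5 * old_values[i] + 1.0;   // placeholder update rule
    return result;
}

// Hypothetical stand-in for the function that fills a caller-supplied
// vector. resize() does not reallocate once capacity is sufficient.
void step_into(std::vector<double>* out, const std::vector<double>& old_values)
{
    out->resize(old_values.size());
    for (std::size_t i = 0; i < old_values.size(); ++i)
        (*out)[i] = 0.5 * old_values[i] + 1.0;   // same placeholder rule
}

// First shape: assign the returned vector back to v on each pass.
std::vector<double> iterate_by_value(std::vector<double> v, int iterations)
{
    for (int n = 0; n < iterations; ++n)
        v = step_by_value(v);
    return v;
}

// Second shape, two-vector variant: write the new values into 'next',
// then swap so that v holds the newest values. The swap only exchanges
// the vectors' internal buffers; no elements are copied.
std::vector<double> iterate_with_swap(std::vector<double> v, int iterations)
{
    std::vector<double> next(v.size());
    for (int n = 0; n < iterations; ++n)
    {
        step_into(&next, v);
        v.swap(next);
    }
    return v;
}
```

Both functions compute the same values, so any difference you measure comes from how the storage is managed, not from the arithmetic. How big that difference is will depend on your compiler, library and vector sizes, so time it on your own setup.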