Why pass doubles as const references?

army1987

Is there any good reason to declare a function parameter as `const double
&foo` rather than just `double foo`? I can see the point of that when
passing a very large object, but with a double I'd expect any improvement
in performance to be negligible. I've seen code using the former, but I
guess that's because it was translated from Fortran, where all function
arguments are passed by reference -- or am I missing something?
 
Ian Collins

On most current systems, I would expect performance to decrease (because
of building the reference) rather than increase when passing a double by
const reference.
 
Rui Maciel

On architectures where pointers are 64 bits wide there is no point in passing
primitives as const references. Doing so even introduces a performance
penalty, because passing a double is just as cheap as passing a pointer,
and a reference implies an indirection.

If I'm not mistaken, this point is covered in one of Scott Meyers' Effective
C++ books.


Rui Maciel
 
Victor Bazarov

Perhaps you're missing the age of the code in question. Fifteen years
ago, passing a double by const reference would have made a noticeable
difference compared with passing by value. Not anymore, most likely.

V
 
Öö Tiib

Is there any good reason to declare a function parameter as `const double
&foo` rather than just `double foo`?

There can be good reasons. For example, the function may be one of a set
of overloads that take several class types by const& plus a double. Making
the double overload different from the others may (or may not) cause subtle
usage difficulties, say when picking a pointer to that overload from a
template.
I can see the point of that when passing a very large object, but with a
double I'd expect any improvement in performance to be negligible.

Most likely it does not affect performance at all. Either way you can
pass billions of parameters per second. If the algorithm is complex, the
cost of parameter passing does not change overall performance by any
noticeable percentage. If the algorithm is trivial, the function is often
inlined, so the parameters won't be passed at all.
I've seen code using the former, but I guess that's because it was
translated from Fortran, where all function arguments are passed by
reference -- or am I missing something?

That can be another good reason. Most code generators and translators
produce code that (in some circumstances) contains some overhead.
For example, I have seen a switch with only a default label in generated
code. It looks nonsensical and feels wasteful, but in practice the
compiler later optimizes it out, so the perceived "inefficiency" does
not manifest itself.
 
Rui Maciel

Öö Tiib said:
Most likely it does not affect performance at all. Either way you can
pass billions of parameters per second. If the algorithm is complex, the
cost of parameter passing does not change overall performance by any
noticeable percentage. If the algorithm is trivial, the function is often
inlined, so the parameters won't be passed at all.

<example>
rui@kubuntu:tmp$ cat main.c++
#include <ctime>
#include <iostream>

// Global accumulator, so each call has an observable side effect.
double count = 0;

// Takes the argument by value.
void value(double foo)
{
    count += foo;
}


// Takes the argument by const reference.
void reference(double const &foo)
{
    count += foo;
}


int main(void)
{
    const int max = 100000000;
    clock_t t = clock();

    count = 0;
    for(int i = 0; i < max; ++i)
    {
        value(1.0f);
    }

    // clock() differences are raw clock ticks, not seconds.
    std::cout << "time pass by value: " << clock() - t << std::endl;

    t = clock();
    count = 0;
    for(int i = 0; i < max; ++i)
    {
        reference(1.0f);
    }

    std::cout << "time pass by reference: " << clock() - t << std::endl;

    return 0;
}

rui@kubuntu:tmp$ g++ main.c++ && ./a.out
time pass by value: 640000
time pass by reference: 1670000
</example>


Rui Maciel
 
Ian Collins

Rui said:
rui@kubuntu:tmp$ g++ main.c++ && ./a.out
time pass by value: 640000
time pass by reference: 1670000

That's what I would have expected; however, on a reasonably quick i7
(with an extra 0 in max):

32 bit:

g++ x.cc && ./a.out
time pass by value: 7510000
time pass by reference: 2700000

64 bit:

g++ x.cc -m64 && ./a.out
time pass by value: 2440000
time pass by reference: 2760000

With a little optimisation:

g++ x.cc -m64 -O1 && ./a.out
time pass by value: 2410000
time pass by reference: 2410000
 
Öö Tiib

Scott said:
Which completely optimizes out (eliminates) both function calls
(reference and value).

Nope, it inlines those. It cannot so easily optimize out summing into a
global with external linkage. Where do you think those 2.4 seconds went?
Inlining was what I predicted. A billion loop iterations took less than
3 seconds unoptimized as well, and that on only one core of a quad-core
i7. It is unlikely that any of this matters for the performance of a
practical application. Just acquiring a billion meaningful doubles from
any medium (including RAM) is far more expensive.
 
Ian Collins

Scott said:
Which completely optimizes out (eliminates) both function calls (reference and value).

So nothing takes 4.8 seconds to execute? The calls are still made, the
function bodies are optimised.

This is what happens when the function calls are optimised away:

CC x.cc -fast -m64 && ./a.out
time pass by value: 0
time pass by reference: 0

:)
 
Ian Collins

Scott said:
It optimizes them out. There is no 'CALL' instruction.

It does that by inlining the functions, so there is no function call.

I'm not so daft as to post something without checking first. The first
loop is:

call clock
movq $0, count(%rip)
movl $1000000000, %ebx
.L7:
movsd .LC1(%rip), %xmm0
call _Z5valued
subl $1, %ebx
jne .L7

The optimised value function is:

.globl _Z5valued
.type _Z5valued, @function
_Z5valued:
.LFB961:
addsd count(%rip), %xmm0
movsd %xmm0, count(%rip)
ret

Unoptimised:

.globl _Z5valued
.type _Z5valued, @function
_Z5valued:
.LFB961:
pushq %rbp
.LCFI3:
movq %rsp, %rbp
.LCFI4:
movsd %xmm0, -8(%rbp)
movsd count(%rip), %xmm0
addsd -8(%rbp), %xmm0
movsd %xmm0, count(%rip)
leave
.LCFI5:
ret

Which looks like a typical x64 stack frame optimisation.
 
Jorgen Grahn

On most current systems, I would expect performance to decrease (because
of building the reference) rather than increase when passing a double by
const reference.

Wouldn't the expensive part be dealing with aliasing? E.g.

void foo(const double& bar) {
double baz = bar;
fred();
baz += bar;
...
}

can't just assume fred() doesn't modify bar, in the general case.

/Jorgen
 
Rui Maciel

Öö Tiib said:
Nope, it inlines those. It cannot so easily optimize out summing into a
global with external linkage. Where do you think those 2.4 seconds went?
Inlining was what I predicted. A billion loop iterations took less than
3 seconds unoptimized as well, and that on only one core of a quad-core
i7. It is unlikely that any of this matters for the performance of a
practical application. Just acquiring a billion meaningful doubles from
any medium (including RAM) is far more expensive.

You are assuming that a very specific corner case is somehow the rule, which
is a bad assumption to make. Just because a compiler can, as a corner case,
optimize away pure functions, it doesn't mean that all possible and
conceivable function calls will be optimized away. For instance, the corner
case you are counting on simply doesn't happen if the functions are a part
of a library.

<code>
rui@kubuntu:tmp$ cat main.c++

double count = 0;

void value(double foo)
{
count += foo;
}


void reference(double const &foo)
{
count += foo;
}
</code>

The following instructions are obtained with -O1, -O2, and -O3:

<snip>
_Z5valued:
.LFB1006:
.cfi_startproc
addsd count(%rip), %xmm0
movsd %xmm0, count(%rip)
ret
.cfi_endproc

// snip
_Z9referenceRKd:
.LFB1007:
.cfi_startproc
movsd count(%rip), %xmm0
addsd (%rdi), %xmm0
movsd %xmm0, count(%rip)
ret
.cfi_endproc
</snip>



Rui Maciel
 
Rui Maciel

Ian said:
That's what I would have expected; however, on a reasonably quick i7
(with an extra 0 in max):

32 bit:

g++ x.cc && ./a.out
time pass by value: 7510000
time pass by reference: 2700000

64 bit:

g++ x.cc -m64 && ./a.out
time pass by value: 2440000
time pass by reference: 2760000

With a little optimisation:

g++ x.cc -m64 -O1 && ./a.out
time pass by value: 2410000
time pass by reference: 2410000

<example>
rui@kubuntu:tmp$ g++ -m64 -O1 main.c++ && ./a.out
time: 520000
time: 590000
</example>


Here's a dump of the relevant assembly bits:

<example>
rui@kubuntu:tmp$ g++ -m64 -O1 main.c++ -S
rui@kubuntu:tmp$ cat main.s

// snip

_Z5valued:
.LFB1006:
.cfi_startproc
addsd count(%rip), %xmm0
movsd %xmm0, count(%rip)
ret
.cfi_endproc

// snip

_Z9referenceRKd:
.LFB1007:
.cfi_startproc
movsd count(%rip), %xmm0
addsd (%rdi), %xmm0
movsd %xmm0, count(%rip)
ret
.cfi_endproc

// snip
</example>

The extra instruction in reference() stems from the pointer dereference
that passing a parameter by reference entails.


Rui Maciel
 
Ian Collins

Rui said:
You are assuming that a very specific corner case is somehow the rule, which
is a bad assumption to make. Just because a compiler can, as a corner case,
optimize away pure functions, it doesn't mean that all possible and
conceivable function calls will be optimized away. For instance, the corner
case you are counting on simply doesn't happen if the functions are a part
of a library.

It certainly isn't a corner case. The compiler is free to inline any
functions it can see.

The following instructions are obtained with -O1, -O2, and -O3:

<snip>
_Z5valued:
.LFB1006:
.cfi_startproc
addsd count(%rip), %xmm0
movsd %xmm0, count(%rip)
ret
.cfi_endproc

// snip
_Z9referenceRKd:
.LFB1007:
.cfi_startproc
movsd count(%rip), %xmm0
addsd (%rdi), %xmm0
movsd %xmm0, count(%rip)
ret
.cfi_endproc
</snip>


The functions will be generated, but they are not necessarily called.
Check the code for main with -O3.
 
Rui Maciel

Ian said:
It certainly isn't a corner case. The compiler is free to inline any
functions it can see.

Yeah, it's a corner case. You simply can't assume that every function is a
pure function that will always be inlined under every conceivable scenario.
After all, where does the C++ standard mandate that?

You can only count on it if you invest your time making sure that a specific
compiler will be able to compile a specific function within your project to
match your specific requirements, but this is way past C++'s territory and
firmly within platform and implementation-specifics.


Rui Maciel
 
Ian Collins

Rui said:
Yeah, it's a corner case. You simply can't assume that every function is a
pure function that will always be inlined under every conceivable scenario.
After all, where does the C++ standard mandate that?

If it's a corner case, most code lives in a dodecahedron!

Who said a function is always inlined?

You can only count on it if you invest your time making sure that a specific
compiler will be able to compile a specific function within your project to
match your specific requirements, but this is way past C++'s territory and
firmly within platform and implementation-specifics.

Most C++ relies on the inlining of trivial functions; it's at the heart
of the language. Would you expect every call to std::vector's
operator[] to involve an actual call?
 
Rui Maciel

Ian said:
If it's a corner case, most code lives in a dodecahedron!

"Most" is a bit of a weasel word. Nevertheless, even if you actually
believe that all object code consists of a long winded opcode dump that is
free from any function call, it is necessary to at least acknowledge the
existence of shared libraries. It's a bit hard to optimize away code which
is linked only dynamically.

But this is way beyond the realm of C++.

Who said a function is always inlined?

I certainly didn't say that.

You can only count on it if you invest your time making sure that a
specific compiler will be able to compile a specific function within your
project to match your specific requirements, but this is way past C++'s
territory and firmly within platform and implementation-specifics.

Most C++ relies on the inlining of trivial functions; it's at the heart
of the language. Would you expect every call to std::vector's
operator[] to involve an actual call?

Trivial functions are a small subset of the whole domain of functions. A
corner case, if you will. No one can assume that all functions are trivial
functions, and subsequently that all possible optimization tricks can be
applied to all conceivable functions.


Rui Maciel
 
Rui Maciel

Paavo said:
Why -O1 and not -O2?

Because that's what Ian Collins used.

You are free to run the same test with -O2 or -O3, if you feel like it. No
one is trying to hide anything from anyone. Science, and all that.


Rui Maciel
 
Ian Collins

Rui said:
"Most" is a bit of a weasel word. Nevertheless, even if you actually
believe that all object code consists of a long winded opcode dump that is
free from any function call,

Where did I say I did?

You can only count on it if you invest your time making sure that a
specific compiler will be able to compile a specific function within your
project to match your specific requirements, but this is way past C++'s
territory and firmly within platform and implementation-specifics.

Most C++ relies on the inlining of trivial functions; it's at the heart
of the language. Would you expect every call to std::vector's
operator[] to involve an actual call?

Trivial functions are a small subset of the whole domain of functions. A
corner case, if you will.

I don't think you can apply the term "corner case" to a large part of
the standard library!

No one can assume that all functions are trivial
functions, and subsequently that all possible optimization tricks can be
applied to all conceivable functions.

I'm sure no one does.
 
Öö Tiib

You are assuming that a very specific corner case is somehow the rule, which
is a bad assumption to make.

I assume nothing. The test code demonstrating that oh-so-very-special
corner case was posted by you. ;)

Just because a compiler can, as a corner case,
optimize away pure functions, it doesn't mean that all possible and
conceivable function calls will be optimized away. For instance, the corner
case you are counting on simply doesn't happen if the functions are a part
of a library.

I claimed that it likely does not matter. What test demonstrates that it does?
Stack operations (passing parameters) and indirection to a value in cache
are so fast that they did not matter much even with older hardware and
compilers. Modern hardware pipelines them in parallel with a likely
floating-point operation, so the overhead is zero. The difference is maybe
5% higher power consumption, and that only in so tight a loop calling so
trivial a function that it screams for inlining anyway.
 
