efficient function call

hamze

hi,
I have a small, high-frequency I/O job: it writes 2 operands and reads 1.
Previously I used macros for this, but strange bugs appeared, especially
when optimization was enabled.
Finally I decided to use functions instead of macros, but the function call
overhead is much bigger than the job's time itself.
What should I do to cope with this problem? I mean, how do I get an
efficient function call?
 
Victor Bazarov

I have a small, high-frequency I/O job: it writes 2 operands and reads 1.
Previously I used macros for this, but strange bugs appeared, especially
when optimization was enabled.
Finally I decided to use functions instead of macros, but the function call
overhead is much bigger than the job's time itself.
What should I do to cope with this problem? I mean, how do I get an
efficient function call?

Usually an inline function (a function declared 'inline', or a member
function defined inside the class definition itself, which makes it
implicitly 'inline') can be optimized better by the compiler.

What makes you think the "function call overhead" is "much bigger"? Did
you measure it? How?
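To illustrate both forms (a sketch only; the names and the memory-mapped-register idea are mine, not from the original post):

```cpp
#include <cstdint>

// A function explicitly declared 'inline'.
inline void write_pair(volatile std::uint32_t* reg,
                       std::uint32_t a, std::uint32_t b)
{
    reg[0] = a;   // write two operands
    reg[1] = b;
}

struct Port
{
    volatile std::uint32_t* base;

    // A member function defined inside the class definition,
    // which makes it implicitly 'inline'.
    std::uint32_t read() const { return base[2]; }
};
```

With optimisation on, either form gives the compiler everything it needs to eliminate the call entirely.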

V
 
none

Usually an inline function (a function declared 'inline', or a member
function defined inside the class definition itself, which makes it
implicitly 'inline') can be optimized better by the compiler.

Modern compilers should inline small functions even if they are not
explicitly declared inline, provided you let them do so by enabling
optimisation.
What makes you think the "function call overhead" is "much bigger"? Did
you measure it? How?

Second that. What type of I/O are you talking about?

Even for a dummy program like

#include <climits>

void foo(int &i)
{
    ++i;
}

int main()
{
    int i = 0;
    while (i < INT_MAX - 1)
    {
        foo(i);
    }
    return 0;
}

Compiled with all optimisations turned off, the difference between
incrementing via foo(i) rather than ++i is far from "much bigger"; it's
more like 30% additional execution time.

With optimisation enabled (e.g. g++ -O3), the difference disappears.

Yannick
 
hamze

Usually an inline function (a function declared 'inline', or a member
function defined inside the class definition itself, which makes it
implicitly 'inline') can be optimized better by the compiler.

What makes you think the "function call overhead" is "much bigger"?  Did
you measure it?  How?

I have a timer (counter) on my system; I read it before and after
each function, and the difference between the two values is the elapsed
clock count.
Whenever I use this timer for measurement, I turn optimization off to
get an exact measurement.
I am working on an embedded system without any OS.

With a macro, as I measured, my job is as little as 50 clock cycles, but
a function call makes it 3-4 times bigger.
Thanks for the tip about inline, I will check it.
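The measurement pattern described above looks roughly like this. Everything here is a sketch: `read_timer` uses std::chrono as a stand-in for the bare-metal counter register, and `io_job` stands in for the two-writes-one-read operation.

```cpp
#include <chrono>
#include <cstdint>

// Stand-in for the hardware cycle counter; on a bare-metal target this
// would be a read of a memory-mapped counter register instead.
static std::uint64_t read_timer()
{
    using namespace std::chrono;
    return duration_cast<nanoseconds>(
        steady_clock::now().time_since_epoch()).count();
}

// Stand-in for the I/O job: write two operands, read one result.
inline void io_job(volatile std::uint32_t* port,
                   std::uint32_t a, std::uint32_t b,
                   std::uint32_t& out)
{
    port[0] = a;
    port[1] = b;
    out = port[2];
}

// One measurement: read the counter before and after the call; the
// difference of the two values is the elapsed time (which still
// includes the overhead of the timer reads themselves).
std::uint64_t measure_once(volatile std::uint32_t* port)
{
    std::uint32_t result = 0;
    const std::uint64_t start = read_timer();
    io_job(port, 1, 2, result);
    const std::uint64_t stop = read_timer();
    return stop - start;
}
```

Note the caveat in the comment: at a job size of ~50 cycles, the two timer reads themselves are a significant part of what gets measured.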
 
Stefan Ram

hamze said:
whenever I use this timer for measurement, I turn optimization off to
get an exact measurement.

When optimizing, the program should be compiled with all the
switches set as in production mode (as delivered to the
customer) and be executed in the customer's environment
(equipment, processor, computer). Or else, what are you
optimizing for?
 
Victor Bazarov

I have a timer (counter) on my system; I read it before and after
each function, and the difference between the two values is the elapsed
clock count.
Whenever I use this timer for measurement, I turn optimization off to
get an exact measurement.

I.e. you're not measuring the same code that is going to be executed
(since you will compile that one with optimizations turned on). What's
the point?
I am working on an embedded system without any OS.

I am sure it's tough to develop on a system without any tools. If you
use your own timer for profiling, make sure the rest of the program is
as close as possible to the real thing, i.e. run it with your timer, but
fully optimized.

As soon as you try to measure the code's execution at the same level as
the code itself (down to, say, a function call), you will need to
actually change the code (by introducing your timer, or whatnot), and
you're going to affect the results. You can't get the exact (absolute)
timing, only the relative speed (some parts are slower than others).

You should work on optimizing only what makes the biggest difference,
and only one thing at a time. Divide and conquer. And only go as far as
you need to satisfy the overall speed of your program (which you should
check with your measurement mechanism disabled). IOW, make it as fast
as needed, but not faster. There is always a law of diminishing
returns that applies to such efforts.
With a macro, as I measured, my job is as little as 50 clock cycles, but
a function call makes it 3-4 times bigger.
Thanks for the tip about inline, I will check it.

V
 
Joshua Maurice

I have a timer (counter) on my system; I read it before and after
each function, and the difference between the two values is the elapsed
clock count.
Whenever I use this timer for measurement, I turn optimization off to
get an exact measurement.
I am working on an embedded system without any OS.

With a macro, as I measured, my job is as little as 50 clock cycles, but
a function call makes it 3-4 times bigger.
Thanks for the tip about inline, I will check it.

Look. If you're not compiling with optimizations, then ignore
everything that was said in this thread. Optimizations on is the
default for production code. If you're not planning on shipping /
delivering executables with optimization on, then most of us don't
really know the performance characteristics offhand. Without basic
optimizations, C and C++ are no longer runtime competitive with
assembly.

If you are planning on shipping an optimized executable, then you need
to do your timing tests on optimized builds. Otherwise the numbers are
meaningless due to the vast speed differences between optimized and
non-optimized.

Finally, are you calling that timer function on every iteration of a
tight inner loop? Bad. The timing function itself could be dwarfing
your runtime. Call your timer function at far less frequent intervals,
and don't do micro-benchmarks. I would suggest making whatever
performance change, then rerunning your entire program and seeing
whether you get a measurable difference in runtime.
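Amortising the timer reads over many iterations, as suggested above, could be sketched like this (`read_timer` is again a std::chrono stand-in for whatever counter the platform provides, and `dummy_job` is a placeholder workload):

```cpp
#include <chrono>
#include <cstdint>

// Stand-in for the platform's counter.
static std::uint64_t read_timer()
{
    using namespace std::chrono;
    return duration_cast<nanoseconds>(
        steady_clock::now().time_since_epoch()).count();
}

// Dummy workload; the volatile sink keeps the compiler from
// optimising the loop body away entirely.
static volatile std::uint32_t sink = 0;
static void dummy_job() { sink = sink + 1; }

// Time n iterations with only two timer reads, then report the mean,
// so the cost of reading the timer is spread over the whole batch
// instead of dwarfing a ~50-cycle job.
double mean_ticks(void (*job)(), std::uint64_t n)
{
    const std::uint64_t start = read_timer();
    for (std::uint64_t i = 0; i < n; ++i)
        job();
    const std::uint64_t stop = read_timer();
    return static_cast<double>(stop - start) / static_cast<double>(n);
}
```

The larger n is, the less the two timer reads (and any per-call measurement noise) contribute to the mean.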
 
hamze

  When optimizing, the program should be compiled with all the
  switches set as in production mode (as delivered to the
  customer) and be executed in the customer's environment
  (equipment, processor, computer). Or else, what are you
  optimizing for?


When I just want to compare the runtime of the same code with small
changes, I turn optimization off; at any other time it is on.
 
hamze

I.e. you're not measuring the same code that is going to be executed
(since you will compile that one with optimizations turned on).  What's
the point?


I am sure it's tough to develop on a system without any tools.  If you
use your own timer for profiling, make sure the rest of the program is
as close as possible to the real thing, i.e. run it with your timer, but
fully optimized.

As soon as you try to measure the code's execution at the same level as
the code itself (down to, say, a function call), you will need to
actually change the code (by introducing your timer, or whatnot), and
you're going to affect the results.  You can't get the exact (absolute)
timing, only the relative speed (some parts are slower than others).

You should work on optimizing only what makes the biggest difference,
and only one thing at a time.  Divide and conquer.  And only go as far as
you need to satisfy the overall speed of your program (which you should
check with your measurement mechanism disabled).  IOW, make it as fast
as needed, but not faster.  There is always a law of diminishing
returns that applies to such efforts.


V

Thanks a lot for your guidance, I think my problem is solved.
I used an inline function instead of a macro. I also used 'call by
reference' to reduce the time overhead.
As I measured, my code is now as fast as with a macro, while using a
non-inline function caused unacceptable runtime.
The second good (and also strange) thing is that my code size is only
a little bigger than with a non-inline function, while the macro
made my code very big. I don't know why, but perhaps the compiler did
something good! (optimization was also off!!)
And the third good thing is that I don't have the strange errors caused
by optimization on macros anymore!

Thanks a lot for your help
 
hamze

Look. If you're not compiling with optimizations, then ignore
everything that was said in this thread. Optimizations on is the
default for production code. If you're not planning on shipping /
delivering executables with optimization on, then most of us don't
really know the performance characteristics offhand. Without basic
optimizations, C and C++ are no longer runtime competitive with
assembly.

If you are planning on shipping an optimized executable, then you need
to do your timing tests on optimized builds. Otherwise the numbers are
meaningless due to the vast speed differences between optimized and
non-optimized.

Finally, are you calling that timer function on every iteration of a
tight inner loop? Bad. The timing function itself could be dwarfing
your runtime. Call your timer function at far less frequent intervals,
and don't do micro-benchmarks. I would suggest making whatever
performance change, then rerunning your entire program and seeing
whether you get a measurable difference in runtime.


No, I just wanted to measure the runtime of one specific function, not
all of my code, so I read the timer value before and after that
function. That function is a black box from the outside.

Anyway, I think my problem is solved by using inline functions.
Thanks a lot
 
none

Thanks a lot for your guidance, I think my problem is solved.
I used an inline function instead of a macro. I also used 'call by
reference' to reduce the time overhead.
As I measured, my code is now as fast as with a macro, while using a
non-inline function caused unacceptable runtime.
The second good (and also strange) thing is that my code size is only
a little bigger than with a non-inline function, while the macro
made my code very big. I don't know why, but perhaps the compiler did
something good! (optimization was also off!!)
And the third good thing is that I don't have the strange errors caused
by optimization on macros anymore!

Thanks a lot for your help

Sorry but you did not solve any real problem. You only solved an
imaginary problem that you thought existed but didn't.

Profiling with "optimisation off" is a total waste of time and
effort. You are lying to yourself.

If execution speed is of importance to you, then you must let the
compiler do its job and optimise. All you did was demonstrate that
your particular compiler, with the particular options you use when
compiling with what you call "optimisation off", still inlines functions
that are explicitly declared inline. Any compiler worth its salt
will inline small functions the moment you turn on optimisation.

When profiling, you must do it in conditions as close as possible to
the real ones, and try to make your measuring tools have as little
effect as possible on the code. Turning off optimisation can easily
slow down a program by an order of magnitude.

Yannick
 
hamze

Sorry but you did not solve any real problem.  You only solved an
imaginary problem that you thought existed but didn't.

Profiling with "optimisation off" is a total waste of time and
effort.  You are lying to yourself.

If execution speed is of importance to you, then you must let the
compiler do its job and optimise.  All you did was demonstrate that
your particular compiler, with the particular options you use when
compiling with what you call "optimisation off", still inlines functions
that are explicitly declared inline.  Any compiler worth its salt
will inline small functions the moment you turn on optimisation.

When profiling, you must do it in conditions as close as possible to
the real ones, and try to make your measuring tools have as little
effect as possible on the code.  Turning off optimisation can easily
slow down a program by an order of magnitude.

Yannick

I know exactly the worth of optimization, and it is always ON,
but I have mathematical code which is written to be as optimized as
possible.
When I switch optimization on, the measurement of the code is not
exact, because the compiler reorders even the timer read and write
functions.
Note that right now I only want to compare runtimes; just the
"RELATIVE" runtime is important.
After finding a good solution I will turn on optimization and also try
to measure with that.
This is what I have done for many years and it has always been precise
enough.
 
Dombo

On 25-May-11 20:08, hamze wrote:
I know exactly the worth of optimization, and it is always ON,
but I have mathematical code which is written to be as optimized as
possible.
When I switch optimization on, the measurement of the code is not
exact, because the compiler reorders even the timer read and write
functions.

That can only happen if you do very fine-grained measurements. There are
limits to what a compiler can/may reorder.
Note that right now I only want to compare runtimes; just the
"RELATIVE" runtime is important.
After finding a good solution I will turn on optimization and also try
to measure with that.
This is what I have done for many years and it has always been precise
enough.

The problem with this approach is that optimization does not affect the
performance of all code equally. For some code, optimization settings
can have a significant effect on performance, while for other code it
hardly matters at all. Also, instrumenting your code to read timers may
affect optimization. It is quite possible that with uninstrumented code
all variables can be kept in registers, while reading the timers may
necessitate pulling the variables from memory, which is much slower.
Also, the additional code to read the timer may cause the compiler to
decide not to inline a function that would otherwise be inlined. In
other words: you cannot say much about the RELATIVE runtime performance
either.

The impression I get is that you try to measure at too fine a grain. In
the end it is not so important how long it takes to call a certain
function, but how fast your program performs. The more detailed the
level at which you try to measure, the harder it gets to get it right,
as little things may skew the results dramatically. If for some reason
you cannot avoid making such fine-grained measurements, it is wise to
inspect the assembly output of your compiler (for both the instrumented
and uninstrumented versions of the code) to make sure you really
understand what you are measuring. And even then it is still hard,
thanks to things like processor caches, branch prediction...etc., which
may behave quite differently because of code added for the sake of
measuring performance.
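If fine-grained measurement really can't be avoided, one partial mitigation is a compiler barrier on each side of the timed region, which forbids the compiler from moving the measured code across the timer reads. This is only a sketch using a GCC/Clang extension, and it does nothing about the hardware effects mentioned above (caches, branch prediction, out-of-order execution):

```cpp
#include <cstdint>

// Compiler-only barrier (GCC/Clang extension): the compiler may not
// reorder memory accesses across this point. It emits no instructions
// and does not constrain the CPU itself.
static inline void compiler_barrier()
{
    __asm__ __volatile__("" ::: "memory");
}

// Time one call to job(), keeping it pinned between the two timer
// reads. Both functions are passed in so the sketch stays generic.
std::uint64_t timed_region(std::uint64_t (*read_timer)(), void (*job)())
{
    compiler_barrier();
    const std::uint64_t start = read_timer();
    compiler_barrier();
    job();
    compiler_barrier();
    const std::uint64_t stop = read_timer();
    compiler_barrier();
    return stop - start;
}
```

A portable alternative for the barrier is `std::atomic_signal_fence(std::memory_order_seq_cst)` from `<atomic>`.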
 
none

I know exactly the worth of optimization, and it is always ON,
but I have mathematical code which is written to be as optimized as
possible.
When I switch optimization on, the measurement of the code is not
exact, because the compiler reorders even the timer read and write
functions.

But what's important is the execution time of the binary that will
actually be used, not the execution time of the non-optimized,
non-reordered code. The latter is irrelevant.

So you try to micro-optimise instead of really improving your program.
Note that right now I only want to compare runtimes; just the
"RELATIVE" runtime is important.

RELATIVE time is only relevant if the RELATIVE timings are going to be
the same in the final binary. The problem is that once the compiler
starts reordering calls, inlining functions, using RVO/NRVO, etc., the
RELATIVE times you measured on a binary generated with optimization
turned off will no longer be relevant.

I guess there will be times where you are lucky and code that performs
better in non-optimised mode also performs better in optimised mode.
But this is pretty much a shot in the dark.

Your technique will lead you to believe that there is a difference
between:

------
++i;
------
#define INCREMENT(x) (++x)
------
void increment(int &i)
{
    ++i;
}
-----
inline void increment(int &i)
{
    ++i;
}
------

When in reality there will be no difference in production code that
has been compiler-optimised.

After finding a good solution I will turn on optimization and also try
to measure with that; this is what I have done for many years and it
has always been precise enough.

But everyone here has been telling you: the function call overhead
for a small function is probably going to be nil when optimisation is
turned on, because the compiler will inline the function regardless of
the use of the "inline" keyword.

So all your testing has achieved absolutely nothing.

Actually, to be fair, there will be times where you are lucky and your
code change to improve your non-optimised profiling also results
in a faster optimised binary.

However, there will often be times where your changes have not changed
the performance of the optimised binary at all, since the compiler
is trivially able to do the exact same optimisation. Unfortunately,
your code changes will have rewritten the algorithm from the natural
way to express it into a semi-obfuscated form suited to CPU execution
order without reordering or optimising, rather than to human
reading. So you will have gained no performance but reduced the
maintainability of your code.

And there will also be times where the change you make to the code to
suit your non-optimised timing results in code that can't be
optimised as well by the compiler. The overall result will be slower
optimised code than what you started with.


Yannick
 
Edek

But everyone here has been telling you: the function call overhead
for a small function is probably going to be nil when optimisation is
turned on, because the compiler will inline the function regardless of
the use of the "inline" keyword.

Main topic aside: the above is not always true. Even in the same cpp
file, sometimes non-inline functions cannot be inlined, while an inline
function can (and usually will) be. The compiler is free to inline a
non-inline function only if it can be sure it knows the function's body,
and sometimes this is not true even within the same cpp file. E.g. on an
ELF system, when compiling position-independent code (Linux, shared
library), almost every non-inline function can be replaced during
program loading by an arbitrary implementation, and gcc must take this
into account, so no inlining can be done unless the function is really
inline.

Edek
 
none

Main topic aside: the above is not always true. Even in the same cpp
file, sometimes non-inline functions cannot be inlined, while an inline
function can (and usually will) be. The compiler is free to inline a
non-inline function only if it can be sure it knows the function's body,
and sometimes this is not true even within the same cpp file. E.g. on an
ELF system, when compiling position-independent code (Linux, shared
library), almost every non-inline function can be replaced during
program loading by an arbitrary implementation, and gcc must take this
into account, so no inlining can be done unless the function is really
inline.

Agree, there are edge cases.

One should also not forget that the "inline" keyword is only
advisory, so even functions explicitly declared "inline" will not
necessarily be inlined.



Yannick
 
Jorgen Grahn

Main topic aside: the above is not always true. Even in the same cpp
file, sometimes non-inline functions cannot be inlined, while an inline
function can (and usually will) be. The compiler is free to inline a
non-inline function only if it can be sure it knows the function's body,
and sometimes this is not true even within the same cpp file. E.g. on an
ELF system, when compiling position-independent code (Linux, shared
library), almost every non-inline function can be replaced during
program loading by an arbitrary implementation, and gcc must take this
into account, so no inlining can be done unless the function is really
inline.

So you're saying a file

static int foo() { return 42; }
int bar() { return 69+foo(); }

will never inline foo() if I use g++ with the -fpic option?
That sounds wrong -- and I cannot repeat it on my system.

tuva:/tmp> g++ -W -Wall -ansi -fpic -O3 -c foo.cc
tuva:/tmp> nm -C foo.o
00000000 V DW.ref.__gxx_personality_v0
00000000 T bar()
U __gxx_personality_v0

On the other hand, you're saying "/almost/ every non-inline function
can be replaced during program loading" ... what are the rules for
this? Or rather, where can I read about it?

/Jorgen
 
Edek

So you're saying a file

static int foo() { return 42; }

static... (that justifies my 'almost' - I knew there are other cases
I forgot about). Static is not exported, is it?
int bar() { return 69+foo(); }

will never inline foo() if I use g++ with the -fpic option?
That sounds wrong -- and I cannot repeat it on my system.

tuva:/tmp> g++ -W -Wall -ansi -fpic -O3 -c foo.cc
tuva:/tmp> nm -C foo.o

Symbols will be present. See -S for bar().
00000000 V DW.ref.__gxx_personality_v0
00000000 T bar()
U __gxx_personality_v0

On the other hand, you're saying "/almost/ every non-inline function
can be replaced during program loading" ... what are the rules for
this? Or rather, where can I read about it?

LD_PRELOAD, or in general the symbol resolution order is 'first found',
at least for regular symbols.

Example, replacing memcpy:
https://bugzilla.redhat.com/show_bug.cgi?id=638477#c55

Edek
 
Edek

So you're saying a file

static int foo() { return 42; }
int bar() { return 69+foo(); }

will never inline foo() if I use g++ with the -fpic option?
That sounds wrong -- and I cannot repeat it on my system.

tuva:/tmp> g++ -W -Wall -ansi -fpic -O3 -c foo.cc
tuva:/tmp> nm -C foo.o
00000000 V DW.ref.__gxx_personality_v0
00000000 T bar()
U __gxx_personality_v0

To check that I am not lying to you:
------------------------------------------------------
int answer () { return 42; }

int bar () { return 69|answer(); }
-------------------------------------------------
gcc -O3 -S test.cpp

.type _Z3barv, @function
_Z3barv:
.LFB1:
.cfi_startproc
movl $111, %eax
ret
------------------------------------------------------
gcc -O3 -S -fPIC test.cpp

.type _Z3barv, @function
_Z3barv:
.LFB1:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
call _Z6answerv@PLT
addq $8, %rsp
.cfi_def_cfa_offset 8
orl $69, %eax
ret

And the working thing:
-------------------------------------------------------------------------
[devel@marvin preload]$ LD_LIBRARY_PATH=. ./main
bar gives111
[devel@marvin preload]$ LD_LIBRARY_PATH=. LD_PRELOAD=./mylib.so ./main
bar gives1093
[devel@marvin preload]$
----------------------------------------------------------------------

where main is

#include <iostream>

extern int bar();

int main () {
    std::cout << "bar gives" << bar () << std::endl;
}

Edek
 
Jorgen Grahn

static... (that justifies my 'almost' - I knew there are other cases
I forgot about). Static is not exported, is it?

Of course it isn't, but I see nothing upthread that narrows the
discussion down to non-static functions. I have never expected the
compiler to inline[1] anything else, but I'm happy to hear that g++
sometimes can.

/Jorgen

[1] Or do related optimizations, like using custom calling
conventions.
 
