I know exactly the worth of optimization, and it is always ON, but I
have mathematical code which is written to be as optimized as
possible.
When I switch optimization on, the measurement of the code is not
exact, because the compiler reorders even the timer read and write
calls.
But what's important is the execution time of the binary that will
actually be used, not the execution time of the non-optimized,
non-reordered code. That one is irrelevant.
So you try to micro-optimise instead of really improving your program.
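Incidentally, the timer-reordering problem can be worked around
without turning optimisation off. Here is a minimal sketch (the
kernel is a made-up stand-in for your mathematical code, not anything
from this thread): route the measured result through a volatile sink,
so the optimiser has to keep the work, and its ordering, inside the
timed region.
------
#include <chrono>
#include <cstdio>

// Hypothetical stand-in for the real mathematical kernel being timed.
static double kernel(double x)
{
    for (int i = 0; i < 1000; ++i)
        x = x * 1.0000001 + 0.5;
    return x;
}

int main()
{
    using clock = std::chrono::steady_clock;
    volatile double sink = 0.0;  // volatile: its loads/stores cannot be elided

    auto start = clock::now();
    for (int rep = 0; rep < 100000; ++rep)
        sink = kernel(sink);     // data dependency keeps iterations ordered
    auto stop = clock::now();

    std::chrono::duration<double> elapsed = stop - start;
    std::printf("elapsed: %f s\n", elapsed.count());
}
------
Because every iteration reads and writes the volatile sink, in
practice the optimiser can neither drop the work nor hoist it out of
the timed region, even at -O2.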
Pay attention that for now I only want to compare runtimes; just the
"RELATIVE" runtime is important.
RELATIVE time is only relevant if the RELATIVE timings are going to be
the same in the final binary. The problem is that once the compiler
starts reordering calls, inlining functions, applying RVO/NRVO, and so
on, the RELATIVE times you have measured for a binary generated with
optimization turned off will not be relevant anymore.
I guess there will be times when you are lucky and code that
performs better in non-optimised mode will also perform better in
optimised mode. But this is pretty much a shot in the dark.
Your technique will lead you to believe that there is a difference
between:
------
++i;
------
#define INCREMENT(x) (++x)
------
void increment(int &i)
{
++i;
}
------
inline void increment(int &i)
{
++i;
}
------
When in reality, there will be no difference in production code that
has been compiled with optimisation turned on.
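You don't have to take that on faith; you can check it. A sketch (the
file and function names are mine, assuming GCC or Clang): put all four
forms in one translation unit, compile to assembly, and compare the
code generated for the four call sites.
------
// increment_variants.cpp
// Compile with:  g++ -O2 -S increment_variants.cpp
// then compare the assembly generated for the four test_* functions.

#define INCREMENT(x) (++x)

void increment_ref(int &i) { ++i; }

inline void increment_inl(int &i) { ++i; }

int test_direct(int i)   { ++i;                return i; }
int test_macro(int i)    { INCREMENT(i);       return i; }
int test_function(int i) { increment_ref(i);   return i; }
int test_inline(int i)   { increment_inl(i);   return i; }
------
On a typical x86-64 build at -O2 all four test_* functions come out
identical, a single add/lea; increment_ref still gets an out-of-line
copy because it has external linkage, but its call site is inlined
just like the others.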
After finding a good solution I will turn optimization on and also try
to measure with that; this is what I have done for many years and it
has always been precise enough.
But everyone here has been telling you: the function call overhead
for a small function is probably going to be zero when optimisation is
turned on, because the compiler will inline the function regardless of
whether the "inline" keyword is used.
So all your testing has achieved absolutely nothing.
Actually, to be fair, there will be times when you are lucky and a
code change made to improve your non-optimised profiling will also
result in a faster optimised binary.
However, there will often be times when your changes have not changed
the performance of the optimised binary at all, because the compiler
is trivially able to do the exact same optimisation itself.
Unfortunately, your code changes will have rewritten the algorithm
from the natural way of expressing it into a semi-obfuscated form
suited to in-order CPU execution without compiler reordering or
optimising, rather than a form suited to human reading. So you will
have gained no performance, but reduced the maintainability of your
code.
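A concrete illustration (my example, not anything from your code):
hand-replacing a multiplication with shifts is exactly the strength
reduction an optimising compiler performs on its own, so at -O2 both
versions usually produce the same machine code, and the only lasting
effect is the loss of readability.
------
// Natural version: readable, and optimised by the compiler anyway.
unsigned scale(unsigned x)
{
    return x * 40u;
}

// Hand-"optimised" version: usually identical machine code at -O2,
// because the compiler performs this strength reduction itself.
unsigned scale_by_hand(unsigned x)
{
    return (x << 5) + (x << 3);   // 32*x + 8*x == 40*x
}
------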
And there will also be times when the changes you make to the code to
suit your non-optimised timings result in code that the compiler can't
optimise so well. The overall result will be slower optimised code
than what you started with.
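Here is one way that can happen (again a sketch of mine, not your
code): pulling an accumulator out into a global, a change that costs
nothing visible at -O0, can force the optimiser to assume the
accumulator aliases the input and to store it back to memory on every
iteration, while the natural local-variable version keeps it in a
register for the whole loop.
------
#include <cstddef>

double total = 0.0;   // accumulator moved to a global "for speed"

// The store to 'total' may alias 'data', so at -O2 the compiler may
// be forced to write 'total' back to memory on every iteration.
void sum_via_global(const double *data, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        total += data[i];
}

// Natural version: the local accumulator cannot alias 'data', so it
// stays in a register and the loop body optimises freely.
void sum_via_local(const double *data, std::size_t n)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += data[i];
    total += sum;
}
------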
Yannick