How to force 'inline' with GCC or ICC

Patrick Laurent

Hello

I have a program with a great many inlined template functions.
It is essential for execution speed that every (or almost every)
function marked as inline really is inlined by the compiler.

I have already compiled the program with the Intel Compiler (ICL) under
Visual C++, and it works fine and fast. I verified that the functions
really are inlined.

But with GCC 3.4 (Linux and Cygwin) or ICC (Linux), the same program is
about 5 times slower than under Windows.
The '-Winline' option of GCC shows me that many functions are not inlined
as they should be. The compiler treats the 'inline' keyword as advice,
but does not follow it.
I have tried various GCC options, but nothing has been satisfactory so far:
-finline-limit=100000000
--param large-function-growth
--param max-inline-insns-single
....

Does anyone have suggestions on how to force GCC/ICC to obey, or how to
raise the limits that these compilers have internally?
 
Chris.Theis

15.05.2005 16:25, in reply to Patrick Laurent's original post:

Inlining is generally a rather tricky business. The keyword is simply a
hint and is not mandatory. Whether the compiler decides to inline a
function depends not only on the compiler switches you pass, but also on
the code itself. Virtual functions, for example, may or may not be
inlined, depending on whether the compiler can unambiguously identify the
"real" object type. The general rule is that polymorphism must work, and
all optimizations take second place to that. Other issues (e.g. recursion)
also affect the decision to inline. Could you perhaps post some example
code?

BTW, check if the debug & the release version of your code show the same
inlining-behavior.
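
To make the points about virtual functions and recursion concrete, here is a minimal sketch. All names in it (Shape, Square, area_via_base, area_direct, factorial) are invented for illustration and do not come from Patrick's library. Whether a given compiler actually inlines any of these calls depends on its version and flags, so treat the comments as the usual expectation, not as a guarantee.

```cpp
struct Shape {
    virtual ~Shape() {}
    virtual double area() const = 0;
};

struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    // Defined in the class body, so it is implicitly an inline function.
    virtual double area() const { return side * side; }
};

// The dynamic type behind 's' is not known here, so the call normally goes
// through the virtual table and cannot simply be replaced by the body.
double area_via_base(const Shape& s) {
    return s.area();
}

// Here the object's type is known to be exactly Square, so the compiler is
// free to bind the call statically and (possibly) inline it.
double area_direct(double side) {
    Square sq(side);
    return sq.area();
}

// A recursive function can be marked inline, but the compiler cannot expand
// it into itself without limit; at some depth a real call must remain.
inline unsigned long factorial(unsigned n) {
    return n <= 1 ? 1UL : n * factorial(n - 1);
}
```

Calling area_via_base(Square(2.0)) and area_direct(2.0) yields the same value; only the way the call may be compiled differs.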

Cheers
Chris
 
Paul Schneider

Patrick said:
I have tried various GCC options, but nothing has been satisfactory so far:
-finline-limit=100000000
--param large-function-growth
--param max-inline-insns-single
...


Did you actually supply a value to the --param arguments? Otherwise you
probably set them to zero. With -Winline g++ 3.4 tells me exactly which
parameter is exceeded.

p
 
Patrick Laurent

Paul Schneider said:
Did you actually supply a value to the --param arguments? Otherwise
you probably set them to zero. With -Winline g++ 3.4 tells me exactly
which parameter is exceeded.

In my post I only wrote the parameter names.

Yes, I did supply a value (in fact I tried many values, most of the time
large ones).

But GCC still does not inline many important functions, whereas ICL on
Windows does.

You are right, GCC tells me which parameter is exceeded, so I always
supplied a bigger value for every corresponding parameter (up to
astronomical values). But it did not work: a few more functions were
inlined, but execution is still very much slower than on Windows.



Is there no way to force inlining?



Pat
 
Patrick Laurent

Is there no way to force inlining?
__forceinline (instead of inline) on Windows.

That's not my question.
I have no problem with ICL in a Visual C++ environment, and I already know
about '__forceinline' in case I had a problem there.
I need to force inlining with GCC and ICC.
 
Ioannis Vranos

Patrick said:
That's not my question.
I have no problem with ICL in a Visual C++ environment, and I already know
about '__forceinline' in case I had a problem there.
I need to force inlining with GCC and ICC.


I think you will find more help on this, if you consult the GCC mailing lists (and
newsgroups if such exist).
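
For reference, none of the replies in this thread names it, but GCC documents a function attribute, always_inline, that asks the compiler to inline a function regardless of its usual size limits. The sketch below wraps it in a portability macro. FORCE_INLINE, multiply_add and dot3 are hypothetical names chosen for this example. It also assumes that ICC on Linux accepts GCC-style attributes and defines __GNUC__, which should be checked against the Intel documentation for the version in use.

```cpp
// FORCE_INLINE is a hypothetical macro name chosen for this sketch.
#if defined(_MSC_VER)
   // Visual C++ and ICL on Windows: vendor keyword mentioned in the thread.
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__)
   // GCC; assumed to cover ICC on Linux as well (check your ICC version).
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
   // Unknown compiler: fall back to the plain hint.
#  define FORCE_INLINE inline
#endif

// Small template helpers of the kind that typically must be inlined
// for generic numeric code to perform well.
template <class T>
FORCE_INLINE T multiply_add(T a, T b, T c) {
    return a * b + c;
}

template <class T>
FORCE_INLINE T dot3(const T* x, const T* y) {
    // x[0]*y[0] + (x[1]*y[1] + x[2]*y[2])
    return multiply_add(x[0], y[0], multiply_add(x[1], y[1], x[2] * y[2]));
}
```

The attribute cannot make impossible cases work (a directly recursive function, for example), and GCC reports a diagnostic when it fails to inline a function marked this way. Forcing everything inline can also increase code size, so it is probably best applied to the specific functions that -Winline reports.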
 
Patrick Laurent

I found __inline__ for GCC; it is an improvement, but still not
satisfactory.
I define a macro:
#define inline __inline__
 
Chris Theis

Patrick said:
I found __inline__ for GCC, it is an improvement, but it is still not
satisfying.
I define a macro:
#define inline __inline__

You're more or less trying to force your compiler to do something that
it obviously has reasons not to do. In general you can trust your
compiler with these things ;-) Just because other compilers inline some
code does not necessarily mean that every compiler can inline it, because
that depends very much on the whole code and the development system,
including the linker. Inlining is not as trivial as it seems at first
glance (see my previous post).

I'd suggest that you either follow Ioannis's hint and ask in a
GCC-dedicated group, or at least show us the part of your code that is
inlined by other compilers but not by GCC.

Cheers
Chris
 
Patrick Laurent

I checked: in fact, __inline__ is no better.

That's the point: I obviously can trust neither GCC nor ICC, because ICL on
Windows results in much faster execution.
GCC and ICC do not inline functions as well as ICL.
The fact is that GCC is very, very bad in my case in comparison to ICL.

It is not possible to give a small test program. If you want to test on
your own, I suggest you download my library from the following address and
compile the test below. (No need to compile the library, it is STL-like.)
http://www.ient.rwth-aachen.de/team/laurent/genial/genial.html

Execution times on a Pentium 4, 3.2 GHz:
With ICL on Windows:
- No SIMD: 0.368s
- SSE: 0.126s
- SSE3: 0.112s
With GCC on Cygwin (-O3 -msse3 -UWIN32 -ftemplate-depth-36 -lstlport):
- No SIMD: 0.969s
- SSE: 2.069s


#define FFT_LEVEL 32
#include "signal/fft.h"
int main()
{
    DenseVector<complex<float> >::self X(32,0);
    DenseVector<complex<float> >::self Y(X.size(),0);
    double t0=get_time();
    for (int i=0; i<1000000; ++i)
        fft(X,Y);
    cout << get_time()-t0 << endl;
}
 
Rapscallion

Patrick said:
That's the point: I obviously can trust neither GCC nor ICC, because ICL on
Windows results in much faster execution.

You compare a compiler for multiple platforms and processors with a
highly specialized compiler for one processor (line). The latter is
faster. Surprise? A sports car is faster than a general-purpose van.
Surprise?

R.C.
 
Patrick Laurent

I compared many other configurations (Windows/Cygwin/Linux, ICL/ICC/GCC, no
SIMD/SSE/SSE2/SSE3), but every configuration was on the same Pentium 4
3.2GHz.
I also compiled with GCC using the Pentium-specific flags.

So please, stop defending GCC. Did you buy shares in GNU?
I found the (main) reason why it is much slower than ICL: it does not inline
the functions as much as ICL does.

Rapscallion, if I understand you correctly, it's normal that GCC is much
slower than ICL on the same system.
Maybe GCC and ICL are both good at compiling C, but ICL has a clear
advantage for generic C++.

Pat
 
Lionel B

Patrick Laurent said:
I checked: in fact, __inline__ is no better.

That's the point: I obviously can trust neither GCC nor ICC, because ICL on
Windows results in much faster execution.
GCC and ICC do not inline functions as well as ICL.
The fact is that GCC is very, very bad in my case in comparison to ICL.

Why are you so convinced that (lack of) inlining is responsible for the comparatively poor performance you are seeing
for GCC? There are other possible explanations (it is also quite well-known that excessive inlining can actually be
detrimental to performance - modern compilers really do know best). There are, in particular, other (particularly
floating point-related) optimisation settings for GCC that you may want to look into.

Perhaps gnu.g++.help would be a good place to ask.

Regards,
 
Patrick Laurent

I am convinced that the poor performance is due to the lack of inlining
because I get slow execution speed with ICL when the functions are not
marked as 'inline'.
With the '-Winline' option of GCC, I see every function that is not inlined.

Also, the SSE mode should be much quicker than without SIMD, but it requires
much more inlining.
ICL manages it, GCC not at all (see the timings in my previous post).
 
Rapscallion

Patrick said:
Rapscallion, if I understand you correctly, it's normal that GCC is much
slower than ICL on the same system.

Yes, VC++ and ICL produce faster and smaller code on Windows. Is this
really surprising to you?

Patrick said:
Maybe GCC and ICL are both good at compiling C, but ICL has a clear
advantage for generic C++.

A volunteer open source project vs the compiler team of a BIG company.

R.C.
 
Lionel B

Patrick Laurent said:
I am convinced that the poor performance is due to the lack of inlining
because I get slow execution speed with ICL when the functions are not
marked as 'inline'.

So that explains why inlining is crucial to the performance of the ICL compiled code :)

Patrick Laurent said:
With the '-Winline' option of GCC, I see every function that is not inlined.

Does ICL really inline everything you tell it to (either explicitly or via member definition in class declaration[*])? I
find that quite surprising.

[*] BTW, I note that you use the "inline" keyword within class declarations - that is
redundant, as far as I know... or is it just to generate warnings?

I also find it somewhat odd that the ICC compiler should produce much slower code for Linux than ICL does for Windows.

Patrick Laurent said:
Also, the SSE mode should be much quicker than without SIMD, but it requires
much more inlining.
ICL manages it, GCC not at all (see the timings in my previous post).

ICL does have a good reputation as an optimising compiler; I've never found gcc that fantastic for optimisation either
(on Win32 about on par with the old VC6), although it is difficult to generalise, as relative performance seems to
depend heavily on the nature of the code. As of ver 4.0 (recently released) GCC has a new optimisation framework.
Apparently new optimisations are not yet in place, but we are told to expect better optimisation in forthcoming
releases...

Regards,
 
Patrick Laurent

ICL really does inline the functions.
I debugged the program in release mode (yes, it's possible), and also looked
at the assembly code. I don't understand much assembly, but it's easy to
see that the functions really are inlined.

With ICL it is also possible to use the non-standard '__forceinline'
keyword, but it does not gain much.
If something like '__forceinline' existed for GCC, it would be useful.
I use 'inline' within class declarations for test purposes: I write the
following macro for ICL:
#define inline __forceinline

I cannot explain either why ICC produces much slower code than ICL.
I do not know how to see which functions it inlined and which it did not.
'__forceinline' has no effect with ICC.

Pat
 
Ioannis Vranos

Lionel said:
BTW, I note that you use the "inline" keyword within class declarations - that is
redundant, as far as I know... or is it just to generate warnings?


As far as I know it is redundant but it is also legal, no warnings should be produced.
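
A small sketch of the point about in-class definitions, using an invented type (Complex2 is not from Patrick's library). A member function defined inside the class body is implicitly inline, so adding the keyword there changes nothing in terms of the language rules; a member defined outside the class in a header does need it.

```cpp
struct Complex2 {
    float re, im;

    // Defined inside the class body: implicitly inline, no keyword needed.
    float norm2() const { return re * re + im * im; }

    // The keyword here is redundant but legal.
    inline float real() const { return re; }

    // Only declared here; the definition follows outside the class.
    float imag() const;
};

// Defined outside the class: 'inline' is needed if this lives in a header
// that is included from several translation units.
inline float Complex2::imag() const { return im; }
```

In all three cases the compiler still decides for itself whether to expand the call in place.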
 
Ioannis Vranos

Patrick said:
ICL really does inline the functions.
I debugged the program in release mode (yes, it's possible), and also looked
at the assembly code. I don't understand much assembly, but it's easy to
see that the functions really are inlined.

With ICL it is also possible to use the non-standard '__forceinline'
keyword, but it does not gain much.
If something like '__forceinline' existed for GCC, it would be useful.
I use 'inline' within class declarations for test purposes: I write the
following macro for ICL:
#define inline __forceinline

I cannot explain either why ICC produces much slower code than ICL.
I do not know how to see which functions it inlined and which it did not.
'__forceinline' has no effect with ICC.


If you want the functions to be inlined at all costs and there is no GCC switch to do that
(you should really consult GCC mailing lists and/or newsgroups for that, check
http://gcc.gnu.org for any discussion mailing list that is suitable for your subject),
then you can use the low level part of C++, macros. If you write your code with macros, it
will definitely be inlined.
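
A minimal sketch of the macro approach and its main pitfall, with invented names (SQUARE_MACRO, square, bump and the two evaluations_* helpers are hypothetical). The macro is expanded textually by the preprocessor, so no call remains, but its argument is evaluated once per occurrence in the expansion, whereas the function evaluates it exactly once.

```cpp
// Function-like macro: the preprocessor pastes the argument text in twice.
#define SQUARE_MACRO(x) ((x) * (x))

// Equivalent inline function template: the argument is evaluated once.
template <class T>
inline T square(T x) {
    return x * x;
}

// Helper with a visible side effect: increments the counter on every call.
inline int bump(int& counter) {
    return ++counter;
}

// Returns how many times the argument expression ran when using the macro.
int evaluations_with_macro() {
    int calls = 0;
    int result = SQUARE_MACRO(bump(calls));  // expands to two bump() calls
    (void)result;
    return calls;
}

// Returns how many times the argument expression ran when using the function.
int evaluations_with_function() {
    int calls = 0;
    int result = square(bump(calls));  // bump() runs exactly once
    (void)result;
    return calls;
}
```

Here evaluations_with_macro() returns 2 and evaluations_with_function() returns 1. Macros also bypass type checking, overloading and templates, which is a real cost for a template-heavy library.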
 
Lionel B

Ioannis Vranos said:
As far as I know it is redundant but it is also legal, no warnings should be produced.

Of course - I didn't express myself well: I thought it might be necessary to use the "inline" keyword to enable
reporting of non-inlining by g++ with the -Winline flag (I've no reason to suspect that that is the case, beyond the use
of "inline" by the OP).
 
