I expect that my experience is different because I program large
systems (typically hundreds of thousands to millions of lines of code).
Register allocation is provably NP-hard, and is solved by very
compute-intensive algorithms like graph coloring (though linear scan
looks promising for some situations).
I think this is missing the point. The goal is not to hand-allocate
registers for all million lines of code, but rather to let the compiler
do a good job everywhere except where the programmer determines that
1) register allocation makes a significant difference, and
2) he can improve on the default allocation by specifying the register
allocation of certain variables.
In C the programmer can specify where he wants specific allocation by
using the register keyword and allow default allocation everywhere else
by not specifying register.
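
To make that concrete, here is a minimal sketch of selective use of
register in C. The function names checksum and checksum_pair are made up
for illustration, and the keyword is only a hint: a compiler is allowed
to ignore it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hot loop (hypothetical example): hint that the accumulator and the
   index are worth keeping in registers. The compiler may ignore this. */
uint32_t checksum(const uint8_t *buf, size_t len)
{
    register uint32_t sum = 0;
    register size_t i;

    for (i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Non-critical code: no hint, so default allocation applies. */
uint32_t checksum_pair(const uint8_t *a, size_t na,
                       const uint8_t *b, size_t nb)
{
    return checksum(a, na) ^ checksum(b, nb);
}
```

One side effect worth knowing: taking the address of a variable declared
register is a constraint violation in C, so the hint also documents that
the variable is never aliased.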
This is similar to doing a good printed circuit board layout -- let the
autorouter do the high-90s percent of the work and allow the human to
tweak the routing where needed, if needed. The best result is a synergy between
man and machine.
For a large software system, no human on earth has the ability to
out-think the compiler when it comes to register allocation.
Man or machine is a false dichotomy. The best result is achieved by
using both in an intelligent way.
Part of the problem is the compiler not knowing the optimization
criteria. A compiler may be great at optimizing overall, but I may need
the minimum execution time between an interrupt and writing a servo
control. The rest of the code may be much less critical in terms of
timing. It may be better for that application to allocate 50% of the
registers to optimizing 10 lines of code in the critical path.
Hand-allocating registers to the variables in the critical path code is
probably the solution and, yes, the result needs to be verified.
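
A rough sketch of what such a critical path might look like, under
stated assumptions: ADC_RESULT and SERVO_PWM are hypothetical stand-ins
for memory-mapped hardware registers (plain volatile variables here so
the code builds on a host), and the constants and the name servo_isr are
invented for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-ins for memory-mapped hardware registers. */
static volatile uint16_t ADC_RESULT;
static volatile uint16_t SERVO_PWM;

/* Illustrative constants, not taken from any real controller. */
enum { SETPOINT = 2048, KP = 64, CENTER = 1500, OUT_MIN = 1000, OUT_MAX = 2000 };

/* Hypothetical interrupt handler: the few lines between the interrupt
   and the servo write are where the register hints are concentrated. */
void servo_isr(void)
{
    register int32_t sample = ADC_RESULT;          /* one volatile read */
    register int32_t error  = SETPOINT - sample;
    register int32_t out    = CENTER + (error * KP) / 256;

    if (out < OUT_MIN) out = OUT_MIN;              /* clamp to valid range */
    if (out > OUT_MAX) out = OUT_MAX;

    SERVO_PWM = (uint16_t)out;                     /* one volatile write */
}
```

Verifying it means looking at what the compiler actually emitted for
this function (for example the assembly listing from gcc -S) and timing
the interrupt-to-write latency on the real target, since the hint alone
guarantees nothing.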
On the other, other hand, embedded compilers may be primitive in their
ability to optimize register allocation, and embedded software systems
tend to be very small, so in such circumstances it may be possible to do
better than the compiler.
Unfortunately, that is often true. It seems ironic that more effort is
usually applied to optimizing for targets with lots of resources than
for those with minimal resources. :-( Some compilers for low-end targets
are much better than others.
On rare occasions, when
measurements indicated a problem, I have been able to outguess the
compiler with __forceinline__ but that is a different matter, not
related to the register keyword.
Actually I would consider it similar to the use of register. You are
saying "I need speed right here!" to the compiler for your particular
application.
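
For comparison, a minimal sketch of selective forced inlining. The macro
FORCE_INLINE and the functions clamp_int and scale_sample are names made
up here; the underlying spellings are the MSVC __forceinline keyword and
the GCC/Clang always_inline attribute, with plain inline as a fallback
that is only a hint.

```c
/* Hypothetical portability macro wrapping compiler-specific spellings. */
#if defined(_MSC_VER)
#  define FORCE_INLINE static __forceinline
#elif defined(__GNUC__)
#  define FORCE_INLINE static inline __attribute__((always_inline))
#else
#  define FORCE_INLINE static inline   /* fallback: a hint only */
#endif

/* Small helper the programmer wants expanded at every call site. */
FORCE_INLINE int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Critical caller: only this path pays for the forced expansion. */
int scale_sample(int sample)
{
    return clamp_int(sample * 3, -1000, 1000);
}
```

As with register, the annotation goes only on the handful of functions
that measurements single out, and everything else is left to the
compiler's own heuristics.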
With half a million or so lines of
code, it is literally impossible for me to outguess the compiler with
register allocation.
And you don't need to do it for all lines, only the critical ones, just
the way you specified selective inlining. ;-)
For commercial compilers designed for large systems, I would be very
annoyed at any compiler that did not ignore the register keyword
completely.
"My code is good, compiler, except for use of 'register', in which case
you should assume that I am babbling nonsense". I recommend a
compile-time option to cater to that certain class of programmer.
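
Short of a real compiler switch, a project can approximate such an
option itself. The sketch below assumes a hypothetical macro REG_HINT
and a hypothetical build flag HONOR_REG_HINTS; neither belongs to any
actual compiler.

```c
/* Hypothetical project-level switch: build with -DHONOR_REG_HINTS to
   pass the hint through to the compiler; otherwise it vanishes. */
#ifdef HONOR_REG_HINTS
#  define REG_HINT register
#else
#  define REG_HINT
#endif

/* Example function using the switchable hint. */
unsigned long sum_squares(unsigned int n)
{
    REG_HINT unsigned long acc = 0;
    REG_HINT unsigned int i;

    for (i = 1; i <= n; i++)
        acc += (unsigned long)i * i;
    return acc;
}
```

The result is the same either way; only the hint handed to the compiler
changes, which makes it easy to measure whether the hints help on a
given toolchain.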