[...]
Knuth has frightened people from even investigating speed out of
curiosity.
Oh, yeah, right. As in "Ri-i-i-i-ght."
That's why he didn't write three entire volumes of TAOCP (and
more in the works) in an effort to reach beyond O() notation to the
constant factors that lie behind. That's why he didn't devote a
whole lot of text and a whole lot of subtle mathematics to winkling
out differences between O(n log n) Quicksort and O(n log n) Mergesort
and O(n log n) Heapsort and O(n log n(?)) Shellsort. That's why he
never explored threaded trees (they improve some operations by a
factor of only O(log n) so who cares?). That's why he never bothered
with Marsaglia's rectangle-wedge-tail method for generating normally-
distributed random numbers; it's just a trifling micro-optimization.
Oh, sorry, wait: By "frightened" you don't mean "dissuaded," but
literally "scared." Well, I confess that some of Knuth's math goes
over my head. But I feel at the same time that someone who shrinks
from even trying to understand it does not merit the title of
"engineer."
I tend to distinguish between what could be called "micro-optimizations"
and "macro-optimizations".
worrying about the cost of an if/else block, a switch, or a
function/method call is a micro-optimization.
worrying about the choice of an algorithm or program architecture is a
macro-optimization; that is more what most of the earlier examples were about.
generally, macro-optimizations can be reasoned about and improved on
from experience.
OTOH, most micro-optimizations are about shaving small constant
factors, and they tend to make the code look nasty (and so are better
used sparingly).
the main issue is that many people, when faced with poor performance (or
worries about possible poor performance), jump right to trying to
micro-optimize whatever they perceive "might be" slow (which tends to be
bottom-level costs, like the cost of some operation X, ...), rather
than evaluating the larger-scale architectural issues ("is there some
way I can avoid this operation altogether?").
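as a contrived illustration (the names and the symbol-table setup here
are made up, just to show the two mindsets): a micro-optimization tries
to shave the cost of each compare inside a linear scan, while a
macro-optimization gets rid of the scan altogether, say by keeping the
table sorted and binary-searching it.

    #include <string.h>

    /* micro-optimization mindset: keep the linear scan, but try to
       shave the cost of each compare (check the first char before
       paying for the full strcmp). */
    int sym_lookup_micro(char **syms, int n, char *name)
    {
        int i;
        for(i=0; i<n; i++)
        {
            if(syms[i][0]==name[0] && !strcmp(syms[i], name))
                return(i);
        }
        return(-1);
    }

    /* macro-optimization mindset: keep the table sorted and binary
       search it, so the scan (and most of the compares) goes away. */
    int sym_lookup_macro(char **syms, int n, char *name)
    {
        int lo, hi, mid, d;
        lo=0; hi=n-1;
        while(lo<=hi)
        {
            mid=(lo+hi)>>1;
            d=strcmp(syms[mid], name);
            if(!d)return(mid);
            if(d<0)lo=mid+1;
            else hi=mid-1;
        }
        return(-1);
    }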
but, there can be some overlap:
for example, I recently migrated an interpreter of mine from using a
"loop-and-switch" strategy to using threaded code.
this could be argued to be a micro-optimization (me being afraid of the
small constant overhead of the switch), but actually it was more motivated
by flexibility: there are some things which can be done with threaded code
which can't be so readily done using fixed-form logic within a switch.
so, debatably, it wasn't even really an optimization in the first place;
it only had the side effect of slightly improving performance.
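roughly, the shape of the two dispatch styles is something like this (a
minimal sketch with made-up opcodes and structures, not my actual
interpreter code):

    #include <stdio.h>

    int stack[256]; int sp=0;

    /* style 1: loop-and-switch; the logic for each op is fixed-form
       code inside the switch. */
    enum { OP_PUSH, OP_ADD, OP_PRINT };

    typedef struct Op_s Op;
    struct Op_s { int opcode; int arg; Op *next; };

    void RunSwitch(Op *op)
    {
        while(op)
        {
            switch(op->opcode)
            {
            case OP_PUSH:  stack[sp++]=op->arg; break;
            case OP_ADD:   sp--; stack[sp-1]+=stack[sp]; break;
            case OP_PRINT: printf("%d\n", stack[--sp]); break;
            }
            op=op->next;
        }
    }

    /* style 2: threaded code; each op carries a function pointer, so
       the logic for a given op can be specialized or swapped out
       per-instance (the flexibility part); avoiding the switch's
       dispatch overhead is just a side effect. */
    typedef struct TOp_s TOp;
    struct TOp_s { void (*run)(TOp *op); int arg; TOp *next; };

    void TOp_Push (TOp *op) { stack[sp++]=op->arg; }
    void TOp_Add  (TOp *op) { sp--; stack[sp-1]+=stack[sp]; }
    void TOp_Print(TOp *op) { printf("%d\n", stack[--sp]); }

    void RunThreaded(TOp *op)
    {
        while(op)
        {
            op->run(op);
            op=op->next;
        }
    }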
OTOH, I recently also made an optimization to my type-checking code
which reduced the cost of a type-lookup operation by feeding the pointer
through a hash (seeing if the pointer recently had its type looked up).
however, even though it did affect behavior (it skipped the more
expensive operation of looking up the object in the heap), I still
classify this as a micro-optimization (though, in my defense, the
profiler was showing a lot of time going into type-lookups in this case).
similarly, it exhibited the usual micro-optimization property in that it
added some hair to the affected code (some logic to worry about a hash,
and a few calls added elsewhere to flush the hash).
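in rough terms, the hack looks something like this (a sketch with
made-up names and sizes, not the actual code): a small direct-mapped
cache keyed by a hash of the pointer, checked before falling back to the
heap lookup, plus the flush call which is the "added hair":

    #include <stddef.h>

    #define TYCACHE_SZ 256

    void *tycache_ptr [TYCACHE_SZ];
    int   tycache_type[TYCACHE_SZ];

    /* stands in for the expensive lookup of the object in the heap */
    int LookupTypeSlow(void *ptr) { return(0); }

    int LookupType(void *ptr)
    {
        int h, ty;
        h=(int)(((size_t)ptr)>>4)&(TYCACHE_SZ-1);  /* hash the pointer */
        if(tycache_ptr[h]==ptr)
            return(tycache_type[h]);       /* hit: skip the heap lookup */
        ty=LookupTypeSlow(ptr);            /* miss: do the full lookup */
        tycache_ptr [h]=ptr;
        tycache_type[h]=ty;
        return(ty);
    }

    /* the added hair: anything which frees or retypes objects needs to
       call this, or stale entries could be returned */
    void FlushTypeCache(void)
    {
        int i;
        for(i=0; i<TYCACHE_SZ; i++)
            tycache_ptr[i]=NULL;
    }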
moreover, it is not likely to be a uniformly distributed optimization: it
will not improve the performance of type-checking things like fixnums,
which don't involve such a heap lookup in the first place.
or such...