Dik T. Winter
....
> I tried the following two functions with Codewarrior 7, highest
> optimisation level, and "optimise for speed" selected.
Indeed, current compilers can optimise better when you give them the
intended code rather than some convoluted version. Note, however, that
Tom Duff wrote it at a time when loop unrolling was *not* standard
practice in compilers. At that time the trick of joining the
initial part with the main loop was also useful because it saved code
space, which was not unimportant either. BTW, the trick is extremely
similar to what Cray compilers routinely did with vectorised loops:
    set r1 = (length - 1) / 64;
    set r2 = length - r1 * 64;
    set vector-length = r2 - 1;
  loop:
    vector move a1 to a2; (vector-length + 1 operations)
    set a1 = a1 + vector-length + 1;
    set a2 = a2 + vector-length + 1;
    set r1 = r1 - 1;
    set vector-length = 63;
    if(r1 >= 0) goto loop;
The difference is that the initial part is never empty.
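
For those who prefer C to pseudo-assembly, here is a rough sketch of the same strip-mining scheme. It is only an illustration, not what any Cray compiler actually emitted: the name strip_copy, the constant VLEN and the use of memcpy to stand in for one vector move are all my own assumptions, and the explicit check for length 0 is added because the scheme itself presumes at least one element.

```c
#include <stddef.h>
#include <string.h>

#define VLEN 64  /* assumed maximum vector length, as in the pseudocode */

/* Hypothetical helper: copies `length` ints from src to dst in strips.
 * The first strip takes the remainder (1..VLEN elements); every later
 * strip is exactly VLEN elements. Returns the number of strips executed. */
static size_t strip_copy(int *dst, const int *src, size_t length)
{
    size_t strips = 0;

    if (length == 0)
        return 0;                        /* the scheme assumes length >= 1 */

    size_t r1 = (length - 1) / VLEN;     /* full strips after the first one */
    size_t vl = length - r1 * VLEN;      /* first strip: 1..VLEN elements */

    for (;;) {
        memcpy(dst, src, vl * sizeof *src);  /* stands in for one vector move */
        dst += vl;
        src += vl;
        strips++;
        if (r1 == 0)
            break;
        r1--;
        vl = VLEN;                       /* all remaining strips are full */
    }
    return strips;
}
```

With length 100 this runs two strips, the first of 36 elements and the second of 64. With length 64 it runs a single full strip, so the first pass through the loop always does some work.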