C
Chris Torek
This is getting off topic, but what the heck...
A more usual [compiled] form [of a while loop] in optimizing compilers is
        if (! cond) goto next;
    start:
        body;
        if (cond) goto start;
    next:
because it makes code motion out of the loop easier,
and (I conjecture) because it helps performance due
to better branch prediction.
Actually, this depends on both the compiler and the target.
Some machines have simple branch-prediction hardware, and some have
more complicated hardware. A machine that follows branches "for
free" (using a separate branch unit and ALU) might as well use the
multiple-goto version I gave as the simplistic expansion. A
machine whose hardware simply predicts "forward branch untaken,
backward branch taken" might indeed want to use the above expansion.
A machine whose hardware has branch prediction bits in the
instruction could use various forms, and one that keeps dynamic
branch prediction state could also use various forms.
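To make the two loop shapes concrete, here is a minimal sketch in C.
The simplistic multiple-goto expansion is not reproduced in this post,
so sum_simple below is only an assumed reconstruction of its usual
shape (test at the top, unconditional jump back at the bottom).
sum_rotated follows the quoted expansion. Both function names and the
summing loop body are made up for illustration.

```c
/* Assumed shape of the simplistic expansion of
 *     while (i < n) { sum += i; i++; }
 * One forward conditional branch plus one backward unconditional
 * jump are executed on every iteration. */
static int sum_simple(int n)
{
    int i = 0, sum = 0;
top:
    if (!(i < n)) goto out;
    sum += i;
    i++;
    goto top;
out:
    return sum;
}

/* The quoted expansion: one guard test before the loop, then a
 * single backward conditional branch per iteration.  A "backward
 * branch taken" predictor guesses right on every trip but the last. */
static int sum_rotated(int n)
{
    int i = 0, sum = 0;
    if (!(i < n)) goto next;
start:
    sum += i;
    i++;
    if (i < n) goto start;
next:
    return sum;
}
```

Both functions compute the same result: with n = 5 each returns 10,
and with n = 0 each returns 0 without entering the body. Only the
branch layout differs, and which layout is cheaper depends on the
target, as described above.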
Examples of the first machine ("free" branches, at least in some
cases) include UltraSPARC. Examples of the second (forward =
untaken, backward = taken) include, I think, some Pentium variants,
and pre-Ultra SPARC (and UltraSPARC when not using the new
instructions). Examples of the third (prediction bits in branches)
include UltraSPARC again, and examples of the fourth (dynamic in-CPU
prediction) include other versions of Pentium.
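For the "prediction bits" and static-prediction cases, some compilers
let the programmer supply the expected direction of a branch. The
sketch below uses __builtin_expect, which GCC and Clang provide as an
extension (it is not Standard C). The LIKELY/UNLIKELY macro names and
the count_negative function are made up for illustration. What the
compiler does with the hint (reorder blocks, set a hint bit, or
nothing at all) is up to the compiler and the target.

```c
#include <stddef.h>

/* __builtin_expect is a GCC/Clang extension; on other compilers the
 * (hypothetical) macros fall back to the plain expression. */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Count negative entries, hinting that they are expected to be rare.
 * The hint affects only code layout or prediction, never the result. */
static size_t count_negative(const int *a, size_t n)
{
    size_t i, count = 0;
    for (i = 0; i < n; i++) {
        if (UNLIKELY(a[i] < 0))
            count++;
    }
    return count;
}
```

A wrong hint still gives a correct answer; it can only cost time, so
it is worth measuring on the actual target before sprinkling these
around.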
In some cases it is important to place the target of the branch at
particular cache offsets (and/or make sure that the branch instruction
is near but not at the end of a cache line). The micro-optimizations
for some modern CPUs are quite tricky.