(snip)
All I know is that I proposed having a separate pipestage
to rename registers, using a RAM (SRAM) table indexed by
logical register number returning physical register number,
in 1986 or 1987 - in Wen-mei Hwu's microprocessor design
class - after he had taken us through Tomasulo and HPSm.
I.e. I proposed eliminating the CAMs, replacing them by a
RAM and an additional pipestage.
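
For readers who want the RAM-table idea above in concrete form, here is a minimal sketch. It is an illustration under stated assumptions, not a description of any real design: the class name RenameTable, the free list, and the register counts are all hypothetical, and freeing physical registers at retirement is left out.

```python
class RenameTable:
    """Hypothetical RAM-style map: logical register number -> physical register number."""

    def __init__(self, num_logical, num_physical):
        # Assumption: logical register i starts out mapped to physical register i.
        self.table = list(range(num_logical))
        # The remaining physical registers form a simple free list.
        self.free = list(range(num_logical, num_physical))

    def rename(self, dest, srcs):
        # Sources are read by direct index, like a RAM read; no associative search.
        phys_srcs = [self.table[s] for s in srcs]
        # The destination gets a fresh physical register and the table is updated.
        # An empty free list raises IndexError here; hardware would stall instead.
        phys_dest = self.free.pop(0)
        self.table[dest] = phys_dest
        return phys_dest, phys_srcs


rt = RenameTable(num_logical=4, num_physical=8)
first = rt.rename(dest=1, srcs=[2, 3])   # r1 <- r2 op r3
second = rt.rename(dest=2, srcs=[1, 1])  # r2 <- r1 op r1, sees the renamed r1
```

The sources are read before the destination is written, so an instruction that uses its own destination as a source still sees the older mapping.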
With the 360/91 system, though, values can easily have more than
one destination. I suppose that could be done other ways,
too, but it is especially convenient that way.
The idea seemed new to everyone who encountered it. It was
not universally accepted as good. Indeed, I remember arguing
with Tom Olson of AMD (if memory serves), who said that
spending an extra pipestage was not a good idea.
Many people say that the CDB was an important invention.
I think it was a bad idea - long wires, CAMs.
If the wires are too long, then add more pipeline stages along
the way. With 750ns, 16-way interleaved core memory, though, the 91
wasn't going to get much faster than 60ns.
Conceptually it is elegant, but implementation-wise it is a bad idea.
The important thing is taking that conceptually elegant
CAM-ful idea, and implementing it in an efficient non-CAM manner.
The modern style of register renaming accomplishes this -
certainly for the registers, but also, depending on the
system, for the reservation stations (if those are still
being used).
Logic was much more expensive then than it is now, so the
tradeoffs are likely different. If you used RAM tables
with more than one entry for each source, you could do
multiple destinations easily.
I'd love to see a reference for this.
There is an issue of the IBM Journal of Research and
Development pretty much devoted to the 91. I believe
it is in there. The 91 is pretty much a favorite for
books on pipelined processor design, mostly referencing
that journal issue.
I believe that a UWisc patent on this was one of the things
that resulted in a big payment from Intel to UWisc.
Myself, I thought it was obvious.
-- glen