- Tomasulo algorithm
The Tomasulo algorithm is a hardware
algorithm developed in 1967 byRobert Tomasulo fromIBM . It allows sequential instructions that would normally be stalled due to certain dependencies to execute non-sequentially (out-of-order execution ). It was first implemented for the IBM360/91’s floating point unit.This algorithm differs from
scoreboarding in that it utilizesregister renaming . Where scoreboarding resolves Write-after-Write (WAW) and Write-after-Read (WAR) hazards by stalling, register renaming allows the continual issuing of instructions. The Tomasulo algorithm also uses acommon data bus (CDB) on which computed values are broadcast to all thereservation stations that may need it. This allows for improved parallel execution of instructions which may otherwise stall under the use of scoreboarding.Robert Tomasulo received the
Eckert-Mauchly Award in 1997 for this algorithm.Implementation Concepts
The following are the concepts necessary to the implementation of Tomasulo's Algorithm.
*Instructions are issued sequentially so that the effects of a sequence of instructions such as exceptions raised by these instructions occur in the same order as they would in a non-pipelined processor, regardless of the fact that they are being executed non-sequentially.
*All general-purpose and reservation station registers hold either real or virtual values. If a real value is unavailable to a destination register during the issue stage, a virtual value is initially used. The functional unit that is computing the real value is assigned as the virtual value. The virtual register values are converted to real values as soon as the designated functional unit completes its computation.
*Functional units use reservation stations with multiple slots. Each slot holds information needed to execute a single instruction, including the operation and the operands. The functional unit begins processing when it is free and when all source operands needed for an instruction are real.
Instruction Lifecycle
The three stages listed below are the stages through which each instruction passes from the time it is issued to the time its execution is complete.
tage 1: Issue
In the Issue stage, instructions are issued for execution if all operands and reservation stations are ready or else they are stalled. Registers are renamed in this step, eliminating WAR and WAW hazards.
*Retrieve the next instruction from the head of the instruction queue
*If the instruction operands are currently in the registers
**If there is a matching empty reservation station (functional unit is available)
***Issue the instruction
**Else, there is not a matching empty reservation station (functional unit is not available)
***Stall the instruction until a station or buffer is free
*Else, the operands are not in the registers
**Use virtual values, the functional unit calculating the real value, to keep track of the functional units that will produce the operandstage 2: Execute
In the Execute stage, the instruction operations are carried out. Instructions are delayed in this step until all of their operands available, eliminating RAW hazards. Program correctness is maintained through effective address calculation to prevent hazards through memory.
*If one or more of the operands is not yet available
**Monitor the common data bus (CDB) while waiting for it to be computed
**Place operands into corresponding reservation station when they become available
*When all operands are available
**If the instruction is a load or store
***Compute the effective address when the base register is available
***Place the effective address in the load or store buffer
***If the instruction is a load
****Execute as soon as the memory unit is available
***Else, the instruction is a store
****Wait for the value to be stored before sending it to the memory unit
**Else, the instruction is an ALU operation
***Execute the instruction at the corresponding functional unittage 3: Write Result
In the Write Result stage, ALU operations results are written back to registers and store operations are written back to memory.
*If the instruction was an ALU operation
**If the result is available,
***Write it on the CDB and from there into the registers and any reservation stations waiting for this result
*Else, if the instruction was a store
**Write the data to memory during this stepSee also
*
Re-order buffer
*Instruction level parallelism
*Out-of-order execution External links
* [http://www.cs.umd.edu/class/fall2001/cmsc411/projects/dynamic/tomasulo.html Dynamic Scheduling - Tomasulo's Algorithm]
Bibliography
* " [http://domino.research.ibm.com/tchjr/journalindex.nsf/0/ed39cdf7e40549ec85256bfa00683f73?OpenDocument An Efficient Algorithm for Exploiting Multiple Arithmetic Units] ", IBM Journal of Research and Development, 11(1):25-33, January 1967.
* " [http://www.dcs.ed.ac.uk/home/hase/webhase/demo/tomasulo.html WebHASE: Tomasulo's Algorithm: HASE Java applet simulation of the Tomasulo's Algorithm] ", Institute for Computing Systems Architecture, Edinburgh University.
* " [http://www.ecs.umass.edu/ece/koren/architecture/Tomasulo1/tomasulo.htm TOMASULO'S ALGORITHM FOR DYNAMIC SCHEDULING] "
* "Computer Architecture: A Quantitative Approach", John L. Hennessy & David A. Patterson
Wikimedia Foundation. 2010.