Multiply-accumulate
In computing, especially digital signal processing, multiply-accumulate is a common operation that computes the product of two numbers and adds that product to an accumulator:

    a ← a + (b × c)
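As a minimal sketch of the operation described above, the following Python code applies one multiply-accumulate step per pair of inputs, which amounts to a dot product. The names `mac` and `dot` are illustrative and are not taken from any library.

```python
def mac(acc, b, c):
    """One multiply-accumulate step: return acc + (b * c)."""
    return acc + b * c


def dot(xs, ys):
    """Accumulate the products of paired elements, one MAC step per pair."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc


print(dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

The accumulator is threaded through the loop explicitly here; a hardware unit would keep it in a register instead.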
When done with floating-point numbers, multiply-accumulate may be performed with two roundings (typical in many DSPs) or with a single rounding. When performed with a single rounding, it is called a fused multiply-add (FMA) or fused multiply-accumulate (FMAC).

Modern computers may contain a dedicated multiply-accumulate unit, or "MAC-unit", consisting of a multiplier implemented in combinational logic followed by an adder and an accumulator register which stores the result when clocked. The output of the register is fed back to one input of the adder, so that on each clock the output of the multiplier is added to the register. Combinational multipliers require a large amount of logic, but can compute a product much more quickly than the method of shifting and adding typical of earlier computers. The first processors to be equipped with MAC-units were digital signal processors, but the technique is now common in general-purpose processors too.

In floating-point arithmetic
When done with integers, the operation is typically exact (computed modulo some power of 2). However, floating-point numbers have only finite precision, so each operation may round its result. As a consequence, digital floating-point arithmetic is generally not associative or distributive. (See Floating point#Accuracy problems.) Therefore, it makes a difference to the result whether the multiply-add is performed with two roundings, or in one operation with a single rounding. When performed with a single rounding, the operation is termed a fused multiply-add.
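Both points can be illustrated with a short Python sketch. The 32-bit register width and the name `mac_u32` are assumptions chosen for illustration only. The integer part wraps modulo 2^32 without any rounding, while the floating-point part shows that regrouping an addition changes the result.

```python
MASK32 = (1 << 32) - 1  # assumed register width: 32 bits


def mac_u32(acc, b, c):
    """Integer MAC reduced modulo 2**32; no rounding is involved."""
    return (acc + b * c) & MASK32


# The wrapped result equals the exact result reduced modulo 2**32.
exact = 5 + 70000 * 70000
wrapped = mac_u32(5, 70000, 70000)
print(wrapped == exact % 2**32)  # True

# Floating-point addition is not associative: each grouping rounds differently.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False
```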
Fused multiply-add
A "fused" multiply-add is a floating-point multiply-add operation performed in one step, with a single rounding. That is, where an unfused multiply-add would compute the product b × c, round it to N significant bits, add the result to a, and round back to N significant bits, a fused multiply-add would compute the entire sum a + (b × c) to its full precision before rounding the final result down to N significant bits.
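The difference between the two can be sketched in Python for double precision (N = 53). This is an emulation under stated assumptions, not a hardware FMA: the fused variant uses exact rational arithmetic from the standard `fractions` module and rounds once when converting back to a float. The function names are hypothetical.

```python
from fractions import Fraction


def unfused_multiply_add(a, b, c):
    """Round the product b * c to a double, then round the sum."""
    product = b * c      # first rounding
    return a + product   # second rounding


def fused_multiply_add(a, b, c):
    """Emulate a fused a + (b * c): exact arithmetic, then one rounding."""
    exact = Fraction(a) + Fraction(b) * Fraction(c)
    return float(exact)  # the only rounding


b = c = 1.0 + 2.0**-30   # exactly representable as a double
a = -(1.0 + 2.0**-29)    # cancels the rounded product exactly

print(unfused_multiply_add(a, b, c))            # 0.0
print(fused_multiply_add(a, b, c) == 2.0**-60)  # True
```

The exact product is 1 + 2^-29 + 2^-60. The unfused version loses the 2^-60 term when the product is rounded, so the subsequent cancellation leaves zero, whereas the fused version retains it.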
When implemented in a microprocessor, a fused multiply-add is typically faster than a multiply operation followed by an add. With an FMA instruction, a hardware divide or square root unit is not strictly needed, since both operations can be implemented efficiently in software using the FMA.

A fast FMA can speed up and improve the accuracy of many computations which involve the accumulation of products:
*Dot product
*Matrix multiplication
*Polynomial evaluation (e.g., with Horner's rule)

The FMA operation was added to IEEE 754 in the 2008 revision of the standard (IEEE 754r). The 1999 standard of the C programming language supports the FMA operation through the fma standard math library function.

FMA capability is also present in the NVIDIA GeForce 200 Series (GTX 200) and NVIDIA Tesla T10 computing GPU processors. A fused multiply-add is implemented on the SPARC64, PowerPC, PA-RISC (PA-8000 and above) and Itanium processors and will be implemented in AMD processors with SSE5 instruction set support. Intel plans to implement FMA in its 'Haswell' chip, due sometime in 2012. [http://www.reghardware.co.uk/2008/08/19/idf_intel_architecture_roadmap/ - Intel adds 22nm octo-core 'Haswell' to CPU design roadmap, The Register]

Reference
Wikimedia Foundation. 2010.