SSE3
SSE3, also known by its Intel code name Prescott New Instructions (PNI), is the third iteration of the SSE instruction set for the IA-32 architecture. Intel introduced SSE3 in early 2004 with the Prescott revision of their Pentium 4 CPU. In April 2005, AMD introduced a subset of SSE3 in revision E (Venice and San Diego) of their Athlon 64 CPUs. The earlier SIMD instruction sets on the x86 platform, from oldest to newest, are MMX, 3DNow! (developed by AMD), SSE and SSE2. SSE3 contains 13 new instructions over SSE2.
Changes
The most notable change is the ability to work horizontally within a register, as opposed to the largely vertical (lane-by-lane) operation of all previous SSE instructions. Specifically, SSE3 adds instructions that add and subtract adjacent values stored within a single register. These instructions simplify the implementation of a number of DSP and 3D operations. There is also a new instruction, FISTTP, that converts floating-point values to integers by truncation without having to change the global rounding mode, thus avoiding costly pipeline stalls. Finally, the extension adds LDDQU, an alternative misaligned integer vector load that performs better on NetBurst-based processors for loads that cross cache-line boundaries.
CPUs with SSE3
*AMD:
**Athlon 64 (since Venice Stepping E3 and San Diego Stepping E4)
**Athlon 64 X2
**Athlon 64 FX (since San Diego Stepping E4)
**Opteron (since Stepping E4)
**Sempron (since Palermo, stepping E3)
**Phenom
**Turion 64
**Turion 64 X2
*Intel:
**Celeron D
**Celeron 420, 430 and 440
**Pentium 4 (since Prescott)
**Pentium D
**Pentium Dual-Core
**Pentium Extreme Edition (but NOT Pentium 4 Extreme Edition)
**Intel Core Duo
**Intel Core Solo
**Intel Core 2 Duo
**Intel Core 2 Extreme
**Intel Core 2 Quad
**Xeon (since Nocona)
**Atom
*VIA/Centaur:
**C7
**Nano
*Transmeta:
**Efficeon TM88xx (NOT model numbers TM86xx)
New instructions
Common instructions
Arithmetic
* ADDSUBPD - ("Add-Subtract-Packed-Double")
** Input: { A0, A1 }, { B0, B1 }
** Output: { A0 - B0, A1 + B1 }
* ADDSUBPS - ("Add-Subtract-Packed-Single")
** Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 }
** Output: { A0 - B0, A1 + B1, A2 - B2, A3 + B3 }
AOS (Array Of Structures)
* HADDPD - ("Horizontal-Add-Packed-Double")
** Input: { A0, A1 }, { B0, B1 }
** Output: { A0 + A1, B0 + B1 }
* HADDPS - ("Horizontal-Add-Packed-Single")
** Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 }
** Output: { A0 + A1, A2 + A3, B0 + B1, B2 + B3 }
* HSUBPD - ("Horizontal-Subtract-Packed-Double")
** Input: { A0, A1 }, { B0, B1 }
** Output: { A0 - A1, B0 - B1 }
* HSUBPS - ("Horizontal-Subtract-Packed-Single")
** Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 }
** Output: { A0 - A1, A2 - A3, B0 - B1, B2 - B3 }
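The horizontal instructions above can be modeled lane by lane. The Python sketch below is only an illustration under stated assumptions: plain lists stand in for XMM registers, no SIMD code runs, and the function names (`haddps`, `hsubps`, `haddpd`, `hsubpd`, `dot4`) are hypothetical. The `dot4` helper shows one typical use of horizontal addition: after a vertical, lane-by-lane multiply, two HADDPS steps reduce the four products to a single sum.

```python
# Scalar reference models of the SSE3 horizontal instructions (illustrative
# only: Python lists stand in for XMM registers; the names are hypothetical).

def haddps(a, b):
    """HADDPS: pairwise sums within each source, a's pairs first."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]

def hsubps(a, b):
    """HSUBPS: pairwise differences (lower-indexed lane minus higher-indexed lane)."""
    return [a[0] - a[1], a[2] - a[3], b[0] - b[1], b[2] - b[3]]

def haddpd(a, b):
    """HADDPD: two-lane double-precision version of HADDPS."""
    return [a[0] + a[1], b[0] + b[1]]

def hsubpd(a, b):
    """HSUBPD: two-lane double-precision version of HSUBPS."""
    return [a[0] - a[1], b[0] - b[1]]

def dot4(x, y):
    """Hypothetical helper: 4-element dot product built from a vertical
    multiply (as MULPS would do) followed by two horizontal adds."""
    prod = [xi * yi for xi, yi in zip(x, y)]  # vertical: lane i times lane i
    s = haddps(prod, prod)                    # [p0+p1, p2+p3, p0+p1, p2+p3]
    s = haddps(s, s)                          # every lane now holds the full sum
    return s[0]
```

With `x = [1.0, 2.0, 3.0, 4.0]` and `y = [5.0, 6.0, 7.0, 8.0]`, `dot4(x, y)` returns `70.0`. After the second HADDPS every lane holds the full sum, which is convenient when the result is needed in all lanes.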
* LDDQU - As stated above, this is an alternative misaligned integer vector load. It can be helpful for video compression, whose inner loops perform many misaligned loads.
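* MOVDDUP, MOVSHDUP, MOVSLDUP - MOVDDUP copies the low double-precision value into both halves of the destination; MOVSLDUP duplicates the even-indexed single-precision values (elements 0 and 2) and MOVSHDUP the odd-indexed ones (elements 1 and 3). These are also used for complex numbers, and can be helpful for waveform calculations such as sound processing.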
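The duplicate instructions combine with ADDSUBPS to multiply complex numbers stored as interleaved (real, imaginary) pairs, using (a_r + i a_i)(b_r + i b_i) = (a_r b_r - a_i b_i) + i(a_r b_i + a_i b_r): ADDSUBPS supplies the subtraction in the even (real) lanes and the addition in the odd (imaginary) lanes. The Python sketch below models this common sequence for two single-precision complex numbers per register. As before, lists stand in for XMM registers and every function name is hypothetical; the `swap_pairs` step stands for an SSE shuffle (SHUFPS), not an SSE3 instruction.

```python
# Scalar model of a complex multiply built from SSE3 operations (illustrative
# only: Python lists stand in for 4-lane single-precision XMM registers;
# all names are hypothetical).

def movsldup(a):
    """MOVSLDUP: duplicate the even lanes (0 and 2)."""
    return [a[0], a[0], a[2], a[2]]

def movshdup(a):
    """MOVSHDUP: duplicate the odd lanes (1 and 3)."""
    return [a[1], a[1], a[3], a[3]]

def addsubps(a, b):
    """ADDSUBPS: subtract in even lanes, add in odd lanes."""
    return [a[0] - b[0], a[1] + b[1], a[2] - b[2], a[3] + b[3]]

def swap_pairs(b):
    """Swap real and imaginary parts within each pair (SHUFPS in real code)."""
    return [b[1], b[0], b[3], b[2]]

def mulps(a, b):
    """Vertical lane-by-lane multiply (MULPS, from the original SSE)."""
    return [x * y for x, y in zip(a, b)]

def complex_mul2(a, b):
    """Multiply two pairs of complex numbers stored as [re0, im0, re1, im1]."""
    re_a = movsldup(a)               # [ar0, ar0, ar1, ar1]
    im_a = movshdup(a)               # [ai0, ai0, ai1, ai1]
    t1 = mulps(re_a, b)              # [ar*br, ar*bi, ...]
    t2 = mulps(im_a, swap_pairs(b))  # [ai*bi, ai*br, ...]
    return addsubps(t1, t2)          # [ar*br - ai*bi, ar*bi + ai*br, ...]
```

For example, `complex_mul2([1.0, 2.0, 0.5, -1.0], [3.0, 4.0, 2.0, 3.0])` returns `[-5.0, 10.0, 4.0, -0.5]`, matching (1+2i)(3+4i) = -5+10i and (0.5-i)(2+3i) = 4-0.5i.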
* FISTTP - Like the older x87 FISTP instruction, but it ignores the rounding-mode setting in the x87 FPU control word and always uses "chop" (truncate) mode instead. This avoids the expensive saving, modifying and reloading of the control word in languages such as C, where the standard requires float-to-integer conversion to truncate.
Intel instructions
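* MONITOR, MWAIT - MONITOR sets up hardware monitoring of a memory address range, and MWAIT lets a thread wait in an optimized state until that range is written to. These optimize multi-threaded applications that would otherwise spin-wait, giving processors with Hyper-Threading better performance.
See also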
*Computer numbering formats
*Streaming SIMD Extensions
*SSE2
*SSSE3
*SSE4
*SIMD
External links
* [http://www.intel.com/cd/ids/developer/asmo-na/eng/66717.htm SSE3 Overview by Intel]
* [http://www.xbitlabs.com/articles/cpu/display/prescott_10.html X-bit Labs]
Wikimedia Foundation. 2010.