Simultaneous multithreading

Simultaneous multithreading: Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better utilize the resources provided by modern processor architectures.

Contents

1 Details

2 Taxonomy

3 Historical implementations

4 Modern commercial implementations

5 Disadvantages

6 See also

7 References

8 External links

Details

Multithreading is similar in concept to preemptive multitasking but is implemented at the thread level of execution in modern superscalar processors.

Simultaneous multithreading (SMT) is one of the two main implementations of multithreading, the other form being temporal multithreading. In temporal multithreading, only one thread of instructions can execute in any given pipeline stage at a time. In simultaneous multithreading, instructions from more than one thread can be executing in any given pipeline stage at a time. This is done without great changes to the basic processor architecture: the main additions needed are the ability to fetch instructions from multiple threads in a cycle, and a larger register file to hold data from multiple threads. The number of concurrent threads can be decided by the chip designers, but practical restrictions on chip complexity have limited the number to two for most SMT implementations, though there have been as many as 8 threads per core in, for example, the UltraSPARC T2.

Because the technique is really an efficiency solution and there is inevitable increased conflict on shared resources, measuring or agreeing on the effectiveness of the solution can be difficult. Some researchers have shown that the extra threads can be used to proactively seed a shared resource like a cache, to improve the performance of another single thread, and claim this shows that SMT is not just an efficiency solution. Others use SMT to provide redundant computation, for some level of error detection and recovery.

However, in most current cases, SMT is about hiding memory latency, increasing efficiency, and increasing throughput of computations per amount of hardware used.

Taxonomy

In processor design, there are two ways to increase on-chip parallelism with less resource requirements: one is superscalar technique which tries to increase instruction level parallelism (ILP), the other is multithreading approach exploiting thread level parallelism (TLP).

Superscalar means executing multiple instructions at the same time while chip-level multithreading (CMT) executes instructions from multiple threads within one processor chip at the same time. There are many ways to support more than one thread within a chip, namely:

Interleaved multithreading: Interleaved issue of multiple instructions from different threads, also referred to as Temporal multithreading. It can be further divided into fine-grain multithreading or coarse-grain multithreading depending on the frequency of interleaved issues. Fine-grain multithreading—such as in a barrel processor -- issues instructions for different threads after every cycle, while coarse-grain multithreading only switches to issue instructions from another thread when the current executing thread causes some long latency events (like page fault etc.). Coarse-grain multithreading is more common for less context switch between threads. For example, Intel's Montecito processor uses coarse-grain multithreading, while Sun's UltraSPARC T1 uses fine-grain multithreading. For those processors that have only one pipeline per core, interleaved multithreading is the only possible way, because it can issue at most one instruction per cycle.

Simultaneous multithreading (SMT): Issue multiple instructions from multiple threads in one cycle. The processor must be superscalar to do so.

Chip-level multiprocessing (CMP or multicore): integrates two or more processors into one chip, each executing threads independently

Any combination of multithreaded/SMT/CMP

The key factor to distinguish them is to look at how many instructions the processor can issue in one cycle and how many threads from which the instructions come. For example, Sun Microsystems' UltraSPARC T1 (known as "Niagara" until its November 14, 2005 release) is a multicore processor combined with fine-grain multithreading technique instead of simultaneous multithreading because each core can only issue one instruction at a time.

Historical implementations

While multithreading CPUs have been around since the 1950s, simultaneous multithreading was first researched by IBM in 1968. The first major commercial microprocessor developed with SMT was the Alpha 21464 (EV8). This microprocessor was developed by DEC in coordination with Dean Tullsen of the University of California, San Diego, and Susan Eggers and Hank Levy of the University of Washington. The microprocessor was never released, since the Alpha line of microprocessors was discontinued shortly before HP acquired Compaq which had in turn acquired DEC. Dean Tullsen's work was also used to develop the "Hyper-threading" (or "HTT") versions of the Intel Pentium 4 microprocessors, such as the "Northwood" and "Prescott".

Modern commercial implementations

The Intel Pentium 4 was the first modern desktop processor to implement simultaneous multithreading, starting from the 3.06 GHz model released in 2002, and since introduced into a number of their processors. Intel calls the functionality Hyper-Threading Technology (HTT), and provides a basic two-thread SMT engine. Intel claims up to a 30% speed improvement compared against an otherwise identical, non-SMT Pentium 4. The performance improvement seen is very application dependent, and some programs actually slow down slightly when HTT is turned on due to increased contention for resources such as bandwidth, caches, TLBs, re-order buffer entries, etc. This is generally the case for poorly written data access routines that cause high latency intercache transactions (cache thrashing) on multi-processor systems. Programs written before multiprocessor and multicore designs were prevalent commonly did not optimize cache access because on a single CPU system there is only a single cache which is always coherent with itself. On a multiprocessor system each CPU or core will typically have its own cache, which is interlinked with the cache of other CPU/cores in the system to maintain cache coherency. If thread A accesses a memory location [00] and thread B then accesses memory location [01] it can cause an intercache transaction particularly where the cache line fill exceeds 2 bytes, as is the case for all modern processors.

The latest^[when?] MIPS architecture designs include an SMT system known as "MIPS MT". MIPS MT provides for both heavyweight virtual processing elements and lighter-weight hardware microthreads. RMI, a Cupertino-based startup, is the first MIPS vendor to provide a processor SOC based on 8 cores, each of which runs 4 threads. The threads can be run in fine-grain mode where a different thread can be executed each cycle. The threads can also be assigned priorities.

The IBM POWER5, announced in May 2004, comes as either a dual core DCM, or quad-core or oct-core MCM, with each core including a two-thread SMT engine. IBM's implementation is more sophisticated than the previous ones, because it can assign a different priority to the various threads, is more fine-grained, and the SMT engine can be turned on and off dynamically, to better execute those workloads where an SMT processor would not increase performance. This is IBM's second implementation of generally available hardware multithreading. In 2010, IBM released systems based on the POWER7 processor with 8 cores with each having four Simultaneous Intelligent Threads. This switches the threading mode between one thread, two threads or four threads depending on the number of process threads being scheduled at the time. This optimizes the use of the core for minimum response time or maximum throughput.

Although many people reported that Sun Microsystems' UltraSPARC T1 (known as "Niagara" until its 14 November 2005 release) and the now defunct processor codenamed "Rock" (originally announced in 2005, but after many delays cancelled in 2009) are implementations of SPARC focused almost entirely on exploiting SMT and CMP techniques, Niagara is not actually using SMT. Sun refers to these combined approaches as "CMT", and the overall concept as "Throughput Computing". The Niagara has 8 cores, but each core has only one pipeline, so actually it uses fine-grained multithreading. Unlike SMT, where instructions from multiple threads share the issue window each cycle, the processor uses a round robin policy to issue instructions from the next active thread each cycle. This makes it more similar to a barrel processor. Sun Microsystems' Rock processor is different, it has more complex cores that have more than one pipeline.

The Intel Atom, released in 2008, is the first Intel product to feature SMT (marketed as Hyper-Threading) without supporting instruction reordering, speculative execution, or register renaming. Intel reintroduced Hyper-Threading with the Nehalem microarchitecture, after its absence on the Core microarchitecture.

Disadvantages

Simultaneous multithreading cannot improve performance if any of the shared resources are limiting bottlenecks for the performance. In fact, some applications run slower when simultaneous multithreading is enabled. Critics argue that it is a considerable burden to put on software developers that they have to test whether simultaneous multithreading is good or bad for their application in various situations and insert extra logic to turn it off if it decreases performance. Current operating systems lack convenient API calls for this purpose and for preventing processes with different priority from taking resources from each other.^[1]

There is also a security concern with certain simultaneous multithreading implementations. Intel's hyperthreading implementation has a vulnerability through which it is possible for one application to steal a cryptographic key from another application running in the same processor by monitoring its cache use.^[2]

There is also a disadvantage if you want to use a PC for 100% with maximum performance, solving a single problem.

See also

Temporal multithreading, another implementation of hardware multithreading

Thread (computer science), the fundamental software entity scheduled by the operating system kernel to execute on a CPU or processor (core)

Symmetric multiprocessing, where the system (or partition of a larger computer hardware platform) contains more than one CPU or processor (core) and where the operating system kernel is not limited to which of the available CPUs (cores) a given thread can be scheduled to execute on

References

^ How good is hyperthreading?

^ Hyper-Threading Considered Harmful

LE Shar and ES Davidson, "A Multiminiprocessor System Implemented through Pipelining", Computer Feb 1974

D.M. Tullsen, S.J. Eggers, and H.M. Levy, "Simultaneous Multithreading: Maximizing On-Chip Parallelism," In 22nd Annual International Symposium on Computer Architecture, June, 1995

D.M. Tullsen, S.J. Eggers, J.S. Emer, H.M. Levy, J.L. Lo, and R.L. Stamm, "Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor," In 23rd Annual International Symposium on Computer Architecture, May, 1996

External links

SMT news articles and academic papers

SMT research at the University of Washington

Timeline of multithreading technologies

v · d · eCPU technologies

Architecture
ISA : CISC · EDGE · EPIC · MISC · OISC · RISC · VLIW · NISC · ZISC · Harvard architecture · von Neumann architecture · 4-bit · 8-bit · 12-bit · 16-bit · 18-bit · 24-bit · 31-bit · 32-bit · 36-bit · 48-bit · 60-bit · 64-bit · 128-bit · Comparison of CPU architectures

Parallelism

Pipeline

Instruction pipelining · In-order & out-of-order execution · Register renaming · Speculative execution · Hazards

Level

Bit · Instruction · Superscalar · Data · Task

Threads

Multithreading · Simultaneous multithreading · Hyperthreading · Superthreading

Flynn's taxonomy

SISD · SIMD · MISD · MIMD

Types
Digital signal processor · Microcontroller · System-on-a-chip · Vector processor

Components
Arithmetic logic unit (ALU) · Barrel shifter · Floating-point unit (FPU) · Back-side bus · Multiplexer · Demultiplexer · Registers · Memory management unit (MMU) · Translation lookaside buffer (TLB) · Cache · Register file · Microcode · Control unit · Clock rate

Power management
APM · ACPI · Dynamic frequency scaling · Dynamic voltage scaling · Clock gating

v · d · eParallel computing

General
Cloud computing · High-performance computing · Cluster computing · Distributed computing · Grid computing

Levels
Bit · Instruction · Data · Task

Threads
Superthreading · Hyperthreading

Theory
Amdahl's law · Gustafson's law · Cost efficiency · Karp–Flatt metric · slowdown · speedup

Elements
Process · Thread · Fiber · PRAM · Instruction window

Coordination
Multiprocessing · Multithreading (computer architecture) · Memory coherency · Cache coherency · Cache invalidation · Barrier · Synchronization · Application checkpointing

Programming
Models (Implicit parallelism · Explicit parallelism · Concurrency) · Flynn's taxonomy (SISD • SIMD • MISD • MIMD (SPMD)) · Thread (computer science) · Non-blocking algorithm

Hardware

Multiprocessor (Symmetric · Asymmetric) · Memory (NUMA · COMA · distributed · shared · distributed shared) · SMT
MPP · Superscalar · Vector processor · Supercomputer · Beowulf

APIs
Ateji PX · POSIX Threads · OpenMP · OpenHMPP · PVM · MPI · UPC · Intel Threading Building Blocks · Boost.Thread · Global Arrays · Charm++ · Cilk · Co-array Fortran · OpenCL · CUDA · Dryad · DryadLINQ

Problems

Embarrassingly parallel · Grand Challenge · Software lockout · Scalability · Race conditions · Deadlock · Livelock · Deterministic algorithm · Parallel slowdown

Category · Commons

Categories:
Flynn's Taxonomy
Computer architecture
Threads

Игры ⚽ Поможем написать курсовую

Look at other dictionaries:

Simultaneous Multithreading — Der Begriff Simultaneous Multithreading (etwa simultaner Mehrfadenbetrieb), oder kurz SMT, bezeichnet die Fähigkeit eines Mikroprozessors, mittels getrennter Pipelines und/oder zusätzlicher Registersätze mehrere Threads gleichzeitig auszuführen.… … Deutsch Wikipedia
Simultaneous multithreading — Одновременная многопоточность (англ. Simultaneous Multithreading) технология, позволяющая исполнение инструкций из нескольких независимых потоков выполнения на множестве функциональных модулей суперскалярного микропроцессора в одном цикле.… … Википедия
Simultaneous multithreading — Le Simultaneous Multi Threading est une technique informatique datant des années 1950. Elle consiste, comme le Symmetric multiprocessing (SMP), à augmenter le TLP (Thread Level Parallelism), c’est à dire le parallélisme des threads. Le but est d… … Wikipédia en Français
Multithreading (computer hardware) — Multithreading computers have hardware support to efficiently execute multiple threads. These are distinguished from multiprocessing systems (such as multi core systems) in that the threads must all operate in the same address space, as there is… … Wikipedia
Multithreading (computer architecture) — This article describes hardware supports for multithreads. For thread in software, see Thread (computer science). Multithreading computers have hardware support to efficiently execute multiple threads. These are distinguished from multiprocessing … Wikipedia
Multithreading (hardwareseitig) — Dieser Artikel oder Abschnitt ist nicht hinreichend mit Belegen (Literatur, Webseiten oder Einzelnachweisen) versehen. Die fraglichen Angaben werden daher möglicherweise demnächst gelöscht. Hilf Wikipedia, indem du die Angaben recherchierst und… … Deutsch Wikipedia
Multithreading — Cet article concerne le support matériel des multithreads. Pour les thread logiciels, voir thread (informatique). Les ordinateurs dits multithreading ont du matériel qui leur permet d exécuter efficacement des thread (informatique) multiples. Il… … Wikipédia en Français
multithreading — Simultaneous processing of more than one message by an application program … IT glossary of terms, acronyms and abbreviations
Temporal multithreading — is one of the two main forms of multithreading that can be implemented on computer processor hardware, the other form being simultaneous multithreading. The distinguishing difference between the two forms is the maximum number of concurrent… … Wikipedia
Hardwareseitiges Multithreading — Durch hardwareseitiges Multithreading (auch: Mehrfädigkeit, Mehrsträngigkeit) können bestimmte Prozessoren mit nur einem vollständigen Prozessor Kern mehrere Programme quasi gleichzeitig bearbeiten. Ein solcher Prozessor wird multithreaded… … Deutsch Wikipedia

Academic Dictionaries and Encyclopedias

Simultaneous multithreading

Contents

Details

Taxonomy

Historical implementations

Modern commercial implementations

Disadvantages

See also

References

External links

Look at other dictionaries:

Share the article and excerpts

Academic Dictionaries and Encyclopedias

Wikipedia

Simultaneous multithreading

Contents

Details

Taxonomy

Historical implementations

Modern commercial implementations

Disadvantages

See also

References

External links

Look at other dictionaries:

Share the article and excerpts

Direct link