Larrabee (GPU)
Larrabee is the codename for a graphics processing unit (GPU) chip that Intel is developing separately from its current line of integrated graphics accelerators. The video card containing Larrabee is expected to compete with GeForce and Radeon products from NVIDIA and AMD/ATI respectively. Larrabee will also compete in the GPGPU and high-performance computing markets. Intel plans to have engineering samples of Larrabee ready by the end of 2008, with a video card hitting shelves in late 2009 or 2010. [cite web|url=http://beyond3d.com/content/news/565|title=Larrabee: Samples in Late 08, Products in 2H09/1H10|accessdate=2008-01-17|publisher=beyond3d.com]
Comparison with competing products
Larrabee can be considered a hybrid between a multi-core CPU and a GPU, and has similarities to both. Its coherent cache hierarchy and x86 architecture compatibility are CPU-like, while its wide SIMD vector units and texture sampling hardware are GPU-like.
As a GPU, Larrabee will support traditional rasterized 3D graphics (DirectX/OpenGL) for games. However, Larrabee's hybrid of CPU and GPU features should be suitable for general purpose GPU (GPGPU) or stream processing tasks. [cite web|title=First Details on a Future Intel Design Codenamed 'Larrabee'|url=http://www.intel.com/pressroom/archive/releases/20080804fact.htm|publisher=Intel|accessdate=2008-09-01] For example, Larrabee might perform ray tracing or physics processing, [cite web|url=http://arstechnica.com/news.ars/post/20070917-intel-picks-up-gaming-physics-engine-for-forthcoming-gpu-product.html|title=Intel picks up gaming physics engine for forthcoming GPU product|accessdate=2007-09-17|publisher=Ars Technica|first=Jon|last=Stokes] in real time for games or offline for scientific research as a component of a supercomputer. [cite web|last=Stokes|first=Jon|title=Clearing up the confusion over Intel's Larrabee|publisher=Ars Technica|url=http://arstechnica.com/articles/paedia/hardware/clearing-up-the-confusion-over-intels-larrabee.ars|accessdate=2007-06-01] In the high-performance computing market Intel's CPUs are in some cases being displaced by GPGPU products like NVIDIA Tesla and AMD FireStream (for example in the #2 supercomputer on the TOP500 list [http://news.softpedia.com/news/Nvidia-Tesla-GPGPU-Shows-Up-in-Bull-039-s-NovaScale-Supercomputer-84089.shtml]); Larrabee is Intel's answer to GPGPU. [cite web|url=http://www.theinquirer.net/en/inquirer/news/2007/04/03/gpgpu-vs-cpu-will-be-the-war-of-2008-9|title=GPGPU vs. CPU will be the war of 2008-9|publisher=The Inquirer|author=Theo Valich|date=03 April 2007|accessdate=2008-08-24]
Larrabee's early presentation has drawn some criticism from GPU competitors. At NVISION 08, several NVIDIA employees called the SIGGRAPH paper "marketing puff" and told the press that the Larrabee architecture was "like a GPU from 2006". [http://www.pcpro.co.uk/news/220947/nvision-larrabee-like-a-gpu-from-2006.html NVISION 08, "Larrabee like a GPU from 2006"]
Differences with current GPUs
Larrabee will differ from other discrete GPUs currently on the market such as the GeForce 200 Series and the Radeon 4000 series in three major ways:
* Larrabee will use the x86 instruction set with Larrabee-specific extensions.
* Larrabee will feature cache coherency across all its cores.
* Larrabee will include very little specialized graphics hardware, instead performing tasks like z-buffering, clipping, and blending in software, using a tile-based rendering approach. A renderer implemented in software can more easily be modified, allowing more differentiation in appearance between games or other 3D applications. Intel's SIGGRAPH 2008 paper mentions order-independent transparency, irregular Z-buffering, and real-time ray tracing as rendering features that can be implemented with Larrabee.
Differences with CPUs
The x86 processor cores in Larrabee will be different in several ways from the cores in current Intel CPUs such as the Core 2 Duo:
* Larrabee's x86 cores will be based on the much simpler Pentium P54C design, which is still being maintained for use in embedded applications. [cite web|title=Intel's Larrabee GPU based on secret Pentagon tech, sorta [Updated]|url=http://arstechnica.com/news.ars/post/20080708-intels-larrabee-gpu-based-on-secret-pentagon-tech-sorta.html|publisher=Ars Technica|accessdate=2008-08-06] The P54C-derived core is superscalar but does not include out-of-order execution, though it has been updated with modern features such as x86-64 support, similarly to Intel Atom. In-order execution means lower performance for individual cores, but since they are smaller, more can fit on a single chip, increasing overall throughput.
* Each Larrabee core contains a 512-bit vector processing unit, able to process 16 single precision floating point numbers at a time. This is similar to, but four times larger than, the SSE units on most x86 processors, with additional features like scatter/gather instructions and a mask register designed to make using the vector unit easier and more efficient. Larrabee derives most of its number-crunching power from these vector units. [cite web|title=Larrabee: A Many-Core x86 Architecture for Visual Computing|url=http://softwarecommunity.intel.com/UserFiles/en-us/File/larrabee_manycore.pdf|publisher=Intel|accessdate=2008-08-06|doi=10.1145/1399504.1360617]
* Larrabee includes one major fixed-function graphics hardware feature: texture sampling units. These perform trilinear and anisotropic filtering and texture decompression.
* Larrabee has a 1024-bit (512-bit each way) ring bus for communication between cores and to memory. This bus can be configured in two modes to support Larrabee products with 16 cores or more, or fewer than 16 cores. [cite web|last=Glaskowsky|first=Peter|title=Intel's Larrabee--more and less than meets the eye|url=http://news.cnet.com/8301-13512_3-10006184-23.html|publisher=CNET|accessdate=2008-08-20]
* Larrabee includes explicit cache control instructions to reduce cache thrashing during streaming operations that read or write data only once.
* Each core supports 4-way simultaneous multithreading, with 4 copies of each processor register.
Theoretically, Larrabee's x86 processor cores can run existing PC software, even operating systems. However, Larrabee's video card will not include all the features of a PC-compatible motherboard, so PC operating systems and applications will not run without modifications. A different version of Larrabee might sit in motherboard CPU sockets using QuickPath, [cite web|last=Stokes|first=Jon|title=Clearing up the confusion over Intel's Larrabee, part II|url=http://arstechnica.com/news.ars/post/20070604-clearing-up-the-confusion-over-intels-larrabee-part-ii.html|publisher=Ars Technica|accessdate=2008-01-16] but Intel has not yet announced plans for this. Even if compatibility is achieved, software must be rewritten to use Larrabee's vector units in order to run efficiently, and not all software can put them to good use.
Comparison with the Cell Broadband Engine
Larrabee's philosophy of using many small, simple cores has similarities to the ideas behind the Cell processor. However, there are differences in implementation.
* The Cell processor includes one main processor which controls many smaller processors. In contrast, all of Larrabee's cores are the same, which can be useful for various purposes such as load balancing and task migration.
* Cell and Larrabee both use a high-bandwidth ring bus to communicate between cores.
* Each compute core in the Cell (SPE) has a local store that is filled and drained by explicit DMA operations, which allow flexible data transfer; other cores cannot access it directly with load/store instructions. In Larrabee, all on-chip and off-chip memories are under an automatically managed coherent cache hierarchy, so that its cores virtually share a uniform memory space through standard load/store instructions.
* Because of the cache coherence noted above, each program running on Larrabee sees what is virtually a large linear memory, just as on a traditional general-purpose CPU, whereas an application for Cell must be programmed with the limited memory footprint of the local store associated with each SPE in mind, in exchange for theoretically higher bandwidth.
* Cell uses DMA for data transfer to and from on-chip local memories, which has the merit of flexibility and throughput, whereas Larrabee uses special instructions for cache manipulation (notably cache eviction hints and pre-fetch instructions), which has the merit of maintaining cache coherence (and hence the standard memory hierarchy) while boosting performance for workloads such as rendering pipelines and other stream-like computation.
Comparison with Intel GMA
Intel currently sells a line of GPUs under the Intel GMA brand. These chips are not sold separately but are integrated onto motherboards. Though the low cost and power consumption of Intel GMA chips make them suitable for small laptops and less demanding tasks, they lack the 3D graphics processing power to compete with NVIDIA and AMD/ATI for a share of the high-end gaming computer market, the HPC market, or a place in popular video game consoles. In contrast, Larrabee will be sold as a discrete GPU, separately from motherboards, and is expected to have performance good enough for consideration in the next generation of video game consoles. [cite web|url=http://www.totalvideogames.com/news/Intels_Larrabee_Shaping_Up_For_Next-Gen_Consoles_13643_6321_0.htm|title=Intel's Larrabee Shaping Up For Next-Gen Consoles?|author=Chris Leyton|date=2008-08-13|accessdate=2008-08-24]
The team working on Larrabee is separate from the Intel GMA team. The hardware is being designed by Intel's Hillsboro, Oregon design team, whose last major design was the Pentium 4. The software and drivers are being written by a newly formed team. The 3D stack specifically is being written by developers at RAD Game Tools (including Michael Abrash). [http://anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3367 AnandTech: Intel's Larrabee Architecture Disclosure: A Calculated First Move]
Preliminary performance data
Intel's SIGGRAPH 2008 paper describes simulations of Larrabee's projected performance. Graphs show how many 1 GHz Larrabee cores are required to maintain 60 FPS at 1600x1200 resolution in several popular games. Roughly 25 cores are required for Gears of War with no antialiasing, 25 cores for F.E.A.R. with 4x antialiasing, and 10 cores for a third game with 4x antialiasing. It is likely that Larrabee will run faster than 1 GHz, so these numbers are conservative. [cite web|url=http://www.tomshardware.com/news/intel-larrabee-idf,6210.html|title=Intel's 'Larrabee' to Shakeup AMD, Nvidia|publisher=Tom's Hardware|date=August 20, 2008|author=Steve Seguin|accessdate=2008-08-24] Another graph shows that performance on these games scales perfectly linearly with the number of cores up to 32 cores. At 48 cores the performance scaling is roughly 90% of linear.
A June 2007 PC Watch article suggests that the first Larrabee chips will feature 32 x86 processor cores and come out in late 2009, fabricated on a 45 nanometer process. Chips with a few defective cores due to yield issues will be sold as a 24-core version. Later in 2010 Larrabee will be shrunk for a 32 nanometer fabrication process, which will enable a 48-core version. [cite web|title=Intel is promoting the 32 core CPU "Larrabee"|url=http://pc.watch.impress.co.jp/docs/2007/0611/kaigai364.htm|publisher=pc.watch.impress.co.jp|accessdate=2008-08-06|language=ja] [http://translate.google.com/translate?u=http%3A%2F%2Fpc.watch.impress.co.jp%2Fdocs%2F2007%2F0611%2Fkaigai364.htm&sl=ja&tl=en English translation]
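The core counts and scaling figures quoted from the SIGGRAPH 2008 paper can be turned into a rough back-of-the-envelope calculation. The following is a minimal sketch, not Intel's methodology: the function names are hypothetical, the 2 GHz clock is an illustrative value rather than an announced specification, and the assumption that throughput scales linearly with clock speed is a simplification.

```python
import math

# Core counts at 1 GHz for 60 FPS at 1600x1200, as quoted above from the
# graphs in the SIGGRAPH 2008 paper (approximate readings).
CORES_AT_1GHZ = {
    "Gears of War, no antialiasing": 25,
    "F.E.A.R., 4x antialiasing": 25,
}


def cores_needed(cores_at_1ghz: int, clock_ghz: float) -> int:
    """Estimate the cores required at a different clock speed.

    ASSUMPTION: throughput scales linearly with clock frequency, which
    ignores memory bandwidth and other bottlenecks.
    """
    return math.ceil(cores_at_1ghz / clock_ghz)


def effective_cores(cores: int, fraction_of_linear: float) -> float:
    """Throughput in units of one ideally scaling core."""
    return cores * fraction_of_linear


if __name__ == "__main__":
    hypothetical_clock_ghz = 2.0  # illustrative only, not an Intel figure
    for game, cores in CORES_AT_1GHZ.items():
        print(game, "->", cores_needed(cores, hypothetical_clock_ghz), "cores")
    # Scaling is reported as linear up to 32 cores and ~90% of linear at 48.
    print("32 cores:", effective_cores(32, 1.0))
    print("48 cores:", effective_cores(48, 0.9))
```

Under these assumptions, 48 cores at roughly 90% of linear scaling deliver about the throughput of 43 ideally scaling cores, and a faster clock reduces the number of cores needed for a given frame rate, which is why the 1 GHz figures are described as conservative.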
Fudzilla has posted several short articles about Larrabee, claiming that Larrabee may have a TDP as large as 300W, [cite web|title=Larrabee to launch at 300W TDP|url=http://www.fudzilla.com/index.php?option=com_content&task=view&id=7651&Itemid=1|publisher=fudzilla.com|accessdate=2008-08-06] that Larrabee will use a 12-layer PCB and has a cooling system that "is meant to look similar to what you can find on high-end Nvidia cards today," [cite web|title=Larrabee will use a 12-layer PCB|url=http://www.fudzilla.com/index.php?option=com_content&task=view&id=8435&Itemid=1|publisher=fudzilla.com|accessdate=2008-08-06] that Larrabee will use GDDR5 memory, and that it is targeted to have 2 single-precision teraflops of computing power. [cite web|title=Larrabee will use GDDR5 memory|url=http://www.fudzilla.com/index.php?option=com_content&task=view&id=8460&Itemid=1|publisher=fudzilla.com|accessdate=2008-08-06]
See also
* Intel740
* Intel GMA
* x86 architecture
* x86-64
* P5
* List of Intel CPU microarchitectures
External links
* [http://www.intel.com/pressroom/archive/releases/20080804fact.htm Intel fact sheet about Larrabee]
* [http://softwarecommunity.intel.com/UserFiles/en-us/File/larrabee_manycore.pdf Intel's SIGGRAPH 2008 paper on Larrabee]
* [http://techgage.com/article/intel_opens_up_about_larrabee/ Techgage.com - Discusses how Larrabee differs from normal GPUs, includes block diagram illustration]
* [http://anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3367 Intel's Larrabee Architecture Disclosure: A Calculated First Move]
Wikimedia Foundation. 2010.