Chip Multiprocessor Watch



Just a few years ago, the idea of putting multiple processors on a chip was far-fetched. Now it is accepted and commonplace, and virtually every new high-performance processor is a chip multiprocessor of some sort. This webpage is a starting point for organizing our understanding of the landscape of chip multiprocessors.

Domain Specific Multiprocessors

Sony/Toshiba/IBM Cell Processor

A joint project of Sony, Toshiba, and IBM, Cell is envisioned primarily as a computing engine for media applications. It will be the centerpiece of Sony's PlayStation 3, and should ship in 2006.

Main Attributes

  • One dual threaded, dual issue in-order PowerPC core @3.2 GHz
  • 8 Processing Elements @3.2 GHz, each with a 256 KB local store, a vector single-precision FPU, and a conventional double-precision FPU
  • Bi-directional ring interconnect between all 9 PEs
  • Rambus XDR memory controller
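
To make the ring interconnect concrete, here is a minimal Python sketch of hop counts on a bidirectional ring. It assumes, purely for illustration, that the nine elements sit at stops numbered 0 through 8 and that a transfer always takes the shorter direction. The function names are hypothetical, and the real interconnect's stop ordering and arbitration are not modeled.

```python
def ring_hops(src, dst, n=9):
    """Fewest hops between two stops on a bidirectional ring of n elements.

    Assumption: stops are numbered 0..n-1 around the ring and a transfer
    may travel in either direction, so it takes the shorter way round.
    """
    clockwise = (dst - src) % n
    return min(clockwise, n - clockwise)


def worst_case_hops(n=9):
    """Largest hop count any transfer needs on the ring."""
    return max(ring_hops(0, dst, n) for dst in range(n))
```

Under these assumptions, a bidirectional ring of nine elements never needs more than four hops, whereas a one-way ring could need eight.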


IBM/Microsoft Xenon

An IBM designed processor, customized for Microsoft, Xenon is the CPU of the Xbox 360, and shipped to consumers in November 2005.

Main Attributes

  • 3 dual threaded, dual issue in-order PowerPC cores @3.2 GHz
  • 1 MB shared L2 cache


ClearSpeed CSX600

ClearSpeed has developed a highly parallel architecture for high-performance computing work, based around an array of 64-96 "poly" Processing Elements for arithmetic computation, as well as an 8-threaded "mono" processing unit designed for control tasks. ClearSpeed's processing elements each contain:

  • Double Precision FP Adder and Multiplier
  • Integer ALU including MAC
  • 6 KB SRAM & 128 Byte register file
  • Next-neighbor connections to other Processing Elements

Interestingly, ClearSpeed's architecture requires that the same instruction stream pass through each poly processing element, which in some ways blurs the definition of a computing "core". However, each processing element can enable and disable changes to its state by pushing and popping predicate bits from a control stack, so the processing elements are capable of branching in a limited sense.
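
The predicate-stack idea can be sketched in a few lines of Python. This is an illustrative model only, not ClearSpeed's instruction set: the class `PolyPE`, the helper `broadcast`, and the example program are all hypothetical names chosen for this sketch. Every element receives the same operations, and an element applies an operation only while all bits on its own stack are true.

```python
class PolyPE:
    """A hypothetical processing element with private data and a predicate stack."""

    def __init__(self, value):
        self.value = value       # this element's private data
        self.predicates = []     # control stack of enable bits

    def enabled(self):
        # State changes take effect only if every stacked bit is true.
        return all(self.predicates)

    def push(self, condition):
        self.predicates.append(bool(condition))

    def pop(self):
        self.predicates.pop()


def broadcast(pes, op):
    """Send one operation to every element; disabled elements ignore it."""
    for pe in pes:
        if pe.enabled():
            pe.value = op(pe.value)


def simd_abs_then_double(values):
    """Run 'if v < 0: v = -v' followed by 'v = v * 2' on all elements."""
    pes = [PolyPE(v) for v in values]
    for pe in pes:
        pe.push(pe.value < 0)        # each element tests its own data
    broadcast(pes, lambda v: -v)     # only elements holding negatives negate
    for pe in pes:
        pe.pop()                     # leave the conditional region
    broadcast(pes, lambda v: v * 2)  # every element is enabled again
    return [pe.value for pe in pes]
```

For example, `simd_abs_then_double([3, -4, 0, -1])` returns `[6, 8, 0, 2]`. No element ever follows a different instruction stream; the "branch" is just a region in which some elements ignore the broadcast operations.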

The ClearSpeed architecture also contains memory controllers and external interfaces to form a complete system-on-chip.


Cisco CRS-1 Metro

Cisco takes the idea of "processor as the NAND gate of the future" to an advanced level by using a massively many-core custom network processor in its highest-end routers. These routers contain 192 customized Tensilica processors, each of which contains small instruction and data caches, a customized DMA engine that allows up to three outstanding DMA requests at any given time, and "Tens of KBs" of local instruction memory.


Intel IXP

Intel has developed a series of network processors, the most advanced of which is the IXP2800. The IXP2800 contains 16 multithreaded microengines specialized for networking data-plane operations, an XScale processor for control-plane operations, a hash unit, a crypto unit, a 16 KB scratchpad memory, network interfaces, and memory controllers (4 SRAM and 3 RDRAM).

Each microengine operates at up to 1.4 GHz, and contains:

  • A 128 entry register file
  • An integer ALU with limited support for multiplication
  • A hash unit
  • Local memory (640 words)
  • A small CAM
  • 128 entry next neighbor registers to communicate with neighboring microengines
  • Instruction memory (8KB)


General Purpose Multiprocessors

Sun UltraSparc T1 - Niagara

Sun's Niagara architecture derives from the work of Afara Websystems Inc, a startup that pioneered the development of throughput-oriented microprocessor technology optimized for commercial server applications. The 90nm Niagara chip integrates eight cores onto one die, where each core has one pipeline that can support four threads simultaneously with zero context-switch overhead. It symbolizes a shift in the server microprocessor design paradigm towards fine-grained chip multithreading.
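
A toy Python model shows why fine-grained multithreading suits throughput workloads. It is a sketch under stated assumptions, not Niagara's actual thread-select logic: the function `run_core`, the two-instruction vocabulary ("alu" and "load"), and the fixed `miss_latency` are all hypothetical. One pipeline issues at most one instruction per cycle and picks the next ready thread in round-robin order, so a thread waiting on memory simply gets skipped.

```python
def run_core(threads, cycles, miss_latency=3):
    """Simulate one single-issue pipeline shared by several hardware threads.

    threads: list of instruction lists; each instruction is "alu" or "load".
    Assumption: every "load" stalls its thread for miss_latency cycles.
    Returns the issue trace: the thread index chosen in each cycle,
    or None when no thread was ready.
    """
    pc = [0] * len(threads)        # next instruction index per thread
    ready_at = [0] * len(threads)  # first cycle each thread may issue again
    trace = []
    last = -1                      # thread that issued most recently
    for cycle in range(cycles):
        issued = None
        # Round-robin: start searching just after the last issuing thread.
        for offset in range(1, len(threads) + 1):
            t = (last + offset) % len(threads)
            if pc[t] < len(threads[t]) and ready_at[t] <= cycle:
                issued = t
                break
        if issued is not None:
            instr = threads[issued][pc[issued]]
            pc[issued] += 1
            if instr == "load":
                ready_at[issued] = cycle + 1 + miss_latency
            last = issued
        trace.append(issued)
    return trace
```

With four threads, `run_core([["load", "alu"], ["alu", "alu"], ["alu"], ["alu"]], 6)` yields `[0, 1, 2, 3, 0, 1]`: the other three threads fill the cycles that thread 0 spends waiting on its load. With a single thread, `run_core([["load", "alu"]], 5)` yields `[0, None, None, None, 0]`, leaving the pipeline idle for the whole miss.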

The chip is marketed as the UltraSPARC T1 in the Sun Fire CoolThreads T1000 and T2000 servers. The next-generation Niagara2 architecture is due in 2007 in 65nm. With eight cores, two pipelines per core, and eight threads per core, it is expected to double the performance of Niagara.



IBM Power5

The IBM Power5 is a dual-core architecture that debuted in 2003 in 0.13um at 2GHz. It is binary and structurally compatible with its predecessor, the Power4, and scales to systems with 64 physical processors (128 cores). Users can choose from a variety of packages for the Power5 dual-core chip, ranging from the 4-core Dual-Chip Module, or DCM (two dual-core chips on one package), to the 8-core Multi-Chip Module, or MCM (four chips on one package). For high-end server systems, the large 95mm x 95mm MCM contains four dual-core chips, 1GHz inter-chip buses, and 144MB of L3 cache (36MB for each chip).



P.A. Semi PWRficient

P.A. Semi's PWRficient family of 64-bit multicore processors is based on the IBM Power architecture. It draws 5-13 watts while operating at 2GHz in 65nm. The PWRficient architecture features two DDR2 memory controllers, 2MB of L2 cache, and a flexible I/O subsystem for computing and embedded applications. The I/O subsystem provides 24 configurable serdes lanes for high-speed serial I/O, which may be used for PCI Express, XAUI, or SGMII interconnect in a wide range of configurations. The PWRficient 1682M includes 8 PCI Express engines, supporting link widths of 1, 2, 4, 8, and 16 lanes for general peripheral connection, with up to 4GB/s bandwidth per engine. The two XAUI (10 Gigabit Ethernet) and four SGMII (10/100/1000 Mb/s Ethernet) protocol engines each feature packet processing, including line-rate packet filtering, VLAN flow control, and TCP/IP acceleration.


Multiprocessor Articles and News to Watch

©2002-2018 U.C. Regents