POWER8 is a family of superscalar multi-core microprocessors based on the Power ISA , announced in August 2013 at the Hot Chips conference. The designs are available for licensing under the OpenPOWER Foundation , which is the first time for such availability of IBM's highest-end processors.
46-609: Systems based on POWER8 became available from IBM in June 2014. Systems and POWER8 processor designs made by other OpenPOWER members were available in early 2015. POWER8 is designed to be a massively multithreaded chip, with each of its cores capable of handling eight hardware threads simultaneously, for a total of 96 threads executed simultaneously on a 12-core chip. The processor makes use of very large amounts of on- and off-chip eDRAM caches, and on-chip memory controllers enable very high bandwidth to memory and system I/O. For most workloads,
92-481: A computer memory module , using a serial interface. It is typically used during the power-on self-test for automatic configuration of memory modules. Release 4 of the DDR3 Serial Presence Detect (SPD) document (SPD4_01_02_11) adds support for Load Reduction DIMMs and also for 16b-SO-DIMMs and 32b-SO-DIMMs. JEDEC Solid State Technology Association announced the publication of Release 4 of
138-500: A 100–200 MHz I/O clock). High-performance graphics was an initial driver of such bandwidth requirements, where high bandwidth data transfer between framebuffers is required. Because the hertz is a measure of cycles per second, and no signal cycles more often than every other transfer, describing the transfer rate in units of MHz is technically incorrect, although very common. It is also misleading because various memory timings are given in units of clock cycles, which are half
184-590: A 64-byte wide bus (which is twice as wide as on its predecessor), and 8 MB of L3 eDRAM cache per chiplet shareable among all chiplets. Thus, a six-chiplet processor would have 48 MB of L3 eDRAM cache, while a 12-chiplet processor would have a total of 96 MB of L3 eDRAM cache. The chip can also utilize an up to 128 MB of off-chip eDRAM L4 cache using Centaur companion chips. The on-chip memory controllers can handle 1 TB of RAM and 230 GB/s sustained memory bandwidth. The on-board PCI Express controllers can handle 48 GB/s of I/O to other parts of
230-424: A more modern mainstream desktop-oriented part 8 GB, DDR3/1600 DIMM, is rated at 2.58 W, despite being significantly faster. * optional DDR3-xxx denotes data transfer rate, and describes DDR chips, whereas PC3-xxxx denotes theoretical bandwidth (with the last two digits truncated), and is used to describe assembled DIMMs. Bandwidth is calculated by taking transfers per second and multiplying by eight. This
276-676: A second CPU socket are now provided via the X Bus . Besides that and a slight size increase to 659 mm, the differences seem minimal compared to previous POWER8 processors. On 19 January 2014, the Suzhou PowerCore Technology Company announced that they will join the OpenPOWER Foundation and license the POWER8 core to design custom-made processors for use in big data and cloud computing applications. EDRAM Embedded DRAM ( eDRAM )
322-469: A so-called on-chip controller (OCC), which is a power and thermal management microcontroller based on a PowerPC 405 processor. It has two general-purpose offload engines (GPEs) and 512 KB of embedded static RAM (SRAM) (1 KB = 1024 bytes), together with the possibility to access the main memory directly, while running an open-source firmware . OCC manages POWER8's operating frequency, voltage, memory bandwidth, and thermal control for both
368-459: A time. It runs at 8 GB /s in the early Entry models, later increased in the high-end and the HPC models to 9.6 GB/s with a 40-ns latency, for a sustained bandwidth of 24 GB/s and 28.8 GB/s per channel respectively. Each processor has two memory controllers with four memory channels each, and the maximum processor to memory buffer bandwidth is 230.4 GB/s per processor. Depending on
414-503: A total maximum of 16 gigabytes (GB) per DDR3 DIMM. Because of a hardware limitation not fixed until Ivy Bridge-E in 2013, most older Intel CPUs only support up to 4-Gbit chips for 8 GB DIMMs (Intel's Core 2 DDR3 chipsets only support up to 2 Gbit). All AMD CPUs correctly support the full specification for 16 GB DDR3 DIMMs. Intel, also supports 16 GB DIMMs, from Broadwell (also named as "AMD Only memory, because of using 11-bit addressing). In February 2005, Samsung introduced
460-615: A transfer rate of (memory clock rate) × 4 (for bus clock multiplier) × 2 (for data rate) × 64 (number of bits transferred) / 8 (number of bits in a byte). Thus with a memory clock frequency of 100 MHz, DDR3 SDRAM gives a maximum transfer rate of 6400 MB/s . The data rate (in MT/s ) is twice the I/O bus clock (in MHz ) due to the double data rate of DDR memory. As explained above, the bandwidth in MB/s
506-450: Is dynamic random-access memory (DRAM) integrated on the same die or multi-chip module (MCM) of an application-specific integrated circuit (ASIC) or microprocessor . eDRAM's cost-per-bit is higher when compared to equivalent standalone DRAM chips used as external memory, but the performance advantages of placing eDRAM onto the same chip as the processor outweigh the cost disadvantages in many applications. In performance and size, eDRAM
SECTION 10
#1732782636082552-476: Is 4-burst-deep, and the prefetch buffer of DDR is 2-burst-deep. This advantage is an enabling technology in DDR3's transfer speed. DDR3 modules can transfer data at a rate of 800–2133 MT /s using both rising and falling edges of a 400–1066 MHz I/O clock . This is twice DDR2's data transfer rates (400–1066 MT/s using a 200–533 MHz I/O clock) and four times the rate of DDR (200–400 MT/s using
598-512: Is a DRAM interface specification. The actual DRAM arrays that store the data are similar to earlier types, with similar performance. The primary benefit of DDR3 SDRAM over its immediate predecessor DDR2 SDRAM, is its ability to transfer data at twice the rate (eight times the speed of its internal memory arrays), enabling higher bandwidth or peak data rates. The DDR3 standard permits DRAM chip capacities of up to 8 gigabits (Gbit) (so 1 gigabyte by DRAM chip), and up to four ranks of 64 Gbit each for
644-478: Is a type of synchronous dynamic random-access memory (SDRAM) with a high bandwidth (" double data rate ") interface, and has been in use since 2007. It is the higher-speed successor to DDR and DDR2 and predecessor to DDR4 synchronous dynamic random-access memory (SDRAM) chips. DDR3 SDRAM is neither forward nor backward compatible with any earlier type of random-access memory (RAM) because of different signaling voltages, timings, and other factors. DDR3
690-406: Is because DDR3 memory modules transfer data on a bus that is 64 data bits wide, and since a byte comprises 8 bits, this equates to 8 bytes of data per transfer. With two transfers per cycle of a quadrupled clock signal , a 64- bit wide DDR3 module may achieve a transfer rate of up to 64 times the memory clock speed. With data being transferred 64 bits at a time per memory module, DDR3 SDRAM gives
736-491: Is called Turismo and the dual-chip variant is called Murano. PowerCore's modified version is called CP1. This is a revised version of the original 12-core POWER8 from IBM, and used to be called POWER8+ . The main new feature is that it has support for Nvidia's bus technology NVLink , connecting up to four NVLink devices directly to the chip. IBM removed the A Bus and PCI interfaces for SMP connections to other POWER8 sockets and replaced them with NVLink interfaces. Connection to
782-558: Is eight-way hardware multithreaded and can be dynamically and automatically partitioned to have either one, two, four or all eight threads active. POWER8 also added support for hardware transactional memory . IBM estimates that each core is 1.6 times as fast as the POWER7 in single-threaded operations. A POWER8 processor is a 6- or 12-chiplet design with variants of either 4, 6, 8, 10 or 12 activated chiplets, in which one chiplet consists of one processing core, 512 KB of SRAM L2 cache on
828-869: Is embedded along with the eDRAM memory, the remainder of the ASIC can treat the memory like a simple SRAM type such as in 1T-SRAM . eDRAM is used in various products, including IBM 's POWER7 processor, and IBM's z15 mainframe processor (mainframes built which use up to 4.69 GB of eDRAM when 5 such add-on chips/drawers are used but all other levels from L1 up also use eDRAM, for a total of 6.4 GB of eDRAM). Intel 's Haswell CPUs with GT3e integrated graphics, many game consoles and other devices, such as Sony 's PlayStation 2 , Sony's PlayStation Portable , Nintendo 's GameCube , Nintendo's Wii , Nintendo's Wii U , and Microsoft's Xbox 360 also use eDRAM. High Bandwidth Memory DDR3 Double Data Rate 3 Synchronous Dynamic Random-Access Memory ( DDR3 SDRAM )
874-499: Is not directly caused by the change to DDR3. CAS latency (ns) = 1000 × CL (cycles) ÷ clock frequency (MHz) = 2000 × CL (cycles) ÷ transfer rate (MT/s) While the typical latencies for a JEDEC DDR2-800 device were 5-5-5-15 (12.5 ns), some standard latencies for JEDEC DDR3 devices include 7-7-7-20 for DDR3-1066 (13.125 ns) and 8-8-8-24 for DDR3-1333 (12 ns). As with earlier memory generations, faster DDR3 memory became available after
920-441: Is positioned between level 3 cache and conventional DRAM on the memory bus, and effectively functions as a level 4 cache, though architectural descriptions may not explicitly refer to it in those terms. Embedding memory on the ASIC or processor allows for much wider buses and higher operation speeds, and due to much higher density of DRAM in comparison to SRAM , larger amounts of memory can be installed on smaller chips if eDRAM
966-434: Is the data rate multiplied by eight. CL – CAS Latency clock cycles , between sending a column address to the memory and the beginning of the data in response tRCD – Clock cycles between row activate and reads/writes tRP – Clock cycles between row precharge and activate Fractional frequencies are normally rounded down, but rounding up to 667 is common because of the exact number being 666 2 ⁄ 3 and rounding to
SECTION 20
#17327826360821012-400: Is used instead of eSRAM . eDRAM requires additional fab process steps compared with embedded SRAM, which raises cost, but the 3× area savings of eDRAM memory offsets the process cost when a significant amount of memory is used in the design. eDRAM memories, like all DRAM memories, require periodic refreshing of the memory cells, which adds complexity. However, if the memory refresh controller
1058-554: The GX++ bus for external communication, POWER8 removes this from the design and replaces it with the CAPI port (Coherent Accelerator Processor Interface) that is layered on top of PCI Express 3.0 . The CAPI port is used to connect auxiliary specialized processors such as GPUs , ASICs and FPGAs . Units attached to the CAPI bus can use the same memory address space as the CPU, thereby reducing
1104-589: The DDR3 Serial Presence Detect (SPD) document on September 1, 2011. Intel Corporation officially introduced the eXtreme Memory Profile ( XMP ) Specification on March 23, 2007, to enable enthusiast performance extensions to the traditional JEDEC SPD specifications for DDR3 SDRAM. In addition to bandwidth designations (e.g. DDR3-800D), and capacity variants, modules can be one of the following: Both FBDIMM (fully buffered) and LRDIMM (load reduced) memory types are designed primarily to control
1150-532: The Memory Buffer chips and the DRAM banks. Initially support was limited to 16 GB, 32 GB and 64 GB DIMMs, allowing up to 1 TB to be addressed by the processor. Later support for 128 GB and 256 GB DIMMs was announced, allowing up to 4 TB per processor. The POWER8 core has 64 KB L1 data cache contained in the load-store unit and 32 KB L1 instruction cache contained in
1196-419: The absolute maximum when memory stability is the foremost consideration, such as in servers or other mission-critical devices. In addition, JEDEC states that memory modules must withstand up to 1.80 volts before incurring permanent damage, although they are not required to function correctly at that level. Another benefit is its prefetch buffer , which is 8-burst-deep. In contrast, the prefetch buffer of DDR2
1242-499: The amount of electric current flowing to and from the memory chips at any given time. They are not compatible with registered/buffered memory, and motherboards that require them usually will not accept any other kind of memory. The DDR3L ( DDR3 L ow Voltage) standard is an addendum to the JESD79-3 DDR3 Memory Device Standard specifying low voltage devices. The DDR3L standard is 1.35 V and has
1288-606: The chip is said to perform two to three times as fast as its predecessor, the POWER7 . POWER8 chips comes in 6- or 12-core variants; each version is fabricated in a 22 nm silicon on insulator (SOI) process using 15 metal layers. The 12-core version consists of 4.2 billion transistors and is 650 mm large while the 6-core version is only 362 mm large. However the 6- and 12-core variants can have all or just some cores active, so POWER8 processors come with 4, 6, 8, 10 or 12 cores activated. Where previous POWER processors use
1334-546: The computing path length. At the 2013 ACM/IEEE Supercomputing Conference , IBM and Nvidia announced an engineering partnership to closely couple POWER8 with Nvidia GPUs in future HPC systems, with the first of them announced as the Power Systems S824L. On October 14, 2016, IBM announced the formation of OpenCAPI , a new organization to spread adoption of CAPI to other platforms. Initial members are Google, AMD, Xilinx, Micron and Mellanox. POWER8 also contains
1380-516: The early part of their roll-out in August 2008. (The same timescale for market penetration had been stated by market intelligence company DRAMeXchange over a year earlier in April 2007, and by Desi Rhoden in 2005. ) The primary driving force behind the increased usage of DDR3 has been new Core i7 processors from Intel and Phenom II processors from AMD, both of which have internal memory controllers:
1426-490: The first prototype DDR3 memory chip. Samsung played a major role in the development and standardisation of DDR3. In May 2005, Desi Rhoden, chairman of the JEDEC committee, stated that DDR3 had been under development for "about 3 years". DDR3 was officially launched in 2007, but sales were not expected to overtake DDR2 until the end of 2009 or possibly early 2010, according to Intel strategist Carlos Weissenberg, speaking during
POWER8 - Misplaced Pages Continue
1472-689: The former requires DDR3, the latter recommends it. IDC stated in January 2009 that DDR3 sales would account for 29% of the total DRAM units sold in 2009, rising to 72% by 2011. In September 2012, JEDEC released the final specification of DDR4. The primary benefits of DDR4 compared to DDR3 include a higher standardized range of clock frequencies and data transfer rates and significantly lower voltage . Compared to DDR2 memory, DDR3 memory uses less power. Some manufacturers further propose using "dual-gate" transistors to reduce leakage of current. According to JEDEC , 1.575 volts should be considered
1518-399: The highest speeds would reach up to DDR3-3200. Alternative naming: DDR3 modules are often incorrectly labeled with the prefix PC (instead of PC3), for marketing reasons, followed by the data-rate. Under this convention PC3-10600 is listed as PC1333. DDR3 memory utilizes serial presence detect . Serial presence detect (SPD) is a standardized way to automatically access information about
1564-517: The instruction fetch unit, along with a tightly integrated 512 KB L2 cache. In a single cycle each core can fetch up to eight instructions, decode and dispatch up to eight instructions, issue and execute up to ten instructions and commit up to eight instructions. Each POWER8 core consist of primarily the following six execution units : Each core has sixteen execution pipelines: It has a larger issue queue with 4×16 entries, improved branch predictors and can handle twice as many cache misses. Each core
1610-520: The label PC3L for its modules. Examples include DDR3L‐800 (PC3L-6400), DDR3L‐1066 (PC3L-8500), DDR3L‐1333 (PC3L-10600), and DDR3L‐1600 (PC3L-12800). Memory specified to DDR3L and DDR3U specifications is compatible with the original DDR3 standard, and can run at either the lower voltage or at 1.50 V. However, devices that require DDR3L explicitly, which operate at 1.35 V, such as systems using mobile versions of fourth-generation Intel Core processors, are not compatible with 1.50 V DDR3 memory. DDR3L
1656-886: The market in June 2007 based on Intel 's P35 "Bearlake" chipset with DIMMs at bandwidths up to DDR3-1600 (PC3-12800). The Intel Core i7 , released in November 2008, connects directly to memory rather than via a chipset. The Core i7, i5 & i3 CPUs initially supported only DDR3. AMD 's socket AM3 Phenom II X4 processors, released in February 2009, were their first to support DDR3 (while still supporting DDR2 for backwards compatibility). DDR3 dual-inline memory modules (DIMMs) have 240 pins and are electrically incompatible with DDR2. A key notch—located differently in DDR2 and DDR3 DIMMs—prevents accidentally interchanging them. Not only are they keyed differently, but DDR2 has rounded notches on
1702-440: The model only one controller might be enabled, or only two channels per controller could be in use. For increased availability the link provides "on-the-fly" lane isolation and repair. Each Memory Buffer chip has four interfaces allowing to use either DDR3 or DDR4 memory at 1600 MHz with no change to the processor link interface. The resulting 32 memory channels per processor allow peak access rate of 409.6 GB/s between
1748-475: The nearest whole number. Some manufacturers also round to a certain precision or round up instead. For example, PC3-10666 memory could be listed as PC3-10600 or PC3-10700. Note: All items listed above are specified by JEDEC as JESD79-3F. All RAM data rates in-between or above these listed specifications are not standardized by JEDEC—often they are simply manufacturer optimizations using higher-tolerance or overvolted chips. Of these non-standard specifications,
1794-567: The processor and closer to the memory. The scheduling logic, the memory energy management, and the RAS decision point are moved to a so-called Memory Buffer chip (a.k.a. Centaur ). Offloading certain memory processes to the Memory Buffer chip enables memory access optimizations, saving bandwidth and allowing for faster processor to memory communication. It also contains caching structures for an additional 16 MB of L4 cache per chip (up to 128 MB per processor) (1 MB = 1024 KB). Depending on
1840-528: The processor and memory; it can regulate voltages through 1,764 integrated voltage regulators (IVRs) on the fly. Also, the OCC can be programmed to overclock the POWER8 processor, or to lower its power consumption by reducing the operating frequency (which is similar to the configurable TDP found in some of the Intel and AMD processors). POWER8 splits the memory controller functions by moving some of them away from
1886-660: The release of the initial versions. DDR3-2000 memory with 9-9-9-28 latency (9 ns) was available in time to coincide with the Intel Core i7 release in late 2008, while later developments made DDR3-2400 widely available (with CL 9–12 cycles = 7.5–10 ns), and speeds up to DDR3-3200 available (with CL 13 cycles = 8.125 ns). Power consumption of individual SDRAM chips (or, by extension, DIMMs) varies based on many factors, including speed, type of usage, voltage, etc. Dell's Power Advisor calculates that 4 GB ECC DDR1333 RDIMMs use about 4 W each. By contrast,
POWER8 - Misplaced Pages Continue
1932-458: The same dimensions and number of pins as regular DDR4 SO-DIMMs, but the notch is placed differently to avoid accidentally using in an incompatible DDR4 SO-DIMM socket. DDR3 latencies are numerically higher because the I/O bus clock cycles by which they are measured are shorter; the actual time interval is similar to DDR2 latencies, around 10 ns. There is some improvement because DDR3 generally uses more recent manufacturing processes, but this
1978-512: The side and the DDR3 modules have square notches on the side. DDR3 SO-DIMMs have 204 pins. For the Skylake microarchitecture , Intel has also designed a SO-DIMM package named UniDIMM , which can use either DDR3 or DDR4 chips. The CPU's integrated memory controller can then work with either. The purpose of UniDIMMs is to handle the transition from DDR3 to DDR4, where pricing and availability may make it desirable to switch RAM type. UniDIMMs have
2024-407: The speed of data transfers. DDR3 does use the same electric signaling standard as DDR and DDR2, Stub Series Terminated Logic , albeit at different timings and voltages. Specifically, DDR3 uses SSTL_15. In February 2005, Samsung demonstrated the first DDR3 memory prototype, with a capacity of 512 Mb and a bandwidth of 1.066 Gbps . Products in the form of motherboards appeared on
2070-529: The system architecture the Memory Buffer chips are placed either on the memory modules (Custom DIMM/CDIMM, for example in S824 and E880 models), or on the memory riser card holding standard DIMMs (for example in S822LC models). The Memory Buffer chip is connected to the processor using a high-speed multi-lane serial link. The memory channel connecting each buffer chip is capable of writing 2 bytes and reading 1 byte at
2116-464: The system. The cores are designed to operate at clock rates between 2.5 and 5 GHz. The six-core chips are mounted in pairs on dual-chip modules (DCM) in IBM's scale out servers . In most configurations not all cores are active, resulting in a variety of configurations where the actual core count differs. The 12-core version is used in the high-end E880 and E880C models. IBM's single-chip POWER8 module
#81918