Xeon Phi - Misplaced Pages

#927072

74-426: Xeon Phi is a discontinued series of x86 manycore processors designed and made by Intel . It was intended for use in supercomputers, servers, and high-end workstations. Its architecture allowed use of standard programming languages and application programming interfaces (APIs) such as OpenMP . Xeon Phi launched in 2010. Since it was originally based on an earlier GPU design ( codenamed "Larrabee" ) by Intel that

148-618: A 64 KB (one segment) stack in memory supported by computer hardware . Only words (two bytes) can be pushed to the stack. The stack grows toward numerically lower addresses, with SS:SP pointing to the most recently pushed item. There are 256 interrupts , which can be invoked by both hardware and software. The interrupts can cascade, using the stack to store the return address . The original Intel 8086 and 8088 have fourteen 16- bit registers. Four of them (AX, BX, CX, DX) are general-purpose registers (GPRs), although each may have an additional purpose; for example, only CX can be used as

222-579: A backward compatible version of this functionality on the same microprocessor as the main processor. In addition to this, modern x86 designs also contain a SIMD -unit (see SSE below) where instructions can work in parallel on (one or two) 128-bit words, each containing two or four floating-point numbers (each 64 or 32 bits wide respectively), or alternatively, 2, 4, 8 or 16 integers (each 64, 32, 16 or 8 bits wide respectively). The presence of wide SIMD registers means that existing x86 processors can load or store up to 128 bits of memory data in

296-403: A compatible design) and the scalability of x86 chips in the form of modern multi-core CPUs, is underlining x86 as an example of how continuous refinement of established industry standards can resist the competition from completely new architectures. The table below lists processor models and model series implementing various architectures in the x86 family, in chronological order. Each line item

370-539: A counter with the loop instruction. Each can be accessed as two separate bytes (thus BX's high byte can be accessed as BH and low byte as BL). Two pointer registers have special roles: SP (stack pointer) points to the "top" of the stack , and BP (base pointer) is often used to point at some other place in the stack, typically above the local variables (see frame pointer ). The registers SI, DI, BX and BP are address registers , and may also be used for array indexing. One of four possible 'segment registers' (CS, DS, SS and ES)

444-476: A major change to the architecture referred to as X86S (formerly known as X86-S). The S in X86S stands for "simplification", which aims to remove support for legacy execution modes and instructions. A processor implementing this proposal would start execution directly in long mode and would only support 64-bit operating systems. 32-bit code would only be supported for user applications running in ring 3, and would use

518-547: A memory location. However, this memory operand may also be the destination (or a combined source and destination), while the other operand, the source, can be either register or immediate. Among other factors, this contributes to a code size that rivals eight-bit machines and enables efficient use of instruction cache memory. The relatively small number of general registers (also inherited from its 8-bit ancestors) has made register-relative addressing (using small immediate offsets) an important method of accessing operands, especially on

592-560: A more complex micro-op which fits the execution model better and thus can be executed faster or with fewer machine resources involved. Another way to try to improve performance is to cache the decoded micro-operations, so the processor can directly access the decoded micro-operations from a special cache, instead of decoding them again. Intel followed this approach with the Execution Trace Cache feature in their NetBurst microarchitecture (for Pentium 4 processors) and later in

666-623: A power requirement of ~300 W, built at a 45 nm process. In the Aubrey Isle core a 1,024-bit ring bus (512-bit bi-directional) connects processors to main memory. Single-board performance has exceeded 750 GFLOPS. The prototype boards only support single-precision floating-point instructions. Initial developers included CERN , Korea Institute of Science and Technology Information (KISTI) and Leibniz Supercomputing Centre . Hardware vendors for prototype boards included IBM, SGI, HP, Dell and others. The Knights Corner product line

740-568: A processor codenamed Aubrey Isle was announced 31 May 2010. The product was stated to be a derivative of the Larrabee project and other Intel research including the Single-chip Cloud Computer . The development product was offered as a PCIe card with 32 in-order cores at up to 1.2 GHz with four threads per core, 2 GB GDDR5 memory, and 8 MB coherent L2 cache (256 KB per core with 32 KB L1 cache), and

814-496: A response to the successful 8080-compatible Zilog Z80 , the x86 line soon grew in features and processing power. Today, x86 is ubiquitous in both stationary and portable personal computers, and is also used in midrange computers , workstations , servers, and most new supercomputer clusters of the TOP500 list. A large amount of software , including a large list of x86 operating systems are using x86-based hardware. Modern x86

SECTION 10

#1732783601928

888-440: A single chip with multiple independent cores: the prototype design included 48 cores per chip with hardware support for selective frequency and voltage control of cores to maximize energy efficiency, and incorporated a mesh network for inter-chip messaging. The design lacked cache-coherent cores and focused on principles that would allow the design to scale to many more cores. The Teraflops Research Chip (prototype unveiled 2007)

962-670: A single instruction and also perform bitwise operations (although not integer arithmetic ) on full 128-bits quantities in parallel. Intel's Sandy Bridge processors added the Advanced Vector Extensions (AVX) instructions, widening the SIMD registers to 256 bits. The Intel Initial Many Core Instructions implemented by the Knights Corner Xeon Phi processors, and the AVX-512 instructions implemented by

1036-497: A solution for addressing more memory than can be covered by a plain 16-bit address. The term "x86" came into being because the names of several successors to Intel's 8086 processor end in "86", including the 80186 , 80286 , 80386 and 80486 . Colloquially, their names were "186", "286", "386" and "486". The term is not synonymous with IBM PC compatibility , as this implies a multitude of other computer hardware . Embedded systems and general-purpose computers used x86 chips before

1110-700: A standalone CPU, rather than just as an add-in card. In June 2013, the Tianhe-2 supercomputer at the National Supercomputer Center in Guangzhou (NSCC-GZ) was announced as the world's fastest supercomputer (as of June 2023, it is No. 10). It used Intel Xeon Phi coprocessors and Ivy Bridge -EP Xeon E5 v2 processors to achieve 33.86 petaFLOPS. The Xeon Phi product line directly competed with Nvidia 's Tesla and AMD Radeon Instinct lines of deep learning and GPGPU cards. It

1184-891: A transparent processor extension, allowing legacy MMX / SSE code to run without code changes. An important component of the Intel Xeon Phi coprocessor's core is its vector processing unit (VPU). The VPU features a novel 512-bit SIMD instruction set, officially known as Intel Initial Many Core Instructions (Intel IMCI). Thus, the VPU can execute 16 single-precision (SP) or 8 double-precision (DP) operations per cycle. The VPU also supports Fused Multiply-Add (FMA) instructions and hence can execute 32 SP or 16 DP floating point operations per cycle. It also provides support for integers. The VPU also features an Extended Math Unit (EMU) that can execute operations such as reciprocal, square root, and logarithm, thereby allowing these operations to be executed in

1258-818: A vector fashion with high bandwidth. The EMU operates by calculating polynomial approximations of these functions. On 12 November 2012, Intel announced two Xeon Phi coprocessor families using the 22 nm process size: the Xeon Phi 3100 and the Xeon Phi 5110P. The Xeon Phi 3100 will be capable of more than 1 teraFLOPS of double-precision floating-point instructions with 240 GB/s memory bandwidth at 300 W. The Xeon Phi 5110P will be capable of 1.01 teraFLOPS of double-precision floating-point instructions with 320 GB/s memory bandwidth at 225 W. The Xeon Phi 7120P will be capable of 1.2 teraFLOPS of double-precision floating-point instructions with 352 GB/s memory bandwidth at 300 W. On 17 June 2013,

1332-801: A version of the Hybrid Memory Cube . Each core has two 512-bit vector units and supports AVX-512 SIMD instructions, specifically the Intel AVX-512 Foundational Instructions (AVX-512F) with Intel AVX-512 Conflict Detection Instructions (AVX-512CD), Intel AVX-512 Exponential and Reciprocal Instructions (AVX-512ER), and Intel AVX-512 Prefetch Instructions (AVX-512PF). Support for IMCI has been removed in favor of AVX-512. The National Energy Research Scientific Computing Center announced that Phase 2 of its newest supercomputing system "Cori" would use Knights Landing Xeon Phi coprocessors. On 20 June 2016, Intel launched

1406-462: Is Intel's codename for a Xeon Phi product specialized in deep learning , initially released in December 2017. Nearly identical in specifications to Knights Landing, Knights Mill includes optimizations for better utilization of AVX-512 instructions. Single-precision and variable-precision floating-point performance increased, at the expense of double-precision floating-point performance. Knights Hill

1480-470: Is allowed for almost all instructions. The largest native size for integer arithmetic and memory addresses (or offsets ) is 16, 32 or 64 bits depending on architecture generation (newer processors include direct support for smaller integers as well). Multiple scalar values can be handled simultaneously via the SIMD unit present in later generations, as described below. Immediate addressing offsets and immediate data may be expressed as 8-bit quantities for

1554-403: Is an experimental 80-core chip with two floating-point units per core, implementing a 96-bit VLIW architecture instead of the x86 architecture. The project investigated intercore communication methods, per-chip power management, and achieved 1.01 TFLOPS at 3.16 GHz consuming 62 W of power. Intel's Many Integrated Core (MIC) prototype board, named Knights Ferry , incorporating

SECTION 20

#1732783601928

1628-783: Is available from Intel under the extension name of KNC. Code name for the second-generation MIC architecture product from Intel. Intel officially first revealed details of its second-generation Intel Xeon Phi products on 17 June 2013. Intel said that the next generation of Intel MIC Architecture-based products will be available in two forms, as a coprocessor or a host processor (CPU), and be manufactured using Intel's 14 nm process technology. Knights Landing products will include integrated on-package memory for significantly higher memory bandwidth. Knights Landing contains up to 72 Airmont (Atom) cores with four threads per core, using LGA 3647 socket supporting up to 384 GB of "far" DDR4 2133 RAM and 8–16 GB of stacked "near" 3D MCDRAM ,

1702-691: Is characterized by significantly improved or commercially successful processor microarchitecture designs. At various times, companies such as IBM , VIA , NEC , AMD , TI , STM , Fujitsu , OKI , Siemens , Cyrix , Intersil , C&T , NexGen , UMC , and DM&P started to design or manufacture x86 processors (CPUs) intended for personal computers and embedded systems. Other companies that designed or manufactured x86 or x87 processors include ITT Corporation , National Semiconductor , ULSI System Technology, and Weitek . Such x86 implementations were seldom simple copies but often employed different internal microarchitectures and different solutions at

1776-705: Is made at a 22 nm process size, using Intel's Tri-gate technology with more than 50 cores per chip, and is Intel's first many-cores commercial product. In June 2011, SGI announced a partnership with Intel to use the MIC architecture in its high-performance computing products. In September 2011, it was announced that the Texas Advanced Computing Center (TACC) will use Knights Corner cards in their 10-petaFLOPS "Stampede" supercomputer, providing 8 petaFLOPS of compute power. According to "Stampede: A Comprehensive Petascale Computing Environment"

1850-463: Is one of the two modes only available in long mode . The addressing modes were not dramatically changed from 32-bit mode, except that addressing was extended to 64 bits, virtual addresses are now sign extended to 64 bits (in order to disallow mode bits in virtual addresses), and other selector details were dramatically reduced. In addition, an addressing mode was added to allow memory references relative to RIP (the instruction pointer ), to ease

1924-587: Is relatively uncommon in embedded systems , however, and small low power applications (using tiny batteries), and low-cost microprocessor markets, such as home appliances and toys, lack significant x86 presence. Simple 8- and 16-bit based architectures are common here, as well as simpler RISC architectures like RISC-V , although the x86-compatible VIA C7 , VIA Nano , AMD 's Geode , Athlon Neo and Intel Atom are examples of 32- and 64-bit designs used in some relatively low-power and low-cost segments. There have been several attempts, including by Intel, to end

1998-736: Is to leverage x86 legacy by creating an x86-compatible multiprocessor architecture that can use existing parallelization software tools. Programming tools include OpenMP , OpenCL , Cilk / Cilk Plus and specialised versions of Intel's Fortran, C++ and math libraries. Design elements inherited from the Larrabee project include x86 ISA, 4-way SMT per core, 512-bit SIMD units, 32 KB L1 instruction cache, 32 KB L1 data cache, coherent L2 cache (512 KB per core), and ultra-wide ring bus connecting processors and memory. The Knights Corner 512-bit SIMD instructions share many intrinsic functions with AVX-512 extension . The instruction set documentation

2072-491: Is used to form a memory address. In the original 8086 / 8088 / 80186 / 80188 every address was built from a segment register and one of the general purpose registers. For example ds:si is the notation for an address formed as [16 * ds + si] to allow 20-bit addressing rather than 16 bits, although this changed in later processors. At that time only certain combinations were supported. The FLAGS register contains flags such as carry flag , overflow flag and zero flag . Finally,

2146-458: The fstsw instruction, and it is common to simply use some of its bits for branching by copying it into the normal FLAGS. In the Intel 80286 , to support protected mode , three special registers hold descriptor table addresses (GDTR, LDTR, IDTR ), and a fourth task register (TR) is used for task switching. The 80287 is the floating-point coprocessor for the 80286 and has the same registers as

2220-525: The 6x86 was significantly faster than the Pentium on integer code. AMD later managed to grow into a serious contender with the K6 set of processors, which gave way to the very successful Athlon and Opteron . There were also other contenders, such as Centaur Technology (formerly IDT ), Rise Technology , and Transmeta . VIA Technologies ' energy efficient C3 and C7 processors, which were designed by

2294-496: The 80486 and all subsequent x86 models, the floating-point processing unit (FPU) is integrated on-chip. The Pentium MMX added eight 64-bit MMX integer vector registers (MM0 to MM7, which share lower bits with the 80-bit-wide FPU stack). With the Pentium III , Intel added a 32-bit Streaming SIMD Extensions (SSE) control/status register (MXCSR) and eight 128-bit SSE floating-point registers (XMM0 to XMM7). Starting with

Xeon Phi - Misplaced Pages Continue

2368-406: The 8088 and 80286 were still in common use, the term x86 usually represented any 8086-compatible CPU. Today, however, x86 usually implies binary compatibility with the 32-bit instruction set of the 80386 . This is due to the fact that this instruction set has become something of a lowest common denominator for many modern operating systems and also probably because the term became common after

2442-573: The AMD Opteron processor, the x86 architecture extended the 32-bit registers into 64-bit registers in a way similar to how the 16 to 32-bit extension took place. An R -prefix (for "register") identifies the 64-bit registers (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, RFLAGS, RIP), and eight additional 64-bit general registers (R8–R15) were also introduced in the creation of x86-64 . Also, eight more SSE vector registers (XMM8–XMM15) were added. However, these extensions are only usable in 64-bit mode, which

2516-653: The Centaur company, were sold for many years following their release in 2005. Centaur's 2008 design, the VIA Nano , was their first processor with superscalar and speculative execution . It was introduced at about the same time (in 2008) as Intel introduced the Intel Atom , its first "in-order" processor after the P5 Pentium . Many additions and extensions have been added to the original x86 instruction set over

2590-516: The Tianhe-2 supercomputer was announced by TOP500 as the world's fastest. Tianhe-2 used Intel Ivy Bridge Xeon and Xeon Phi processors to achieve 33.86 petaFLOPS. It was the fastest on the list for two and a half years, lastly in November 2015. The cores of Knights Corner are based on a modified version of P54C design, used in the original Pentium. The basis of the Intel MIC architecture

2664-461: The machine code format was expanded. To provide backward compatibility, segments with executable code can be marked as containing either 16-bit or 32-bit instructions. Special prefixes allow inclusion of 32-bit instructions in a 16-bit segment or vice versa. The 80386 had an optional floating-point coprocessor, the 80387 ; it had eight 80-bit wide registers: st(0) to st(7), like the 8087 and 80287. The 80386 could also use an 80287 coprocessor. With

2738-402: The "second-generation Intel (Knights Landing) MICs will be added when they become available, increasing Stampede's aggregate peak performance to at least 15 PetaFLOPS." On 15 November 2011, Intel showed an early silicon version of a Knights Corner processor. On 5 June 2012, Intel released open source software and documentation regarding Knights Corner. On 18 June 2012, Intel announced at

2812-405: The 2012 Hamburg International Supercomputing Conference that Xeon Phi will be the brand name used for all products based on their Many Integrated Core architecture. In June 2012, Cray announced it would be offering 22 nm 'Knight's Corner' chips (branded as 'Xeon Phi') as a co-processor in its 'Cascade' systems. In June 2012, ScaleMP announced a virtualization update allowing Xeon Phi as

2886-447: The 7220A, 7240P and 7220P coprocessor cards. Intel announced they were discontinuing Knights Landing in summer 2018. All models can boost to their peak speeds, adding 200 MHz to their base frequency when running just one or two cores. When running from three to the maximum number of cores, the chips can only boost 100 MHz above the base frequency. All chips run high-AVX code at a frequency reduced by 200 MHz. Knights Mill

2960-471: The 8087 with the same data formats. With the advent of the 32-bit 80386 processor, the 16-bit general-purpose registers, base registers, index registers, instruction pointer, and FLAGS register , but not the segment registers, were expanded to 32 bits. The nomenclature represented this by prefixing an " E " (for "extended") to the register names in x86 assembly language . Thus, the AX register corresponds to

3034-634: The Decoded Stream Buffer (for Core-branded processors since Sandy Bridge). Transmeta used a completely different method in their Crusoe x86 compatible CPUs. They used just-in-time translation to convert x86 instructions to the CPU's native VLIW instruction set. Transmeta argued that their approach allows for more power efficient designs since the CPU can forgo the complicated decode step of more traditional x86 implementations. Addressing modes for 16-bit processor modes can be summarized by

Xeon Phi - Misplaced Pages Continue

3108-476: The Intel Xeon Phi product family x200 based on the Knights Landing architecture, stressing its applicability to not just traditional simulation workloads, but also to machine learning . The model lineup announced at launch included only Xeon Phi of bootable form-factor, but two versions of it: standard processors and processors with integrated Intel Omni-Path architecture fabric. The latter is denoted by

3182-877: The Knights Landing Xeon Phi processors and by Skylake-X processors, use 512-bit wide SIMD registers. During execution , current x86 processors employ a few extra decoding steps to split most instructions into smaller pieces called micro-operations. These are then handed to a control unit that buffers and schedules them in compliance with x86-semantics so that they can be executed, partly in parallel, by one of several (more or less specialized) execution units . These modern x86 designs are thus pipelined , superscalar , and also capable of out of order and speculative execution (via branch prediction , register renaming , and memory dependence prediction ), which means they may execute multiple (partial or complete) x86 instructions simultaneously, and not necessarily in

3256-476: The Larrabee chips also included specialised hardware for texture sampling. The project to produce a retail GPU product directly from the Larrabee research project was terminated in May 2010. Another contemporary Intel research project implementing x86 architecture on a many-multicore processor was the ' Single-chip Cloud Computer ' (prototype introduced 2009), a design mimicking a cloud computing computer datacentre on

3330-540: The PC-compatible market started , some of them before the IBM PC (1981) debut. As of June 2022 , most desktop and laptop computers sold are based on the x86 architecture family, while mobile categories such as smartphones or tablets are dominated by ARM . At the high end, x86 continues to dominate computation-intensive workstation and cloud computing segments. In the 1980s and early 1990s, when

3404-434: The advanced but delayed 5k86 ( K5 ), which, internally, was closely based on AMD's earlier 29K RISC design; similar to NexGen 's Nx586 , it used a strategy such that dedicated pipeline stages decode x86 instructions into uniform and easily handled micro-operations , a method that has remained the basis for most x86 designs to this day. Some early versions of these microprocessors had heat dissipation problems. The 6x86

3478-415: The art, had been planned for 2021; as of March 2022 the release had not taken place, however. The instruction set architecture has twice been extended to a larger word size. In 1985, Intel released the 32-bit 80386 (later known as i386) which gradually replaced the earlier 16-bit chips in computers (although typically not in embedded systems ) during the following years; this extended programming model

3552-569: The electronic and physical levels. Quite naturally, early compatible microprocessors were 16-bit, while 32-bit designs were developed much later. For the personal computer market, real quantities started to appear around 1990 with i386 and i486 compatible processors, often named similarly to Intel's original chips. After the fully pipelined i486 , in 1993 Intel introduced the Pentium brand name (which, unlike numbers, could be trademarked ) for their new set of superscalar x86 designs. With

3626-401: The execution units with the decode steps opens up possibilities for more analysis of the (buffered) code stream, and therefore permits detection of operations that can be performed in parallel, simultaneously feeding more than one execution unit. The latest processors also do the opposite when appropriate; they combine certain x86 sequences (such as a compare followed by a conditional jump) into

3700-549: The first two actively produce modern 64-bit designs, leading to what has been called a "duopoly" of Intel and AMD in x86 processors. However, in 2014 the Shanghai-based Chinese company Zhaoxin , a joint venture between a Chinese company and VIA Technologies, began designing VIA based x86 processors for desktops and laptops. The release of its newest "7" family of x86 processors (e.g. KX-7000), which are not quite as fast as AMD or Intel chips but are still state of

3774-528: The formula: Addressing modes for 32-bit x86 processor modes can be summarized by the formula: Addressing modes for the 64-bit processor mode can be summarized by the formula: Instruction relative addressing in 64-bit code (RIP + displacement, where RIP is the instruction pointer register ) simplifies the implementation of position-independent code (as used in shared libraries in some operating systems). The 8086 had 64 KB of eight-bit (or alternatively 32 K-word of 16-bit ) I/O space, and

SECTION 50

#1732783601928

3848-399: The frequently occurring cases or contexts where a −128..127 range is enough. Typical instructions are therefore 2 or 3 bytes in length (although some are much longer, and some are single-byte). To further conserve encoding space, most registers are expressed in opcodes using three or four bits, the latter via an opcode prefix in 64-bit mode, while at most one operand to an instruction can be

3922-437: The ground up to enable Exascale computing in the future. This new architecture is now expected for 2020–2021. One performance and programmability study reported that achieving high performance with Xeon Phi still needs help from programmers and that merely relying on compilers with traditional programming models is insufficient. Other studies in various domains, such as life sciences and deep learning, have shown that exploiting

3996-501: The implementation of position-independent code , used in shared libraries in some operating systems. SIMD registers XMM0–XMM15 (XMM0–XMM31 when AVX-512 is supported). SIMD registers YMM0–YMM15 (YMM0–YMM31 when AVX-512 is supported). Lower half of each of the YMM registers maps onto the corresponding XMM register. SIMD registers ZMM0–ZMM31. Lower half of each of the ZMM registers maps onto

4070-408: The instruction pointer (IP) points to the next instruction that will be fetched from memory and then executed; this register cannot be directly accessed (read or written) by a program. The Intel 80186 and 80188 are essentially an upgraded 8086 or 8088 CPU, respectively, with on-chip peripherals added, and they have the same CPU registers as the 8086 and 8088 (in addition to interface registers for

4144-441: The introduction of the 80386 in 1985. A few years after the introduction of the 8086 and 8088, Intel added some complexity to its naming scheme and terminology as the "iAPX" of the ambitious but ill-fated Intel iAPX 432 processor was tried on the more successful 8086 family of chips, applied as a kind of system-level prefix. An 8086 system, including coprocessors such as 8087 and 8089 , and simpler Intel-specific system chips,

4218-447: The lower 16 bits of the new 32-bit EAX register, SI corresponds to the lower 16 bits of ESI, and so on. The general-purpose registers, base registers, and index registers can all be used as the base in addressing modes, and all of those registers except for the stack pointer can be used as the index in addressing modes. Two new segment registers (FS and GS) were added. With a greater number of registers, instructions and operands,

4292-604: The market dominance of the "inelegant" x86 architecture designed directly from the first simple 8-bit microprocessors. Examples of this are the iAPX 432 (a project originally named the Intel 8800 ), the Intel 960 , Intel 860 and the Intel/Hewlett-Packard Itanium architecture. However, the continuous refinement of x86 microarchitectures , circuitry and semiconductor manufacturing would make it hard to replace x86 in many segments. AMD's 64-bit extension of x86 (which Intel eventually responded to with

4366-473: The name EM64T and finally using Intel 64. Microsoft and Sun Microsystems / Oracle also use term "x64", while many Linux distributions , and the BSDs also use the "amd64" term. Microsoft Windows, for example, designates its 32-bit versions as "x86" and 64-bit versions as "x64", while installation files of 64-bit Windows versions are required to be placed into a directory called "AMD64". In 2023, Intel proposed

4440-454: The peripherals). The 8086, 8088, 80186, and 80188 can use an optional floating-point coprocessor, the 8087 . The 8087 appears to the programmer as part of the CPU and adds eight 80-bit wide registers, st(0) to st(7), each of which can hold numeric data in one of seven formats: 32-, 64-, or 80-bit floating point, 16-, 32-, or 64-bit (binary) integer, and 80-bit packed decimal integer. It also has its own 16-bit status register accessible through

4514-418: The same order as given in the instruction stream. Some Intel CPUs ( Xeon Foster MP , some Pentium 4 , and some Nehalem and later Intel Core processors) and AMD CPUs (starting from Zen ) are also capable of simultaneous multithreading with two threads per core ( Xeon Phi has four threads per core). Some Intel CPUs support transactional memory ( TSX ). When introduced, in the mid-1990s, this method

SECTION 60

#1732783601928

4588-443: The same simplified segmentation as long mode. The x86 architecture is a variable instruction length, primarily " CISC " design with emphasis on backward compatibility . The instruction set is not typical CISC, however, but basically an extended version of the simple eight-bit 8008 and 8080 architectures. Byte-addressing is enabled and words are stored in memory with little-endian byte order. Memory access to unaligned addresses

4662-454: The stack. Much work has therefore been invested in making such accesses as fast as register accesses—i.e., a one cycle instruction throughput, in most circumstances where the accessed data is available in the top-level cache. A dedicated floating-point processor with 80-bit internal registers, the 8087 , was developed for the original 8086 . This microprocessor subsequently developed into the extended 80387 , and later processors incorporated

4736-459: The suffix F in the model number. Integrated fabric is expected to provide better latency at a lower cost than discrete high-performance network cards. On 14 November 2016, the 48th list of TOP500 contained two systems using Knights Landing in the Top 10. The PCIe based co-processor variant of Knight's Landing was never offered to the general market and was discontinued by August 2017. This included

4810-483: The thread- and SIMD-parallelism of Xeon Phi achieves significant speed-ups. X86 x86 (also known as 80x86 or the 8086 family ) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088 . The 8086 was introduced in 1978 as a fully 16-bit extension of 8-bit Intel's 8080 microprocessor, with memory segmentation as

4884-490: The x86 naming scheme now legally cleared, other x86 vendors had to choose different names for their x86-compatible products, and initially some chose to continue with variations of the numbering scheme: IBM partnered with Cyrix to produce the 5x86 and then the very efficient 6x86 (M1) and 6x86 MX ( MII ) lines of Cyrix designs, which were the first x86 microprocessors implementing register renaming to enable speculative execution . AMD meanwhile designed and manufactured

4958-484: The years, almost consistently with full backward compatibility . The architecture family has been implemented in processors from Intel, Cyrix , AMD , VIA Technologies and many other companies; there are also open implementations, such as the Zet SoC platform (currently inactive). Nevertheless, of those, only Intel, AMD, VIA Technologies, and DM&P Electronics hold x86 architectural licenses, and from these, only

5032-537: Was also affected by a few minor compatibility problems, the Nx586 lacked a floating-point unit (FPU) and (the then crucial) pin-compatibility, while the K5 had somewhat disappointing performance when it was (eventually) introduced. Customer ignorance of alternatives to the Pentium series further contributed to these designs being comparatively unsuccessful, despite the fact that the K5 had very good Pentium compatibility and

5106-521: Was cancelled in 2009, it shared application areas with GPUs. The main difference between Xeon Phi and a GPGPU like Nvidia Tesla was that Xeon Phi, with an x86-compatible core, could, with less modification, run software that was originally targeted to a standard x86 CPU. Initially in the form of PCI Express -based add-on cards, a second-generation product, codenamed Knights Landing , was announced in June 2013. These second-generation chips could be used as

5180-455: Was discontinued due to a lack of demand and Intel's problems with its 10nm node. The Larrabee microarchitecture (in development since 2006) introduced very wide (512-bit) SIMD units to an x86 architecture based processor design, extended to a cache-coherent multiprocessor system connected via a ring bus to memory; each core was capable of four-way multithreading. Due to the design being intended for GPU as well as general purpose computing,

5254-403: Was originally referred to as the i386 architecture (like its first implementation) but Intel later dubbed it IA-32 when introducing its (unrelated) IA-64 architecture. In 1999–2003, AMD extended this 32-bit architecture to 64 bits and referred to it as x86-64 in early documents and later as AMD64 . Intel soon adopted AMD's architectural extensions under the name IA-32e, later using

5328-436: Was sometimes referred to as a "RISC core" or as "RISC translation", partly for marketing reasons, but also because these micro-operations share some properties with certain types of RISC instructions. However, traditional microcode (used since the 1950s) also inherently shares many of the same properties; the new method differs mainly in that the translation to micro-operations now occurs asynchronously. Not having to synchronize

5402-609: Was the codename for the third-generation MIC architecture, for which Intel announced the first details at SC14. It was to be manufactured in a 10 nm process. Knights Hill was expected to be used in the United States Department of Energy Aurora supercomputer , to be deployed at Argonne National Laboratory . However, Aurora was delayed in favor of using an "advanced architecture" with a focus on machine learning. In 2017, Intel announced that Knights Hill had been canceled in favor of another architecture built from

5476-478: Was thereby described as an iAPX 86 system. There were also terms iRMX (for operating systems), iSBC (for single-board computers), and iSBX (for multimodule boards based on the 8086-architecture), all together under the heading Microsystem 80 . However, this naming scheme was quite temporary, lasting for a few years during the early 1980s. Although the 8086 was primarily developed for embedded systems and small multi-user or single-user computers, largely as

#927072