Misplaced Pages

I860

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.

The Intel i860 (also known as 80860 ) is a RISC microprocessor design introduced by Intel in 1989. It is one of Intel's first attempts at an entirely new, high-end instruction set architecture since the failed Intel iAPX 432 from the beginning of the 1980s. It was the world's first million-transistor chip. It was released with considerable fanfare, slightly obscuring the earlier Intel i960 , which was successful in some niches of embedded systems . The i860 never achieved commercial success and the project was terminated in the mid-1990s.

#596403

59-437: i860 may refer to: Intel i860 , a VLIW RISC microprocessor Intel 860 Chipset Motorola i860 , a mobile phone See also [ edit ] 1860 [REDACTED] Topics referred to by the same term This disambiguation page lists articles associated with the same title formed as a letter–number combination. If an internal link led you here, you may wish to change

118-511: A SIMD processor somewhere in its architecture. The PlayStation 2 was unusual in that one of its vector-float units could function as an autonomous DSP executing its own instruction stream, or as a coprocessor driven by ordinary CPU instructions. 3D graphics applications tend to lend themselves well to SIMD processing as they rely heavily on operations with 4-dimensional vectors. Microsoft 's Direct3D 9.0 now chooses at runtime processor-specific implementations of its own math operations, including

177-404: A SIMD processor there are two improvements to this process. For one the data is understood to be in blocks, and a number of values can be loaded all at once. Instead of a series of instructions saying "retrieve this pixel, now retrieve the next pixel", a SIMD processor will have a single instruction that effectively says "retrieve n pixels" (where n is a number that varies from design to design). For

236-527: A common operation in many multimedia applications. One example would be changing the brightness of an image. Each pixel of an image consists of three values for the brightness of the red (R), green (G) and blue (B) portions of the color. To change the brightness, the R, G and B values are read from memory, a value is added to (or subtracted from) them, and the resulting values are written back out to memory. Audio DSPs would likewise, for volume control, multiply both Left and Right channels simultaneously. With

295-485: A few "vector registers" that use the same interfaces across all CPUs with this instruction set. The hardware handles all alignment issues and "strip-mining" of loops. Machines with different vector sizes would be able to run the same code. LLVM calls this vector type "vscale". An order of magnitude increase in code size is not uncommon, when compared to equivalent scalar or equivalent vector code, and an order of magnitude or greater effectiveness (work done per instruction)

354-491: A floating-point adder, a floating-point multiplier, or a 64-bit integer graphics unit. The system had separate pipelines for the ALU, floating-point adder, floating-point multiplier, and graphics unit. It can fetch and decode one "core" instruction and one "floating-point or graphics" instruction per clock. When using dual-operation floating-point instructions (which transfer values between subsequent dual-operation instructions), it

413-524: A group of three SHARC DSPs. Good performance was obtained from the i860 by supplying customers with a library of signal processing functions written in assembly language. The hardware packed up to 360 compute nodes in 9U of rack space, making it suitable for mobile applications such as airborne radar processing. During the early 1990s, Stratus Technologies built i860-based servers, the XA/R series, running their proprietary VOS operating system. Also in

472-417: A powerful tool in real-time video processing applications like conversion between various video standards and frame rates ( NTSC to/from PAL , NTSC to/from HDTV formats, etc.), deinterlacing , image noise reduction , adaptive video compression , and image enhancement. A more ubiquitous application for SIMD is found in video games : nearly every modern video game console since 1998 has incorporated

531-402: A safe fallback mechanism on unsupported CPUs to simple loops. Instead of providing an SIMD datatype, compilers can also be hinted to auto-vectorize some loops, potentially taking some assertions about the lack of data dependency. This is not as flexible as manipulating SIMD variables directly, but is easier to use. OpenMP 4.0+ has a #pragma omp simd hint. This OpenMP interface has replaced

590-556: A single instruction without any overhead. This is similar to C and C++ intrinsics. Benchmarks for 4×4 matrix multiplication , 3D vertex transformation , and Mandelbrot set visualization show near 400% speedup compared to scalar code written in Dart. McCutchan's work on Dart, now called SIMD.js, has been adopted by ECMAScript and Intel announced at IDF 2013 that they are implementing McCutchan's specification for both V8 and SpiderMonkey . However, by 2017, SIMD.js has been taken out of

649-494: A single instruction. Vector processing was especially popularized by Cray in the 1970s and 1980s. Vector processing architectures are now considered separate from SIMD computers: Duncan's Taxonomy includes them whereas Flynn's Taxonomy does not, due to Flynn's work (1966, 1972) pre-dating the Cray-1 (1977). The first era of modern SIMD computers was characterized by massively parallel processing -style supercomputers such as

SECTION 10

#1732791630597

708-576: A small number of supercomputers such as the Intel iPSC/860 . Intel later marketed the i860 as a workstation microprocessor for a time, where it competed with microprocessors based on the MIPS and SPARC architectures, among others. The Oki Electric OKI Station 7300/30 and Stardent Vistra 800 Unix workstations were based on a 40 MHz i860 XR running UNIX System V /i860. The Hauppauge 4860 and Olivetti CP486 featured an Intel 80486 and i860 on

767-468: A variety of reasons, this can take much less time than retrieving each pixel individually, as with a traditional CPU design. Another advantage is that the instruction operates on all loaded data in a single operation. In other words, if the SIMD system works by loading up eight data points at once, the add operation being applied to the data will happen to all eight values at the same time. This parallelism

826-488: A wide set of nonstandard extensions, including Cilk 's #pragma simd , GCC's #pragma GCC ivdep , and many more. Consumer software is typically expected to work on a range of CPUs covering multiple generations, which could limit the programmer's ability to use new SIMD instructions to improve the computational performance of a program. The solution is to include multiple versions of the same code that uses either older or newer SIMD technologies, and pick one that best fits

885-423: Is able to execute up to three operations (one ALU, one floating-point multiply, and one floating-point add-or-subtract) per clock. All of the data buses were at least 64 bits wide. The internal memory bus to the cache, for instance, was 128 bits wide. The "core" class instructions use thirty-two 32-bit integer registers. But the "floating-point or graphics" instructions use a register file that can be accessed by

944-567: Is achievable with Vector ISAs. ARM's Scalable Vector Extension takes another approach, known in Flynn's Taxonomy as "Associative Processing", more commonly known today as "Predicated" (masked) SIMD. This approach is not as compact as Vector processing but is still far better than non-predicated SIMD. Detailed comparative examples are given in the Vector processing page. Small-scale (64 or 128 bits) SIMD became popular on general-purpose CPUs in

1003-415: Is common for publishers of the SIMD instruction sets to make their own C/C++ language extensions with intrinsic functions or special datatypes (with operator overloading ) guaranteeing the generation of vector code. Intel, AltiVec, and ARM NEON provide extensions widely adopted by the compilers targeting their CPUs. (More complex operations are the task of vector math libraries.) The GNU C Compiler takes

1062-699: Is easier to achieve as only compiler switches need to be changed. Glibc supports LMV and this functionality is adopted by the Intel-backed Clear Linux project. In 2013 John McCutchan announced that he had created a high-performance interface to SIMD instruction sets for the Dart programming language, bringing the benefits of SIMD to web programs for the first time. The interface consists of two types: Instances of these types are immutable and in optimized code are mapped directly to SIMD registers. Operations expressed in Dart typically are compiled into

1121-736: Is heavily SIMD based. Philips , now NXP , developed several SIMD processors named Xetal . The Xetal has 320 16-bit processor elements especially designed for vision tasks. Intel's AVX-512 SIMD instructions process 512 bits of data at once. SIMD instructions are widely used to process 3D graphics, although modern graphics cards with embedded SIMD have largely taken over this task from the CPU. Some systems also include permute functions that re-pack elements inside vectors, making them particularly useful for data processing and compression. They are also used in cryptography. The trend of general-purpose computing on GPUs ( GPGPU ) may lead to wider use of SIMD in

1180-440: Is separate from the parallelism provided by a superscalar processor ; the eight values are processed in parallel even on a non-superscalar processor, and a superscalar processor may be able to perform multiple SIMD operations in parallel. To remedy problems 1 and 5, RISC-V 's vector extension uses an alternative approach: instead of exposing the sub-register-level details to the programmer, the instruction set abstracts them out as

1239-590: Is true simultaneous parallel hardware-level execution. Modern graphics processing units (GPUs) are often wide SIMD implementations. The first use of SIMD instructions was in the ILLIAC IV , which was completed in 1966. SIMD was the basis for vector supercomputers of the early 1970s such as the CDC Star-100 and the Texas Instruments ASC , which could operate on a "vector" of data with

SECTION 20

#1732791630597

1298-700: The FPU and MMX registers . Compilers also often lacked support, requiring programmers to resort to assembly language coding. SIMD on x86 had a slow start. The introduction of 3DNow! by AMD and SSE by Intel confused matters somewhat, but today the system seems to have settled down (after AMD adopted SSE) and newer compilers should result in more SIMD-enabled software. Intel and AMD now both provide optimized math libraries that use SIMD instructions, and open source alternatives like libSIMD , SIMDx86 and SLEEF have started to appear (see also libm ). Apple Computer had somewhat more success, even though they entered

1357-525: The Thinking Machines CM-1 and CM-2 . These computers had many limited-functionality processors that would work in parallel. For example, each of 65,536 single-bit processors in a Thinking Machines CM-2 would execute the same instruction at the same time, allowing, for instance, to logically combine 65,536 pairs of bits at a time, using a hypercube-connected network or processor-dedicated RAM to find its operands. Supercomputing moved away from

1416-632: The 1990s, Alliant Computer Systems built their i860-based FX/800 and FX/2800 servers, replacing the FX/80 and FX/8 series that had been based on the Motorola 68000 ISA. Both the Alliant and Mercury compute systems were in heavy use at NASA/JPL for the SIR-C missions. The U.S. military used the i860 for numerous aerospace and digital signal processing applications as a coprocessor, where it saw use up until

1475-455: The ALU and FPU parts) and an interrupt could spill them and require them all to be re-loaded. This took 62 cycles in the best case, and almost 2000 cycles in the worst. The latter is 1/20000th of a second at 40 MHz (50 microseconds), an eternity for a CPU. This largely eliminated the i860 as a general purpose CPU. As the compilers improved, the general performance of the i860 did likewise, but by then most other RISC designs had already passed

1534-583: The ECMAScript standard queue in favor of pursuing a similar interface in WebAssembly . As of August 2020, the WebAssembly interface remains unfinished, but its portable 128-bit SIMD feature has already seen some use in many engines. Emscripten, Mozilla's C/C++-to-JavaScript compiler, with extensions can enable compilation of C++ programs that make use of SIMD intrinsics or GCC-style vector code to

1593-562: The GCC extension. LLVM's libcxx seems to implement it. For GCC and libstdc++, a wrapper library that builds on top of the GCC extension is available. Microsoft added SIMD to .NET in RyuJIT. The System.Numerics.Vector package, available on NuGet, implements SIMD datatypes. Java also has a new proposed API for SIMD instructions available in OpenJDK 17 in an incubator module. It also has

1652-579: The SIMD API of JavaScript, resulting in equivalent speedups compared to scalar code. It also supports (and now prefers) the WebAssembly 128-bit SIMD proposal. It has generally proven difficult to find sustainable commercial applications for SIMD-only processors. One that has had some measure of success is the GAPP , which was developed by Lockheed Martin and taken to the commercial sector by their spin-off Teranex . The GAPP's recent incarnations have become

1711-508: The SIMD approach when inexpensive scalar MIMD approaches based on commodity processors such as the Intel i860 XP became more powerful, and interest in SIMD waned. The current era of SIMD processors grew out of the desktop-computer market rather than the supercomputer market. As desktop processors became powerful enough to support real-time gaming and audio/video processing during the 1990s, demand grew for this particular type of computing power, and microprocessor vendors turned to SIMD to meet

1770-540: The SIMD market later than the rest. AltiVec offered a rich system and can be programmed using increasingly sophisticated compilers from Motorola , IBM and GNU , therefore assembly language programming is rarely needed. Additionally, many of the systems that would benefit from SIMD were supplied by Apple itself, for example iTunes and QuickTime . However, in 2006, Apple computers moved to Intel x86 processors. Apple's APIs and development tools ( XCode ) were modified to support SSE2 and SSE3 as well as AltiVec. Apple

1829-483: The XP versions, manually written assembler code managed to get only about up to 40 MFLOPS, and most compilers had difficulty getting even 10 MFLOPs. The later Itanium architecture, also a VLIW design, suffered again from the problem of compilers incapable of delivering sufficiently optimized code. Another serious problem was the lack of any solution to handle context switching quickly. The i860 had several pipelines (for

I860 - Misplaced Pages Continue

1888-403: The cache, yet there is no way for the programmer to know if they are or not. If an incorrect guess is made, the entire pipeline will stall, waiting for the data. The entire i860 design was based on the compiler efficiently handling this task, which proved almost impossible in practice. While theoretically capable of peaking at about 60-80 MFLOPS for both single precision and double precision for

1947-472: The clock to 40 and 50 MHz. Both microprocessors supported the same instruction set for application programs. The i860 combined a number of features that were unique at the time, most notably its very long instruction word (VLIW) architecture and powerful support for high-speed floating-point operations. The design uses two classes of instructions: "core" instructions which use a 32-bit ALU , and "floating-point or graphics" instructions which operate on

2006-418: The code to "clone" functions, while ICC does so automatically (under the command-line option /Qax ). The Rust programming language also supports FMV. The setup is similar to GCC and Clang in that the code defines what instruction sets to compile for, but cloning is manually done via inlining. As using FMV requires code modification on GCC and Clang, vendors more commonly use library multi-versioning: this

2065-486: The compilers to get the ordering right. Truevision produced an i860-based accelerator board intended for use with their Targa and Vista framebuffer cards. Pixar produced a custom version of RenderMan to run on the card that ran approximately four times faster than the 386 host. Another example was SGI 's RealityEngine , which used a number of i860XP processors in its geometry engine. This sort of use slowly disappeared as well, as more general-purpose CPUs started to match

2124-443: The demand. Hewlett-Packard introduced MAX instructions into PA-RISC 1.1 desktops in 1994 to accelerate MPEG decoding. Sun Microsystems introduced SIMD integer instructions in its " VIS " instruction set extensions in 1995, in its UltraSPARC I microprocessor. MIPS followed suit with their similar MDMX system. The first widely deployed desktop SIMD was with Intel's MMX extensions to the x86 architecture in 1996. This sparked

2183-594: The early 1990s and continued through 1997 and later with Motion Video Instructions (MVI) for Alpha . SIMD instructions can be found, to one degree or another, on most CPUs, including IBM 's AltiVec and SPE for PowerPC , HP 's PA-RISC Multimedia Acceleration eXtensions (MAX), Intel 's MMX and iwMMXt , SSE , SSE2 , SSE3 SSSE3 and SSE4.x , AMD 's 3DNow! , ARC 's ARC Video subsystem, SPARC 's VIS and VIS2, Sun 's MAJC , ARM 's Neon technology, MIPS ' MDMX (MaDMaX) and MIPS-3D . The IBM, Sony, Toshiba co-developed Cell Processor 's SPU 's instruction set

2242-543: The exact same instruction at any given moment (just with different data). SIMD is particularly applicable to common tasks such as adjusting the contrast in a digital image or adjusting the volume of digital audio . Most modern CPU designs include SIMD instructions to improve the performance of multimedia use. SIMD has three different subcategories in Flynn's 1972 Taxonomy , one of which is SIMT . SIMT should not be confused with software threads or hardware threads , both of which are task time-sharing (time-slicing). SIMT

2301-509: The extensions a step further by abstracting them into a universal interface that can be used on any platform by providing a way of defining SIMD datatypes. The LLVM Clang compiler also implements the feature, with an analogous interface defined in the IR. Rust's packed_simd crate (and the experimental std::sims ) uses this interface, and so does Swift 2.0+. C++ has an experimental interface std::experimental::simd that works similarly to

2360-437: The floating point units as either thirty-two 32-bit, sixteen 64-bit, or eight 128-bit floating-point registers, or that can be accessed by the graphics unit as sixteen 64-bit integer registers. The "core" unit is responsible for fetching instructions, and in the normal "single-instruction" mode can fetch one 32-bit "core" or one 32-bit "floating point or graphics" instruction per cycle. But when executing in dual-instruction mode,

2419-577: The following instruction will be executed prior to transferring control to the branch target instruction. It means the i860 has a single branch delay slot. On paper, performance was impressive for a single-chip solution; however, real-world performance was anything but. One problem, perhaps unrecognized at the time, was that runtime code paths are difficult to predict, meaning that it becomes exceedingly difficult to order instructions properly at compile time . For instance, an instruction to add two numbers will take considerably longer if those numbers are not in

I860 - Misplaced Pages Continue

2478-465: The future. Adoption of SIMD systems in personal computer software was at first slow, due to a number of problems. One was that many of the early SIMD instruction sets tended to slow overall performance of the system due to the re-use of existing floating point registers. Other systems, like MMX and 3DNow! , offered support for data types that were not interesting to a wide audience and had expensive context switching instructions to switch between using

2537-474: The ground up with no separate scalar registers. Ziilabs produced an SIMD type processor for use on mobile devices, such as media players and mobile phones. Larger scale commercial SIMD processors are available from ClearSpeed Technology, Ltd. and Stream Processors, Inc. ClearSpeed 's CSX600 (2004) has 96 cores each with two double-precision floating point units while the CSX700 (2008) has 192. Stream Processors

2596-420: The i860 architecture is the i860 XR microprocessor (code-named N10 ), which ran at 25, 33, or 40 MHz. The second-generation i860 XP microprocessor (code named N11 ) added 4 Mbyte pages, larger on-chip caches, second level cache support, faster buses, and hardware support for bus snooping, for cache consistency in multiprocessor systems. A process shrink for the XP (from 1 μm to 0.8 CHMOS V) increased

2655-581: The i860 in performance. In the late 1990s, Intel replaced their entire RISC line with ARM -based designs, known as the XScale . Confusingly, the 860 number has since been re-used for a motherboard control chipset for Intel Xeon (high-end Pentium ) systems and a model of the Core i7. Andy Grove suggested that the i860's failure in the marketplace was due to Intel being stretched too thin: We now had two very powerful chips that we were introducing at just about

2714-514: The i860 influenced the MMX functionality later added to Intel's Pentium processors. The pipelines into the functional units are program-accessible ( VLIW ), requiring the compilers to order instructions carefully in the object code to keep the pipelines filled. In traditional architectures these duties were handled at runtime by a scheduler on the CPU itself, but the complexity of these systems limited their application in early RISC designs. The i860

2773-410: The i860's performance, and as Intel turned its focus to Pentium processors for general-purpose computing. Mercury Computer Systems used the i860 in their multicomputer . From 2 to 360 compute nodes would reside in a circuit switched fat tree network, with each node having local memory that could be mapped by any other node. Each node in this heterogeneous system could be an i860, a PowerPC , or

2832-601: The instruction cache is accessed as VLIW instructions consisting of a 32-bit "core" instruction paired with a 32-bit "floating-point or graphics" instruction, simultaneously fetched together over a 64-bit bus. Intel referred to the design as the "i860 64-Bit Microprocessor". Intel i860 instructions acted on data sizes from 8-bit through 128-bit. The graphics supports SIMD -like instructions in addition to basic 64-bit integer math. For instance, its 64-bit integer datapath can represent multiple pixels together as either 8-bit pixels, 16-bit pixels, or 32-bit pixels. Experience with

2891-829: The introduction of the much more powerful AltiVec system in the Motorola PowerPC and IBM's POWER systems. Intel responded in 1999 by introducing the all-new SSE system. Since then, there have been several extensions to the SIMD instruction sets for both architectures. Advanced vector extensions AVX, AVX2 and AVX-512 are developed by Intel. AMD supports AVX, AVX2 , and AVX-512 in their current products. All of these developments have been oriented toward support for real-time graphics, and are therefore oriented toward processing in two, three, or four dimensions, usually with vector lengths of between two and sixteen words, depending on data type and architecture. When new SIMD architectures need to be distinguished from older ones,

2950-589: The late 1990s. SIMD Single instruction, multiple data ( SIMD ) is a type of parallel processing in Flynn's taxonomy . SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should not be confused with an ISA. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Such machines exploit data level parallelism , but not concurrency : there are simultaneous (parallel) computations, but each unit performs

3009-410: The link to point directly to the intended article. Retrieved from " https://en.wikipedia.org/w/index.php?title=I860&oldid=1192507706 " Category : Letter–number combination disambiguation pages Hidden categories: Short description is different from Wikidata All article disambiguation pages All disambiguation pages Intel i860 The first implementation of

SECTION 50

#1732791630597

3068-428: The newer architectures are then considered "short-vector" architectures, as earlier SIMD and vector supercomputers had vector lengths from 64 to 64,000. A modern supercomputer is almost always a cluster of MIMD computers, each of which implements (short-vector) SIMD instructions. An application that may take advantage of SIMD is one where the same value is being added to (or subtracted from) a large number of data points,

3127-466: The same motherboard. Microsoft initially developed what was to become Windows NT on internally designed i860XR-based workstations (codenamed Dazzle ), only porting NT to the MIPS ( Microsoft Jazz ), Intel 80386 and other processors later. Some claim the NT designation was a reference to the "N-Ten" codename of the i860XR. The i860 did see some use in the workstation world as a graphics accelerator. It

3186-461: The same time: the 486, largely based on CISC technology and compatible with all the PC software, and the i860, based on RISC technology, which was very fast but compatible with nothing. We didn't know what to do. So we introduced both, figuring we'd let the marketplace decide. ... our equivocation caused our customers to wonder what Intel really stood for, the 486 or i860? At first, the i860 was only used in

3245-609: The use of SIMD-capable instructions. A later processor that used vector processing is the Cell Processor used in the Playstation 3, which was developed by IBM in cooperation with Toshiba and Sony . It uses a number of SIMD processors (a NUMA architecture, each with independent local store and controlled by a general purpose CPU) and is geared towards the huge datasets required by 3D and video processing applications. It differs from traditional ISAs by being SIMD from

3304-482: The user's CPU at run-time ( dynamic dispatch ). There are two main camps of solutions: FMV, manually coded in assembly language, is quite commonly used in a number of performance-critical libraries such as glibc and libjpeg-turbo. Intel C++ Compiler , GNU Compiler Collection since GCC 6, and Clang since clang 7 allow for a simplified approach, with the compiler taking care of function duplication and selection. GCC and clang requires explicit target_clones labels in

3363-517: Was an attempt to avoid this entirely by moving this duty off-chip into the compiler. This allowed the i860 to devote more room to functional units, improving performance. As a result of its architecture, the i860 could run certain graphics and floating-point algorithms with exceptionally high speed, but its performance in general-purpose applications suffered and it was difficult to program efficiently (see below). The i860 has both non-delayed and delayed branch instructions. When delayed branches are taken,

3422-594: Was the dominant purchaser of PowerPC chips from IBM and Freescale Semiconductor . Even though Apple has stopped using PowerPC processors in their products, further development of AltiVec is continued in several PowerPC and Power ISA designs from Freescale and IBM. SIMD within a register , or SWAR , is a range of techniques and tricks used for performing SIMD in general-purpose registers on hardware that does not provide any direct support for SIMD instructions. This can be used to exploit parallelism in certain algorithms even on hardware that does not support SIMD directly. It

3481-573: Was used, for instance, in the NeXTdimension , where it ran a cut-down version of the Mach kernel running a complete PostScript stack. However, the PostScript part of the project was never finished so it ended up just moving color pixels around. In this role, the i860 design worked considerably better, as the core program could be loaded into the cache and made entirely "predictable", allowing

#596403