AltiVec is a single-precision floating point and integer SIMD instruction set designed and owned by Apple, IBM, and Freescale Semiconductor (formerly Motorola's Semiconductor Products Sector), the AIM alliance. It is implemented on versions of the PowerPC processor architecture, including Motorola's G4, IBM's G5 and POWER6 processors, and P.A. Semi's PWRficient PA6T. AltiVec is a trademark owned solely by Freescale, so the system is also referred to as Velocity Engine by Apple and VMX (Vector Multimedia Extension) by IBM and P.A. Semi.
While AltiVec refers to an instruction set, the implementations in CPUs produced by IBM and Motorola are separate in terms of logic design. To date, no IBM core has included an AltiVec logic design licensed from Motorola or vice versa. AltiVec is a standard part of the Power ISA v.2.03 specification. It was never formally a part of the PowerPC architecture until this specification although it used PowerPC instruction formats and syntax and occupied
A random number generator, hardware-assisted garbage collection and hardware-enforced trusted computing. The spec was revised in March 2017 to the Power ISA v.3.0 B spec, and revised again to v3.0C in May 2020. One major change from v3.0 to v3.0B is the removal of support for hardware-assisted garbage collection. The key difference between v3.0B and v3.0C is that the Compliancy Levels listed in v3.1 were also added to v3.0C. The specification for Power ISA v.3.1
A 64-bit version of the ISA and support for SMP. In 1990, IBM wanted to merge the low end server and mid range server architectures, the RS/6000 RISC ISA and AS/400 CISC ISA, into one common RISC ISA that could host both IBM's AIX and OS/400 operating systems. The existing POWER and the upcoming PowerPC ISAs were deemed unsuitable by the AS/400 team, so an extension to the 64-bit PowerPC instruction set
A clock chip. The POWER1 is the first microprocessor that used register renaming and out-of-order execution. A simplified and less powerful version of the 10 chip RIOS-1 was made in 1992, for lower-end RS/6000s. It uses only one chip and is called RISC Single Chip or RSC. IBM started the POWER2 processor effort as a successor to the POWER1. By adding a second fixed-point unit, a second powerful floating point unit, and other performance enhancements and new instructions to
A collection of licensees of the specification like AMCC, Synopsys, Sony, Microsoft, P.A. Semi, CRAY, and Xilinx that needed coordination. The joint effort was not only to streamline development of the technology but also to streamline marketing. The new instruction set architecture was called Power ISA and merged the PowerPC v.2.02 from the POWER5 with the PowerPC Book E specification from Freescale as well as some related technologies like
A complete 32/64 bit RISC architecture, and to range from very low end embedded microcontrollers to the very high end supercomputer and server applications. After two years of development, the resulting PowerPC ISA was introduced in 1993. A modified version of the RSC architecture, PowerPC added single-precision floating point instructions and general register-to-register multiply and divide instructions, and removed some POWER features. It also added
A complex operation is specified explicitly by one machine instruction, and all instructions are required to complete in the same constant time, would later come to be known as RISC. When the telephone switch project was canceled, IBM retained the design for the general purpose processor and named it 801 after building #801 at Thomas J. Watson Research Center. By 1982 IBM continued to explore
A cost- and feature-reduced version of the POWER4 called PowerPC 970 at Apple's request. The POWER5 processors built on the popular POWER4 and incorporated simultaneous multithreading into the design, a technology pioneered in the PowerPC AS based RS64-III processor, and on-die memory controllers. It was designed for multiprocessing on a massive scale and came in multi-chip modules with onboard large L3 cache chips. A joint organization
A flexible vector permute instruction, in which each byte of a resulting vector value can be taken from any byte of either of two other vectors, parametrized by yet another vector. This allows for sophisticated manipulations in a single instruction. Recent versions of the GNU Compiler Collection (GCC), IBM VisualAge compiler and other compilers provide intrinsics to access VMX/AltiVec instructions directly from C and C++ programs. As of version 4,
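The byte-selection rule of the permute instruction can be modeled in portable scalar C. The sketch below is not AltiVec code: "perm_model" is a hypothetical helper name, and it assumes the conventional behavior in which the low five bits of each control byte index the 32-byte concatenation of the two source vectors (0 to 15 select from the first source, 16 to 31 from the second).

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a byte permute over two 16-byte sources.
 * perm_model is a hypothetical name used only for illustration.
 * Assumption: the low 5 bits of each control byte pick one byte out of
 * the concatenation a||b (0..15 -> a, 16..31 -> b). */
void perm_model(const uint8_t a[16], const uint8_t b[16],
                const uint8_t ctl[16], uint8_t out[16])
{
    assert(a && b && ctl && out);
    for (int i = 0; i < 16; i++) {
        uint8_t sel = ctl[i] & 0x1F;            /* keep 5 selector bits */
        out[i] = (sel < 16) ? a[sel] : b[sel - 16];
    }
}
```

Because the control vector is ordinary data, the same routine can express byte swaps, interleaves, shifts across two registers, or small table lookups simply by changing the contents of "ctl".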
A mode in which it could disable cores to reach higher frequencies for the ones that are left. It uses a new high-performance floating point unit called VSX that merges the functionality of the traditional FPU with AltiVec. Even though the POWER7 runs at lower frequencies than the POWER6, each POWER7 core performs faster than its POWER6 counterpart. POWER8 is a 4 GHz, 12 core processor with 8 hardware threads per core for
A second-generation RISC architecture started at the IBM Thomas J. Watson Research Center, producing the "AMERICA architecture". In 1986, IBM Austin started developing the RS/6000 series computers based on that architecture. This was to become the first POWER processors using the first POWER ISA. The first IBM computers to incorporate the POWER ISA are the RISC System/6000 or RS/6000 series. They were released in February 1990. These RS/6000 computers were divided into two classes, POWERstation workstations and POWERserver servers. The first RS/6000 CPU has 2 configurations, called
A special RGB "pixel" data type, but it does not operate on 64-bit double-precision floats, and there is no way to move data directly between scalar and vector registers. In keeping with the "load/store" model of the PowerPC's RISC design, the vector registers, like the scalar registers, can only be loaded from and stored to memory. However, VMX/AltiVec provides a much more complete set of "horizontal" operations that work across all
A total of 128 registers. VMX128 is not entirely compatible with VMX/AltiVec, as a number of integer operations were removed to make space for the larger register file and additional application-specific operations. Power ISA v2.06 introduced VSX vector-scalar instructions which extend SIMD processing for the Power ISA to support up to 64 registers, with support for regular floating point, decimal floating point and vector execution. POWER7
A total of 96 threads of parallel execution. It uses 96 MB of eDRAM L3 cache on chip and 128 MB off-chip L4 cache and a new extension bus called CAPI that runs on top of PCIe, replacing the older GX bus. The CAPI bus can be used to attach dedicated off-chip accelerator chips such as GPUs, ASICs and FPGAs. IBM states that it is two to three times as fast as its predecessor, the POWER7. It
A vector with four 32-bit words on a big-endian machine. The permutes move the carry and borrow bits from columns 1 and 3 to columns 0 and 2 like in school-book math. A little-endian machine would need a different mask. Power ISA 2.07 used in Power8 finally provided the 64-bit double words. A developer working with Power8 needs only to perform the following. The following processors have AltiVec, VMX or VMX128 included. The following software applications are known to leverage AltiVec or VMX hardware acceleration. Power ISA
Is a reduced instruction set computer (RISC) instruction set architecture (ISA) currently developed by the OpenPOWER Foundation, led by IBM. It was originally developed by IBM and the now-defunct Power.org industry group. Power ISA is an evolution of the PowerPC ISA, created by the mergers of the core PowerPC ISA and the optional Book E for embedded applications. The merger of these two components in 2006
Is also support for both big and little-endian addressing with separate categories for moded and per-page endianness, and support for both 32-bit and 64-bit addressing. Different modes of operation include user, supervisor and hypervisor. The Power ISA specification is divided into five parts, called "books": New in version 3 of the Power ISA is that you don't have to implement the entire specification to be compliant. The sprawl of instructions and technologies has made
Is based on the former PowerPC ISA v.2.02 in POWER5+ and the Book E extension of the PowerPC specification. The Book I included five new chapters regarding auxiliary processing units like DSPs and the AltiVec extension. The specification for Power ISA v.2.04 was finalized in June 2007. It is based on Power ISA v.2.03 and includes changes primarily to the Book III-S part regarding virtualization, hypervisor functions, logical partitioning and virtual page handling. The specification for Power ISA v.2.05
Is defined on vector types so that the normal C expression language can be used to manipulate vector variables. There are also overloaded intrinsic functions such as "vec_add" that emit the appropriate opcode based on the type of the elements within the vector, and very strong type checking is enforced. In contrast, the Intel-defined data types for IA-32 SIMD registers declare only the size of
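The idea of applying ordinary C operators to vector variables can be tried on any recent GCC or Clang, even without PowerPC hardware. The sketch below deliberately uses the compilers' generic "vector_size" attribute as a stand-in for the AltiVec "vector" keyword; the typedef and function names are illustrative assumptions, not part of any AltiVec header. On an AltiVec target one would instead include altivec.h and declare, for example, "vector signed int".

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang generic vector types, 16 bytes wide like an AltiVec register.
 * v4i32 and v4f32 are illustrative names, not AltiVec types. */
typedef int32_t v4i32 __attribute__((vector_size(16)));
typedef float   v4f32 __attribute__((vector_size(16)));

static_assert(sizeof(v4i32) == 16, "four 32-bit lanes fill 128 bits");

/* Ordinary C operators apply lane by lane to vector operands. */
v4i32 add_i32(v4i32 a, v4i32 b)
{
    return a + b;
}

/* Multiply then add, written with plain operators:
 * each lane computes a*b + c. */
v4f32 mul_add_f32(v4f32 a, v4f32 b, v4f32 c)
{
    return a * b + c;
}
```

The element type is part of the vector type, so mixing "v4i32" and "v4f32" operands in one expression is rejected at compile time, which loosely mirrors the strong type checking described for the AltiVec intrinsics.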
Is the first Power ISA processor to implement Power ISA v2.06. New instructions were introduced by IBM under the Vector Media Extension category for integer operations as part of the VSX extension in Power ISA 2.07. New integer vector instructions were introduced by IBM following the VMX encodings as part of the VSX extension in Power ISA v3.0, to be introduced with POWER9 processors. In C++,
The vector keyword, and then use the GCC-specific __vector keyword in its place. AltiVec prior to Power ISA 2.06 with VSX lacks loading from memory using a type's natural alignment. For example, the code below requires special handling for Power6 and below when the effective address is not 16-byte aligned. The special handling adds 3 additional instructions to a load operation when VSX is not available. AltiVec prior to Power ISA 2.06 with VMX lacks 64-bit integer support. Developers who wish to operate on 64-bit data will develop routines from 32-bit components. For example, below are examples of 64-bit add and subtract in C using
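As a hedged illustration of building 64-bit arithmetic from 32-bit components, the following scalar C sketch mimics the lane-wise steps: add or subtract each 32-bit word, compute a per-word carry or borrow, move it one column to the left, and apply it. It does not reproduce the article's original listing. The struct and function names are hypothetical, and the big-endian word order (high word first) is an assumption; on real hardware each loop would correspond to a single vector add, carry-out, or permute operation.

```c
#include <assert.h>
#include <stdint.h>

/* Four 32-bit words viewed as two 64-bit values in big-endian word order:
 * w[0]:w[1] is the first value (high:low), w[2]:w[3] the second.
 * vec4x32, add64_from_32 and sub64_from_32 are hypothetical names. */
typedef struct { uint32_t w[4]; } vec4x32;

static_assert(sizeof(vec4x32) == 16, "models one 128-bit register");

vec4x32 add64_from_32(vec4x32 a, vec4x32 b)
{
    vec4x32 sum, carry, out;
    for (int i = 0; i < 4; i++) {
        sum.w[i] = a.w[i] + b.w[i];                 /* add modulo 2^32 */
        carry.w[i] = (sum.w[i] < a.w[i]) ? 1u : 0u; /* per-word carry-out */
    }
    /* Move carries from columns 1 and 3 into columns 0 and 2. */
    uint32_t moved[4] = { carry.w[1], 0u, carry.w[3], 0u };
    for (int i = 0; i < 4; i++)
        out.w[i] = sum.w[i] + moved[i];
    return out;
}

vec4x32 sub64_from_32(vec4x32 a, vec4x32 b)
{
    vec4x32 diff, borrow, out;
    for (int i = 0; i < 4; i++) {
        diff.w[i] = a.w[i] - b.w[i];                /* subtract modulo 2^32 */
        borrow.w[i] = (a.w[i] < b.w[i]) ? 1u : 0u;  /* per-word borrow */
    }
    /* Move borrows from columns 1 and 3 into columns 0 and 2. */
    uint32_t moved[4] = { borrow.w[1], 0u, borrow.w[3], 0u };
    for (int i = 0; i < 4; i++)
        out.w[i] = diff.w[i] - moved[i];
    return out;
}
```

A carry or borrow out of the high word is discarded, so both routines compute results modulo 2^64, as a native 64-bit add or subtract would.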
The Linux Foundation. In 1974 IBM started a project to build a telephone switching computer that required, for the time, immense computational power. Since the application was comparably simple, this machine would need only to perform I/O, branches, add register-register, move data between registers and memory, and would have no need for special instructions to perform heavy arithmetic. This simple design philosophy, whereby each step of
The OpenPOWER ISA Workgroup. Note that it is not strictly necessary to join the OpenPOWER Foundation to submit RFCs. The EABI specifications predate the announcement and creation of the Compliancy subsets. Regarding the Linux Compliancy subset having VSX (SIMD) optional: in 2003–4, 64-bit EABI v1.9 made SIMD optional, but in July 2015, to improve performance for IBM POWER9 systems, SIMD was made mandatory in EABI v2.0. This discrepancy between SIMD being optional in
The PlayStation 3, also supports Power Vector Media Extension (VMX) in its PPU, with the SPU ISA being enhanced but architecturally similar. Freescale is bringing an enhanced version of AltiVec to e6500 based QorIQ processors. IBM enhanced VMX for use in Xenon (Xbox 360) and called this enhancement VMX128. The enhancements comprise new routines targeted at gaming (accelerating 3D graphics and game physics) and
The PowerPC 970 (dubbed the "G5" by Apple) also implemented AltiVec with hardware similar to that of the PowerPC 7400. AltiVec is a brand name trademarked by Freescale (previously Motorola) for the standard Category:Vector part of the Power ISA v.2.03 specification. This category is also known as VMX (used by IBM), and "Velocity Engine" (a brand name previously used by Apple). The Cell Broadband Engine, used in (amongst other things)
The Summit in 2018. POWER9, which was launched in 2017, is manufactured using a 14 nm FinFET process, and comes in four versions, two 24 core SMT4 versions intended to use PowerNV for scale up and scale out applications, and two 12 core SMT8 versions intended to use PowerVM for scale-up and scale-out applications. Possibly there will be more versions in the future since the POWER9 architecture
The opcode space expressly allocated for such purposes. Both VMX/AltiVec and SSE feature 128-bit vector registers that can represent sixteen 8-bit signed or unsigned chars, eight 16-bit signed or unsigned shorts, four 32-bit ints or four 32-bit floating-point variables. Both provide cache-control instructions intended to minimize cache pollution when working on streams of data. They also exhibit important differences. Unlike SSE2, VMX/AltiVec supports
The superscalar limits of the 801 design by using multiple execution units to improve performance to determine if a RISC machine could maintain multiple instructions per cycle. Many changes were made to the 801 design to allow for multiple execution units and the Cheetah processor has separate units for branch prediction, fixed-point, and floating-point execution. By 1984 CMOS was chosen because it allows improved circuit integration and transistor-logic performance. In 1985, research on
The "RIOS-1" and "RIOS.9" (or more commonly the POWER1 CPU). A RIOS-1 configuration has a total of 10 discrete chips: an instruction cache chip, fixed-point chip, floating-point chip, 4 data L1 cache chips, storage control chip, input/output chips, and a clock chip. The lower cost RIOS.9 configuration has 8 discrete chips: an instruction cache chip, fixed-point chip, floating-point chip, 2 data cache chips, storage control chip, input/output chip, and
The 32/64 bit PowerPC instruction set and the 64-bit PowerPC AS instruction set from the Amazon project to the new PowerPC v.2.0 specification, unifying IBM's RS/6000 and AS/400 families of computers. Besides the unification of the different platforms, POWER4 was also designed to reach very high frequencies and have large on-die L2 caches. It is the first commercially available multi-core processor and came in single-die versions as well as in four-chip multi-chip modules. In 2002, IBM also made
The 32/64-bit PowerPC ISA set with support for SMP and single-chip implementation. It was used to great extent in IBM's RS/6000 computers, and the second generation version, the POWER3-II, is the first commercially available processor from IBM using copper interconnects. The POWER3 is the last processor to use a POWER instruction set, and all subsequent models use the PowerPC instruction sets. The POWER4 merged
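The different ways of viewing one 128-bit register can be pictured with a small C union. This is only a memory-layout sketch under stated assumptions: "vreg128" and "splat_u8" are hypothetical names, and real AltiVec or SSE code would use the compiler's own vector types rather than a union.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One 128-bit register image seen through the element views listed above.
 * vreg128 is a hypothetical name used only for illustration. */
typedef union {
    uint8_t  u8[16];   /* sixteen 8-bit chars */
    int16_t  s16[8];   /* eight 16-bit shorts */
    uint32_t u32[4];   /* four 32-bit ints */
    float    f32[4];   /* four 32-bit floats */
} vreg128;

static_assert(sizeof(vreg128) == 16, "a 128-bit register image is 16 bytes");

/* Fill every byte lane with the same value. */
vreg128 splat_u8(uint8_t x)
{
    vreg128 v;
    memset(v.u8, x, sizeof v.u8);
    return v;
}
```

The same sixteen bytes are reinterpreted by each view; which operation applies is decided by the instruction (or intrinsic) chosen, not by the register itself.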
The Base category. Power ISA is a RISC load/store architecture. It has multiple sets of registers. Instructions up to version 3.0 have a length of 32 bits, with the exception of the VLE (variable-length encoding) subset that provides for higher code density for low-end embedded applications, and version 3.1 which introduced prefixing to create 64-bit instructions. Most instructions are triadic, i.e. have two source operands and one destination. Single- and double-precision IEEE-754 compliant floating-point operations are supported, including additional fused multiply–add (FMA) and decimal floating-point instructions. There are provisions for single instruction, multiple data (SIMD) operations on integer and floating-point data on up to 16 elements in one instruction. Power ISA has support for Harvard cache, i.e. split data and instruction caches, and support for unified caches. Memory operations are strictly load/store, but allow for out-of-order execution. There
The GCC also includes auto-vectorization capabilities that attempt to intelligently create VMX/AltiVec accelerated binaries without the need for the programmer to use intrinsics directly. The "vector" type keyword is introduced to permit the declaration of native vector types, e.g., "vector unsigned char foo;" declares a 128-bit vector variable named "foo" containing sixteen 8-bit unsigned chars. The full complement of arithmetic and binary operators
The Linux Compliancy level but mandatory in EABI v2.0 cannot be rectified without considerable effort: backwards incompatibility for Linux distributions is not a viable option. At present this leaves new OpenPOWER implementors wishing to run standard Linux distributions having to implement a massive 962 instructions. By contrast, RISC-V RV64GC, the minimum to run Linux, requires only 165. The specification for Power ISA v.2.03
The POWER7 processor and e500-mc core. One significant new feature is vector-scalar floating-point instructions (VSX). Book III-E also includes significant enhancement for the embedded specification regarding hypervisor and virtualisation on single and multi core implementations. The spec was revised in November 2010 to the Power ISA v.2.06 revision B spec, enhancing virtualization features. The specification for Power ISA v.2.07
The Vector-Media Extensions known under the brand name AltiVec (also called VMX by IBM) and hardware virtualization. This new ISA was called Power ISA v.2.03 and POWER6 was the first high end processor from IBM to use it. Older POWER and PowerPC specifications did not make the cut and those instruction sets were henceforth deprecated for good. There is no active development on any processor type today that uses these older instruction sets. POWER6
The complete specification unwieldy, so the OpenPOWER Foundation has decided to enable tiered compliancy. These levels include optional and mandatory requirements; contrary to a common misunderstanding, nothing stops an implementation from being compliant at a lower level while also having additional selected functions from higher levels and custom extensions. It is, however, recommended that an option be provided to disable any added functions beyond
The design's declared subset level. A design must be compliant at its declared subset level to make use of the Foundation's protection regarding use of intellectual property, be it patents or trademarks. This is explained in the OpenPOWER EULA. A compliant design must: If the extension is general-purpose enough, the OpenPOWER Foundation asks that implementors submit it as a Request for Comments (RFC) to
The design, the POWER2 ISA had leadership performance when it was announced in November 1993. The POWER2 was a multi-chip design, but IBM also made a single chip design of it, called the POWER2 Super Chip or P2SC that went into high performance servers and supercomputers. At the time of its introduction in 1996, the P2SC was the largest processor with the highest transistor count in the industry and
The elements of a vector; the allowable combinations of data type and operations are much more complete. Thirty-two 128-bit vector registers are provided, compared to eight for SSE and SSE2 (extended to 16 in x86-64), and most VMX/AltiVec instructions take three register operands compared to only two register/register or register/memory operands on IA-32. VMX/AltiVec is also unique in its support for
The former System p and System i server and workstation families into one family called Power Systems. Power Systems machines can run different operating systems like AIX, Linux, and IBM i. The POWER7 symmetric multiprocessor design was a substantial evolution from the POWER6 design, focusing more on power efficiency through multiple cores, simultaneous multithreading (SMT), out-of-order execution and large on-die eDRAM L3 caches. The eight-core chip could execute 32 threads in parallel, and has
The latest being the IBM Telum. Because of eCLipz, the POWER6 is an unusual design as it aimed for very high frequencies and sacrificed out-of-order execution, something that has been a feature for POWER and PowerPC processors since their inception. POWER6 also introduced the decimal floating point unit to the Power ISA, something it shares with z/Architecture. With the POWER6, in 2008 IBM merged
The new 64-bit prefixed instructions is the extension of immediates in branches to 34-bit. The spec was revised in September 2021 to the Power ISA v.3.1B spec. The spec was revised in May 2024 to the Power ISA v.3.1C spec. IBM Power microprocessors (originally POWER prior to Power10) are designed and sold by IBM for servers and supercomputers. The name "POWER"
The standard way of accessing AltiVec support is mutually exclusive with the use of the Standard Template Library vector<> class template due to the treatment of "vector" as a reserved word when the compiler does not implement the context-sensitive keyword version of vector. However, it may be possible to combine them using compiler-specific workarounds; for instance, in GCC one may do #undef vector to remove
The vector register (128 or 64 bits) and in the case of a 128-bit register, whether it contains integers or floating-point values. The programmer must select the appropriate intrinsic for the data types in use, e.g., "_mm_add_epi16(x,y)" for adding two vectors containing eight 16-bit integers. The Power Vector Media Extension (VMX) was developed between 1996 and 1998 by a collaborative project between Apple, IBM, and Motorola. Apple
Was a leader in floating point operations. In 1991, Apple looked for a future alternative to the CISC-based Motorola 68000 series platform, and Motorola experimented with a RISC platform of its own, the 88000. IBM joined the discussion and the three founded the AIM alliance to build the PowerPC ISA, heavily based on the POWER ISA, but with additions from both Apple and Motorola. It was to be
Was developed called PowerPC AS for Advanced Series or Amazon Series. Later, additions from the RS/6000 team and AIM Alliance PowerPC were included, and by 2001, with the introduction of POWER4, they were all joined into one instruction set architecture: the PowerPC v.2.0. The POWER3 began as PowerPC 630, a successor of the commercially unsuccessful PowerPC 620. It uses a combination of the POWER2 ISA and
Was first built on a 22 nanometer process in 2014. In December 2012, IBM began submitting patches to the 3.8 version of the Linux kernel, to support new POWER8 features including the VSX-2 instructions. IBM spent much time designing the POWER9 processor according to William Starke, a systems architect for the POWER8 processor. The POWER9 is the first to incorporate elements of the Power ISA version 3.0 that
Was founded in 2004 called Power.org with the mission to unify and coordinate future development of the PowerPC specifications. By then, the PowerPC specification was fragmented since Freescale (née Motorola) and IBM had taken different paths in their respective development of it. Freescale had prioritized 32-bit embedded applications and IBM high-end servers and supercomputers. There was also
Was led by Power.org founders IBM and Freescale Semiconductor. Prior to version 3.0, the ISA is divided into several categories. Processors implement a set of these categories as required for their task. Different classes of processors are required to implement certain categories, for example a server-class processor includes the categories: Base, Server, Floating-Point, 64-Bit, etc. All processors implement
Was originally developed in the late 1980s, and remains under active development. In the beginning, they implemented the POWER instruction set architecture (ISA), which evolved into PowerPC and later into Power ISA. In August 2019, IBM announced it would open source the Power ISA. As part of the move, it was also announced that administration of the OpenPOWER Foundation will now be handled by
Was originally presented as an acronym for "Performance Optimization With Enhanced RISC". The Power line of microprocessors has been used in IBM's RS/6000, AS/400, pSeries, iSeries, System p, System i, and Power Systems lines of servers and supercomputers. They have also been used in data storage devices and workstations by IBM and by other server manufacturers like Bull and Hitachi. The Power family
Was released in December 2007. It is based on Power ISA v.2.04 and includes changes primarily to Book I and Book III-S, including significant enhancements such as decimal arithmetic (Category: Decimal Floating-Point in Book I) and server hypervisor improvements. The specification for Power ISA v.2.06 was released in February 2009, and revised in July 2010. It is based on Power ISA v.2.05 and includes extensions for
Was released in December 2015, including the VSX-3 instructions, and also incorporates support for Nvidia's NVLink bus technology. The United States Department of Energy together with Oak Ridge National Laboratory and Lawrence Livermore National Laboratory contracted IBM and Nvidia to build two supercomputers, the Sierra and the Summit, based on POWER9 processors coupled with Nvidia's Volta GPUs. The Sierra went online in 2017 and
Was released in May 2013. It is based on Power ISA v.2.06 and includes major enhancements to logical partition functions, transactional memory, expanded performance monitoring, new storage control features, additions to the VMX and VSX vector facilities (VSX-2), along with AES and Galois Counter Mode (GCM), SHA-224, SHA-256, SHA-384 and SHA-512 (SHA-2) cryptographic extensions and cyclic redundancy check (CRC) algorithms. The spec
Was released in May 2020. It mainly gives support for new functions introduced in Power10, but also adds the notion of optionality to the Power ISA specification. Instructions can now be eight bytes long ("prefixed instructions"), compared to the usual four-byte "word instructions". Many new functions are also added to the SIMD and VSX instructions. VSX and the SVP64 extension provide hardware support for 16-bit half precision floats. One key benefit of
Was revised in April 2015 to the Power ISA v.2.07 B spec. The specification for Power ISA v.3.0 was released in November 2015. It is the first to come out after the founding of the OpenPOWER Foundation and includes enhancements for a broad spectrum of workloads and removes the server and embedded categories while retaining backwards compatibility and adds support for VSX-3 instructions. New functions include 128-bit quad-precision floating-point operations,
Was the first to supply AltiVec enabled processors starting with their G4 line. AltiVec was also used in some embedded systems for high-performance digital signal processing. IBM consistently left VMX out of their earlier POWER microprocessors, which were intended for server applications where it was not very useful. The POWER6 microprocessor, introduced in 2007, implements AltiVec. The last desktop microprocessor from IBM,
Was the fruit of the ambitious eCLipz Project, joining the I (AS/400), P (RS/6000) and Z (Mainframe) instruction sets under one common platform. I and P were already joined with the POWER4, but the eCLipz effort failed to include the CISC-based z/Architecture, and the z10 processor became POWER6's eCLipz sibling. As of 2021, a separate line of processors implementing z/Architecture continues to be developed by IBM, with
Was the primary customer for Power Vector Media Extension (VMX) until Apple switched to Intel-made, x86-based CPUs on June 6, 2005. They used it to accelerate multimedia applications such as QuickTime, iTunes and key parts of Apple's Mac OS X including in the Quartz graphics compositor. Other companies such as Adobe used AltiVec to optimize their image-processing programs such as Adobe Photoshop. Motorola