100-592: The SPARC64 V ( Zeus ) is a SPARC V9 microprocessor designed by Fujitsu . The SPARC64 V was the basis for a series of successive processors designed for servers, and later, supercomputers. The servers series are the SPARC64 V+, VI, VI+, VII, VII+, X, X+ and XII. The SPARC64 VI and its successors up to the VII+ were used in the Fujitsu and Sun (later Oracle ) SPARC Enterprise M-Series servers. In addition to servers,
200-478: A register–register architecture ); except for the load/store instructions used to access memory , all instructions operate on the registers, in accordance with the RISC design principles. A SPARC processor includes an integer unit (IU) that performs integer load, store, and arithmetic operations. It may include a floating-point unit (FPU) that performs floating-point operations and, for SPARC V8, may include
300-774: A 1.89 GHz version, was shipped in September 2004 in the Fujitsu PrimePower 650 and 850. In December 2004, a 1.82 GHz version was shipped in the PrimePower 2500. These versions have a 3 MB L2 cache. In February 2006, four versions were introduced: 1.65 and 1.98 GHz versions with 3 MB L2 caches shipped in the PrimePower 250 and 450; and 2.08 and 2.16 GHz versions with 4 MB L2 caches shipped in mid-range and high-end models. It contained approximately 400 million transistors on an 18.46 mm by 15.94 mm die for an area of 294.25 mm. It
400-523: A 12 MB L2 unified cache. The division of the cores into CMGs enabled 34 cores to be integrated on a single die by easing the implementation of cache coherence and avoiding the need for the L2 cache to be shared between 34 cores. The two CMGs share the memory through a ccNUMA organization. The XIfx core was based on the SPARC64 X+ with organizational improvements. The XIfx implements an improved version of
500-620: A 128-bit write bus. The SPARC64 VI has a new system bus, the Jupiter Bus. The SPARC64 VI consisted of 540 million transistors. The die measures 20.38 mm by 20.67 mm (421.25 mm). The SPARC64 VI was originally to have been introduced in mid-2004 in Fujitsu's PrimePower servers. Development of the PrimerPowers were canceled after Fujitsu and Sun Microsystems announced in June 2004 that they would collaborate on new servers called
600-526: A 2.86 GHz version with two or four cores and a 5.5 MB L2 cache was announced for the low-end M3000. The VII+ is socket-compatible with its predecessor, the VII. Existing high-end SPARC Enterprise M-Series servers are able to upgrade to the VII+ processors in the field. The SPARC64 VIIIfx ( Venus ) is an eight-core processor based on the SPARC64 VII designed for high-performance computing (HPC). As
700-402: A 32-CPU server with up to 48 TB of memory. SPARC SPARC ( Scalable Processor ARChitecture ) is a reduced instruction set computer (RISC) instruction set architecture originally developed by Sun Microsystems . Its design was strongly influenced by the experimental Berkeley RISC system developed in the early 1980s. First developed in 1986 and released in 1987, SPARC was one of
800-506: A 48-entry instruction buffer. In the next stage, four instructions are taken from this buffer, decoded and issued to the appropriate reserve stations. The SPARC64 V has six reserve stations, two that serve the integer units, one for the address generators, two for the floating-point units, and one for branch instructions. Each integer, address generator and floating-point unit has an eight-entry reserve station. Each reserve station can dispatch an instruction to its execution unit. Which instruction
900-470: A 512 MB 16-way L2 cache. SPARC64 XII is Fujitsu's first SPARC CPU with L3 cache (32 MB 16-way). The number of 8-lane PCIe 3.0 ports has been doubled to 4 per chip. Memory speed has been increased by 50% to 2400 MT/s, bringing the theoretical combined bandwidth of the 8 DDR4 channels of the chip to 153 GB/s, and the capacity per CPU is up to 1.5 TB across 24 slots. Two CPUs can be connected in a Building Block, and up to 16 Building Blocks can be connected to create
1000-411: A 64-bit result, SDIVX , which divides a 64-bit signed dividend by a 64-bit signed divisor and produces a 64-bit signed quotient, and UDIVX , which divides a 64-bit unsigned dividend by a 64-bit unsigned divisor and produces a 64-bit signed quotient; none of those instructions use the Y register. Conditional branches test condition codes in a status register , as seen in many instruction sets such
1100-505: A Fujitsu design. The first Fujitsu SPARC64 Vs were fabricated in December 2001. They operated at 1.1 to 1.35 GHz. Fujitsu's 2003 SPARC64 roadmap showed that the company planned a 1.62 GHz version for release in late 2003 or early 2004, but it was canceled in favor of the SPARC64 V+. The SPARC64 V was used by Fujitsu in their PRIMEPOWER servers. The SPARC64 V was first presented at Microprocessor Forum 2002. At introduction, it had
SECTION 10
#17327807385501200-513: A Fujitsu-designed extension to the SPARC V9 architecture. The front-end had coarse-grained multi-threading removed, the L1 instruction cache halved in size to 32 KB; and the number of branch target address cache (BTAC) entries reduced to 1,024 from 8,192, and its associativity reduced to two from eight; and an extra pipeline stage was inserted before the instruction decoder. This stage accommodated
1300-415: A co-processor (CP) that performs co-processor-specific operations; the architecture does not specify what functions a co-processor would perform, other than load and store operations. The SPARC architecture has an overlapping register window scheme. At any instant, 32 general-purpose registers are visible. A Current Window Pointer ( CWP ) variable in the hardware points to the current set. The total size of
1400-429: A data cache. The second level consists of an on-die unified cache. The level 1 (L1) caches each have a capacity of 128 KB. They are both two-way set associative and have 64-byte line size. They are virtually indexed and physically tagged. The instruction cache is accessed via a 256-bit bus. The data cache is accessed with two 128-bit buses. The data cache consists of eight banks separated by 32-bit boundaries. It uses
1500-591: A new specification, Oracle SPARC Architecture 2011 , which besides the overall update of the reference, adds the VIS 3 instruction set extensions and hyperprivileged mode to the 2007 specification. In October 2015, Oracle released SPARC M7, the first processor based on the new Oracle SPARC Architecture 2015 specification. This revision includes VIS 4 instruction set extensions and hardware-assisted encryption and silicon secured memory (SSM). SPARC architecture has provided continuous application binary compatibility from
1600-528: A number of improvements that were part of the SuperSPARC series of processors released in 1992. SPARC V9, released in 1993, introduced a 64-bit architecture and was first released in Sun's UltraSPARC processors in 1995. Later, SPARC processors were used in symmetric multiprocessing (SMP) and non-uniform memory access ( CC-NUMA ) servers produced by Sun, Solbourne , and Fujitsu , among others. The design
1700-553: A peak performance of 1.1 TFLOPS. It consists of 3.75 billion transistors and is fabricated by the Taiwan Semiconductor Manufacturing Company in its 20 nm high-κ metal gate (HKMG) process. The Microprocessor Report estimated the die to have an area of 500 mm; and a typical power consumption of 200 W. The XIfx has 34 cores, 32 of which are compute cores used to run user applications, and 2 assistant cores used to run
1800-401: A plus sign separating the operands, instead of using a comma-separated list. Examples: Due to the widespread use of non-32-bit data, such as 16-bit or 8-bit integral data or 8-bit bytes in strings, there are instructions that load and store 16-bit half-words and 8-bit bytes, as well as instructions that load 32-bit words. During a load, those instructions will read only the byte or half-word at
1900-506: A power efficiency of more than 2 GFLOPS per watt. It consisted of 1 billion transistors and was implemented in a 40 nm CMOS process with copper interconnects. The SPARC64 X is a 16-core server microprocessor announced in 2012 and used in Fujitsu's M10 servers (which are also marketed by Oracle). The SPARC64 X is based on the SPARC64 VII+ with significant enhancements to its core and chip organization. The cores were improved by
2000-451: A quad-aligned group of four floating-point registers can hold one quad-precision IEEE 754 floating-point number. A SPARC V9 processor with an FPU includes: The registers are organized as a set of 64 32-bit registers, with the first 32 being used as the 32-bit floating-point registers, even–odd pairs of all 64 registers being used as the 64-bit floating-point registers, and quad-aligned groups of four floating-point registers being used as
2100-628: A rate of almost one instruction per clock cycle . This made them similar to the MIPS architecture in many ways, including the lack of instructions such as multiply or divide. Another feature of SPARC influenced by this early RISC movement is the branch delay slot . The SPARC processor usually contains as many as 160 general-purpose registers . According to the "Oracle SPARC Architecture 2015" specification an "implementation may contain from 72 to 640 general-purpose 64-bit" registers. At any point, only 32 of them are immediately visible to software — 8 are
SECTION 20
#17327807385502200-423: A register holds the value 10 and then branch to code that handles it, one would: In a conditional branch instruction, the icc or fcc field specifies the condition being tested. The 22-bit displacement field is the address, relative to the current PC, of the target, in words, so that conditional branches can go forward or backward up to 8 megabytes. The ANNUL (A) bit is used to get rid of some delay slots. If it
2300-523: A register or a 13-bit signed integer constant; the other operands are registers. Any of the register operands may point to G0; pointing the result to G0 discards the results, which can be used for tests. Examples include: The list of mathematical instructions is ADD , SUB , AND , OR , XOR , and negated versions ANDN , ORN , and XNOR . One quirk of the SPARC design is that most arithmetic instructions come in pairs, with one version setting
2400-607: A result, the VIIIfx did not succeed the VII, but existed concurrently with it. It consists of 760 million transistors, measures 22.7 mm by 22.6 (513.02 mm;), is fabricated in Fujitu's 45 nm CMOS process with copper interconnects, and has 1,271 I/O pins. The VIIIfx has a peak performance at 2 GHz of 128 GFLOPS and a typical power consumption of 58 W at 30 °C for an efficiency of 2.2 GFLOPS/W. The VIIIfx has four integrated memory controllers for
2500-498: A set of global registers (one of which, g0 , is hard-wired to zero, so only seven of them are usable as registers) and the other 24 are from the stack of registers. These 24 registers form what is called a register window , and at function call/return, this window is moved up and down the register stack. Each window has eight local registers and shares eight registers with each of the adjacent windows. The shared registers are used for passing function parameters and returning values, and
2600-482: A shift unit, but only EXA has multiply and divide units. Loads and stores are executed by two address generators (AGs) designated AGA and AGB. These are simple ALUs used to calculate virtual addresses. The two floating-point units (FPUs) are designated FLA and FLB. Each FPU contains an adder and a multiplier, but only FLA has a graphics unit attached. They execute add, subtract, multiply, divide, square root and multiply–add instructions. Unlike its successor SPARC64 VI ,
2700-499: A small but very fast 8 KB L1 data cache, and separate L2 caches for instructions and data. It was designed in Fujitsu's CS85 process, a 0.17 μm CMOS process with six levels of copper interconnect; and would have consisted of 65 million transistors on a 380 mm die. Originally scheduled for a late 2001 release in Fujitsu GranPower servers, it was canceled in mid-2001 when HAL was closed by Fujitsu, and replaced by
2800-548: A subset of the eight register windows, the previous, current and next register windows. Its purpose is reduce the size of register file so that the microprocessor can operate at higher clock frequencies. The floating-point register file contains 64 entries and has six read ports and two write ports. Execution begins during stage nine. There are six execution units, two for integer, two for loads and stores, and two for floating-point. The two integer execution units are designated EXA and EXB. Both have an arithmetic logic unit (ALU) and
2900-660: A total of eight memory channels . It connects to 64 GB of DDR3 SDRAM and has a peak memory bandwidth of 64 GB/s. The VIIIfx was developed for the Next-Generation Supercomputer Project (also called Kei Soku Keisenki and Project Keisoku) initiated by Japan's Ministry of Education, Culture, Sports, Science and Technology in January 2006. The project aimed to produce the world's fastest supercomputer with performance of over 10 PFLOPS by March 2011. The companies contracted to develop
3000-639: A version of the SPARC64 VII was also used in the commercially available Fujitsu FX1 supercomputer. As of October 2017, the SPARC64 XII is the latest server processor, and it is used in the Fujitsu and Oracle M12 servers. The supercomputer series was based on the SPARC64 VII, and are the SPARC64 VIIfx, IXfx, and XIfx. The SPARC64 VIIIfx was used in the K computer , and the SPARC64 IXfx in the commercially available PRIMEHPC FX10 . As of July 2016,
3100-546: A write-back policy. The data cache writes to the L2 cache with its own 128-bit unidirectional bus. The second level cache has a capacity of 1 or 2 MB and the set associativity depends on the capacity. The microprocessor has a 128-bit system bus that operates at 260 MHz. The bus can operate in two modes, single-data rate (SDR) or double-data (DDR) rate, yielding a peak bandwidth of 4.16 or 8.32 GB/s, respectively. The SPARC64 V consisted of 191 million transistors, of which 19 million are contained in logic circuits. It
SPARC64 V - Misplaced Pages Continue
3200-560: A year later for their mainframe and end-of-support in 2034 "to promote customer modernization". The SPARC architecture was heavily influenced by the earlier RISC designs, including the RISC I and II from the University of California, Berkeley and the IBM 801 . These original RISC designs were minimalist, including as few features or op-codes as possible and aiming to execute instructions at
3300-725: Is 0 in a conditional branch, the delay slot is executed as usual. If it is 1, the delay slot is only executed if the branch is taken. If it is not taken, the instruction following the conditional branch is skipped. There are a wide variety of conditional branches: BA (branch always, essentially a jmp), BN (branch never), BE (equals), BNE (not equals), BL (less than), BLE (less or equal), BLEU (less or equal, unsigned), BG (greater), BGE (greater or equal), BGU (greater unsigned), BPOS (positive), BNEG (negative), BCC (carry clear), BCS (carry set), BVC (overflow clear), BVS (overflow set). The FPU and CP have sets of condition codes separate from
3400-404: Is 25.8mm × 30.8mm (795mm), containing 5.45 billion transistors made on TSMC 's 20 nm process . Each of the two pipelines of a core can fetch 8 instructions, decode 4 instructions and execute 6 instructions per cycle, and supports 4 SMT threads (for 96 threads per CPU). Each pipeline has an own 32 MB 4-way L1 data cache, and two pipelines share a 64 MB 4-way associative L1 instruction cache and
3500-400: Is a modified SPARC64 V+ processor. One of the main improvements is the addition of two-way coarse-grained multi-threading (CMT), which Fujitsu called vertical multi-threading (VMT). In CMT, which thread is executed is determined by time-sharing, or if the thread is executing a long-latency operation, then execution is switched to the other thread. The addition of CMT required duplication of
3600-500: Is connected to two HMCs via its own ports. Each CMG has 240 GB/s (120 GB/s in and 120 GB/s out) of memory bandwidth. The XIfx replaced the ten SERDES channels to an external Tofu interconnect controller with a ten-port integrated controller for the second-generation Tofu2 interconnect. Tofu2 is a 6D mesh/torus network with a 25 GB/s full-duplex bandwidth (12.5 GB/s per direction, 125 GB/s for ten ports) and an improved routing architecture. Fujitsu announced at
3700-437: Is created by adding the two address operands to produce an address. The second address operand may be a constant or a register. Loads take the value at the address and place it in the register specified by the third operand, whereas stores take the value in the register specified by the first operand and place it at the address. To make this more obvious, the assembler language indicates address operands using square brackets with
3800-733: Is dispatched firstly depends on operand availability and then its age. Older instructions are given higher priority than newer ones. The reserve stations can dispatch instructions speculatively (speculative dispatch). That is, instructions can be dispatched to the execution units even when their operands are not yet available but will be when execution begins. During stage six, up to six instructions are dispatched. The register files are read during stage seven. The SPARC architecture has separate register files for integer and floating-point instructions. The integer register file has eight register windows. The JWR (Joint Work Register) contains 64 entries and has eight read ports and two write ports. The JWR contains
3900-401: Is fabricated in a 28 nm CMOS process with copper interconnects. The SPARC64 X+ is an enhanced SPARC64 X processor announced in 2013. It features minor improvements to the core organization, and a higher 3.5 GHz clock frequency obtained through better circuit design and layout. It contained 2.99 billion transistors, measured 24 mm by 25 mm (600 mm), and is fabricated in
4000-536: Is fully open, non-proprietary and royalty-free. As of 2024, the latest commercial high-end SPARC processors are Fujitsu 's SPARC64 XII (introduced in September 2017 for its SPARC M12 server) and Oracle 's SPARC M8 introduced in September 2017 for its high-end servers. On September 1, 2017, after a round of layoffs that started in Oracle Labs in November 2016, Oracle terminated SPARC design after completing
4100-564: Is implemented by reading three operands from the operand register, multiplying two of the operands, forwarding the result and the third operand to the adder, and adding them to produce the final result. Results from the execution units and loads are not written to the register file. To maintain program order, they are written to update buffers, where they reside until committed. The SPARC64 V has separate update buffers for integer and floating-point units. Both have 32 entries each. The integer register has eight read ports and four write ports. Half of
SPARC64 V - Misplaced Pages Continue
4200-490: The LD instruction, renamed LDUW , clears the upper 32 bits in the register and loads the 32-bit value into the lower 32 bits, and the ST instruction, renamed STW , discards the upper 32 bits of the register and stores only the lower 32 bits. The new LDSW instruction sets the upper bits in the register to the value of the uppermost bit of the word and loads the 32-bit value into
4300-511: The IBM System/360 architecture and successors and the x86 architecture. This means that a test and branch is normally performed with two instructions; the first is an ALU instruction that sets the condition codes, followed by a branch instruction that examines one of those flags. The SPARC does not have specialized test instructions; tests are performed using normal ALU instructions with the destination set to %G0. For instance, to test if
4400-655: The International Supercomputing Conference in June 2016 that its future exascale supercomputer will feature processors of its own design that implement the ARMv8 architecture. The A64FX will implement extensions to the ARMv8 architecture, equivalent to HPC-ACE2, that Fujitsu is developing with ARM Holdings . SPARC64 XII was launched in 2017 with Fujitsu's SPARC M12 servers. It nominally features 12 cores, but just like IBM's POWER9 that
4500-485: The M3 by Oracle, is a further development of the SPARC64 VII. The clock frequency was increased up to 3 GHz and the L2 cache size was doubled to 12 MB. This version was announced on 2 December 2010 for the high-end SPARC Enterprise M8000 and M9000 servers. These improvements resulted in an approximately 20% increase to overall performance. A 2.66 GHz version was for mid-range M4000 and M5000 models. On 12 April 2011,
4600-565: The TOP500 Project Committee announced that the K computer (still incomplete with only 68,544 processors) topped the LINPACK benchmark at 8.162 PFLOPS , realizing 93% of its peak performance, making it the fastest supercomputer in the world at that time. The VIIIfx core is based on that of the SPARC64 VII with numerous modifications for HPC, namely High Performance Computing-Arithmetic Computational Extensions (HPC-ACE)
4700-691: The UltraSPARC T1 implementation: In 2007, Sun released an updated specification, UltraSPARC Architecture 2007 , to which the UltraSPARC T2 implementation complied. In December 2007, Sun also made the UltraSPARC T2 processor's RTL available via the OpenSPARC project. It was also released under the GNU General public license v2. OpenSPARC T2 is 8 cores, 16 pipelines with 64 threads. In August 2012, Oracle Corporation made available
4800-401: The 128-bit floating-point registers. Floating-point registers are not windowed; they are all global registers. All SPARC instructions occupy a full 32-bit word and start on a word boundary. Four formats are used, distinguished by the first two bits. All arithmetic and logical instructions have 2 source operands and 1 destination operand. RD is the "destination register", where the output of
4900-620: The Advanced Product Line (APL). These servers were scheduled to be introduced in mid-2006, but were delayed until April 2007, when they were introduced as the SPARC Enterprise . The SPARC64 VI processors featured in the SPARC Enterprise at its announcement were a 2.15 GHz version with a 5 MB L2 cache, and 2.28 and 2.4 GHz versions with 6 MB L2 caches. The SPARC64 VII (previously called
5000-662: The HPC-ACE extensions (HPC-ACE2), which doubled the width of the SIMD units to 256 bits and added new SIMD instructions. Compared to the SPARC64 IXfx, the XIfx has an improvement of a factor of 3.2 for double precision and 6.1 for single precision. To complement the increased width of the SIMD units, the L1 cache bandwidth was increased to 4.4 TB/s. Improvements to the SoC organization were to
5100-528: The L1 data cache was halved in size to 32 KB. The number of commit stack entries, which determined the number of instructions that could be in-flight in the back-end, was reduced to 48 from 64. The SPARC64 IXfx is an improved version of the SPARC64 VIIIfx designed by Fujitsu and LSI first revealed in the announcement of the PRIMEHPC FX10 supercomputer on 7 November 2011. It, along with
SECTION 50
#17327807385505200-611: The M8. Much of the processor core development group in Austin, Texas, was dismissed, as were the teams in Santa Clara, California, and Burlington, Massachusetts. Fujitsu will also discontinue their SPARC production (has already shifted to producing their own ARM -based CPUs), after two "enhanced" versions of Fujitsu's older SPARC M12 server in 2020–22 (formerly planned for 2021) and again in 2026–27, end-of-sale in 2029, of UNIX servers and
5300-453: The NZVC condition code bits in the status register , and the other not setting them, with the default being not to set the codes. This is so that the compiler has a way to move instructions around when trying to fill delay slots. If one wants the condition codes to be set, this is indicated by adding cc to the instruction: add and sub also have another modifier, X, which indicates whether
5400-504: The PRIMEHPC FX10, is a commercialization of the technologies that first appeared in the VIIIfx and K computer. Compared to the VIIIfx, organizational improvements included doubling the number of cores was to 16, doubling the amount of shared L2 cache to 12 MB, and increasing peak DDR3 SDRAM memory bandwidth to 85 GB/s. The IXfx operates at 1.848 GHz, has a peak performance of 236.5 GFLOPS, and consumes 110 W for
5500-635: The RESTORE instruction (switching back to the call before returning from the procedure). Trap events ( interrupts , exceptions or TRAP instructions) and RETT instructions (returning from traps) also change the CWP . For SPARC V9, CWP register is decremented during a RESTORE instruction, and incremented during a SAVE instruction. This is the opposite of PSR.CWP's behavior in SPARC V8. This change has no effect on nonprivileged instructions. SPARC registers are shown in
5600-458: The SPARC64 V performs the multiply–add with separate multiplication and addition operations, thus with up to two rounding errors. The graphics unit executes Visual Instruction Set (VIS) instructions, a set of single instruction, multiple data (SIMD) instructions. All instructions are pipelined except for divide and square root, which are executed using iterative algorithms. The FMA instruction
5700-579: The SPARC64 VI+), code-named Jupiter , is a further development of the SPARC64 VI announced in July 2008. It is a quad-core microprocessor. Each core is capable of two-way simultaneous multithreading (SMT), which replaces two-way coarse-grained multithreading , termed vertical multithreading (VMT) by Fujitsu. Thus, it can execute eight threads simultaneously. Other changes include more RAS features;
5800-480: The SPARC64 VI, and is field-upgradeable. SPARC64 VIIs could coexist, whilst operating at their native clock frequency, alongside SPARC64 VIs. The first versions of the SPARC64 VII were a 2.4 GHz version with a 5 MB L2 cache used in the SPARC Enterprise M4000 and M5000, and a 2.52 GHz version with a 6 MB L2 cache. On 28 October 2008, a 2.52 GHz version with a 5 MB L2 cache
5900-698: The SPARC64 XIfx is the latest supercomputer processor, and it is used in the Fujitsu PRIMEHPC FX100 supercomputer. In the late 1990s, HAL Computer Systems , a subsidiary of Fujitsu, was designing a successor to the SPARC64 GP as the SPARC64 V. First announced at Microprocessor Forum 1999, the HAL SPARC64 V would have operated 1 GHz and had a wide superscalar organization with superspeculation , an L1 instruction trace cache ,
6000-469: The UltraSPARC IV by Sun and the SPARC64 VI by Fujitsu. In early 2006, Sun released an extended architecture specification, UltraSPARC Architecture 2005 . This includes not only the non-privileged and most of the privileged portions of SPARC V9, but also all the architectural extensions developed through the processor generations of UltraSPARC III, IV, and IV+, as well as CMT extensions starting with
6100-415: The application instruction ( load–store ) level or at the memory page level (via an MMU setting). The latter is often used for accessing data from inherently little-endian devices, such as those on PCI buses. There have been three major revisions of the architecture. The first published version was the 32-bit SPARC version 7 (V7) in 1986. SPARC version 8 (V8), an enhanced SPARC architecture definition,
SECTION 60
#17327807385506200-451: The bottom two bits of both operands are 0 and reporting overflow if they are not. This can be useful in the implementation of the run time for ML , Lisp , and similar languages that might use a tagged integer format. The endianness of the 32-bit SPARC V8 architecture is purely big-endian. The 64-bit SPARC V9 architecture uses big-endian instructions, but can access data in either big-endian or little-endian byte order, chosen either at
6300-455: The completion of memory references. For example, all effects of the stores that appear prior to the MEMBAR instruction must be made visible to all processors before any loads following the MEMBAR can be executed. Arithmetic and logical instructions also use a three-operand format, with the first two being the operands and the last being the location to store the result. The middle operand can be
6400-522: The condition codes and versions that do. MULSCC and the multiply instructions use the Y register to hold the upper 32 bits of the product; the divide instructions use it to hold the upper 32 bits of the dividend. The RDY instruction reads the value of the Y register into a general-purpose register; the WRY instruction writes the value of a general-purpose register to the Y register. SPARC V9 added MULX , which multiplies two 64-bit values and produces
6500-410: The figure above. There is also a non-windowed Y register, used by the multiply-step, integer multiply, and integer divide instructions. A SPARC V8 processor with an FPU includes 32 32-bit floating-point registers, each of which can hold one single-precision IEEE 754 floating-point number. An even–odd pair of floating-point registers can hold one double-precision IEEE 754 floating-point number, and
6600-552: The first SPARC V7 implementation in 1987 through the Sun UltraSPARC Architecture implementations. Among various implementations of SPARC, Sun's SuperSPARC and UltraSPARC-I were very popular, and were used as reference systems for SPEC CPU95 and CPU2000 benchmarks. The 296 MHz UltraSPARC-II is the reference system for the SPEC CPU2006 benchmark. SPARC is a load–store architecture (also known as
6700-442: The greater number of integer and floating-point registers defined by HPC-ACE. The SPARC V9 architecture was designed to have only 32 integer and 32 floating-point number registers. The SPARC V9 instruction encoding limited the number of registers specifiable to 32. To specify the extra registers, HPC-ACE has a "prefix" instruction that would immediately follow one or two SPARC V9 instructions. The prefix instruction contained (primarily)
6800-478: The highest clock frequency of both SPARC and 64-bit server processors in production; and the highest SPEC rating of any SPARC processor. The SPARC64 V is a four-issue superscalar microprocessor with out-of-order execution . It was based on the Fujitsu GS8900 mainframe microprocessor. The SPARC64 V fetches up to eight instructions from the instruction cache during the first stage and places them into
6900-876: The inclusion of a pattern history table for branch prediction , speculative execution of loads , more execution units, support for the HPC-ACE extension (originally from the SPARC64 VIIIfx), deeper pipeline for a 3.0 GHz clock frequency, and accelerators for cryptography , database , and decimal floating-point number arithmetic and conversion functions. The 16 cores share a unified, 24 MB, 24-way set-associative L2 cache. Chip organization improvements include four integrated DDR3 SDRAM memory controllers, glueless four-way symmetrical multiprocessing, ten SERDES channels for symmetrical multiprocessing scalability to 64 sockets, and two integrated PCI Express 3.0 controllers. The SPARC64 X contains 2.95 billion transistors, measures 23.5 mm by 25 mm (587.5 mm), and
7000-424: The indicated location and then either fill the rest of the target register with zeros (unsigned load) or with the value of the uppermost bit of the byte or half-word (signed load). During a store, those instructions discard the upper bits in the register and store only the lower bits. There are also instructions for loading double-precision values used for floating-point arithmetic , reading or writing eight bytes from
7100-470: The indicated register and the "next" one, so if the destination of a load is L1, L1 and L2 will be set. The complete list of load and store instructions for the general-purpose registers in 32-bit SPARC is LD , ST , LDUB (unsigned byte), LDSB (signed byte), LDUH (unsigned half-word), LDSH (signed half-word), LDD (load double), STB (store byte), STH (store half-word), STD (store double). In SPARC V9, registers are 64-bit, and
7200-485: The integer and floating-point register files had registers added to them: the integer register file gained 32, and there were a total of 256 floating-point registers. The extra integer registers are not part of the register windows defined by SPARC V9, but are always accessible via the prefix instruction; and the 256 floating-point registers could be used by both scalar floating-point instructions and by both integer and floating-point SIMD instructions. An extra pipeline stage
7300-495: The integer condition codes and from each other; two additional sets of branch instructions were defined to test those condition codes. Adding an F to the front of the branch instruction in the list above performs the test against the FPU's condition codes, while, in SPARC V8, adding a C tests the flags in the otherwise undefined CP. The CALL (jump to subroutine) instruction uses a 30-bit program counter -relative word offset. As
7400-414: The integer register file is now protected by ECC, and the number of error checkers has been increased to around 3,400. It consists of 600 million transistors, is 21.31 mm × 20.86 mm (444.63 mm) large, and is fabricated by Fujitsu in its 65 nm CMOS, copper interconnect process. The SPARC64 VII was featured in the SPARC Enterprise . It is socket-compatible with its predecessor,
7500-405: The local registers are used for retaining local values across function calls. The "scalable" in SPARC comes from the fact that the SPARC specification allows implementations to scale from embedded processors up through large server processors, all sharing the same core (non-privileged) instruction set. One of the architectural parameters that can scale is the number of implemented register windows;
7600-722: The lower bits. The new LDX instruction loads a 64-bit value into the register, and the STX instruction stores all 64 bits of the register. The LDF , LDDF , and LDQF instructions load a single-precision, double-precision, or quad-precision value from memory into a floating-point register; the STF , STDF , and STQF instructions store a single-precision, double-precision, or quad-precision floating-point register into memory. The memory barrier instruction, MEMBAR, serves two interrelated purposes: it articulates order constraints among memory references and facilitates explicit control over
7700-619: The memory and interconnect interfaces. The integrated memory controllers were replaced with four Hybrid Memory Cube (HMC) interfaces for decreased memory latency and improved memory bandwidth. According to the Microprocessor Report , the IXfx was the first processor to use HMCs. The XIfx is connected to 32 GB of memory provided by eight 4 GB HMCs. The HMCs are 16-lane versions, with each lane operating at 15 Gbit/s. Each CMG has two HMC interfaces, and each HMC interface
7800-404: The microprocessor to enable it to operate at 1.35 GHz. The increased power supply voltage and operating frequency increased the power dissipation to ~45 W. The SPARC64 V+ , code-named "Olympus-B", is a further development of the SPARC64 V. Improvements over the SPARC64 V included higher clock frequencies of 1.82–2.16 GHz and a larger 3 or 4 MB L2 cache. The first SPARC64 V+,
7900-470: The most successful early commercial RISC systems, and its success led to the introduction of similar RISC designs from many vendors through the 1980s and 1990s. The first implementation of the original 32-bit architecture (SPARC V7) was used in Sun's Sun-4 computer workstation and server systems, replacing their earlier Sun-3 systems based on the Motorola 68000 series of processors. SPARC V8 added
8000-416: The operating system and other system services. The delegation of user applications and operating system to dedicated cores improves performance by ensuring that the private caches of the compute cores are not shared with or disrupted by non-application instructions and data. The 34 cores are further organized into two Core Memory Groups ( CMGs ), each consisting of 16 compute cores and 1 assistant core sharing
8100-443: The operation is deposited. The majority of SPARC instructions have at least this register, so it is placed near the "front" of the instruction format. RS1 and RS2 are the "source registers", which may or may not be present, or replaced by a constant. Load and store instructions have a three-operand format, in that they have two operands representing values for the address and one operand for the register to read or write to. The address
8200-530: The operation should set the carry bit: SPARC V7 does not have multiplication or division instructions, but it does have MULSCC , which does one step of a multiplication testing one bit and conditionally adding the multiplicand to the product. This was because MULSCC can complete over one clock cycle in keeping with the RISC philosophy. SPARC V8 added UMUL (unsigned multiply), SMUL (signed multiply), UDIV (unsigned divide), and SDIV (signed divide) instructions, with both versions that do not update
8300-468: The portions of the register numbers that could not fit within a SPARC V9 instruction. This extra pipeline stage was where up to four SPARC V9 instructions were combined with up to two prefix instructions in the preceding stage. The combined instructions were then decoded in the next pipeline stage. The back-end was also heavily modified. The number of reservation station entries for branch and integer instructions were reduced to six and ten, respectively. Both
8400-421: The program counter and the control, integer, and floating-point registers so there is one set of each for every thread. A floating-point fused multiply-add (FMA) instruction was also added, the first SPARC processor to do so. The cores share a 6 MB on-die unified L2 cache. The L2 cache is 12-way set associative and has 256-byte lines. The cache is accessed via two unidirectional buses, a 256-bit read bus and
8500-570: The project because manufacturing the hardware they were responsible for would result in financial losses for them. Afterwards, Fujitsu redesigned the supercomputer to use the VIIIfx as its only processor type. By 2010, the supercomputer that would be built by the project was named the K computer . Located at the RIKEN 's Advanced Institute for Computational Science (AICS) in Kobe , Japan; it obtains its performance from 88,128 VIIIfx processors. In June 2011,
8600-584: The register file is not part of the architecture, allowing more registers to be added as the technology improves, up to a maximum of 32 windows in SPARC V7 and V8 as CWP is 5 bits and is part of the PSR register. In SPARC V7 and V8 CWP will usually be decremented by the SAVE instruction (used by the SAVE instruction during the procedure call to open a new stack frame and switch the register window), or incremented by
8700-675: The same process as the SPARC64 X. On 8 April 2014, 3.7 GHz speed-binned parts became available in response to the introduction of new Xeon E5 and E7 models by Intel ; and the impending introduction of the POWER8 by IBM . Fujitsu introduced the SPARC64 XIfx in August 2014 at the Hot Chips symposium. It is used in the Fujitsu PRIMEHPC FX100 supercomputer, which succeeded the PRIMEHPC FX10 . The XIfx operates at 2.2 GHz and has
8800-562: The specification allows from three to 32 windows to be implemented, so the implementation can choose to implement all 32 to provide maximum call stack efficiency, or to implement only three to reduce cost and complexity of the design, or to implement some number between them. Other architectures that include similar register file features include Intel i960 , IA-64 , and AMD 29000 . The architecture has gone through several revisions. It gained hardware multiply and divide functionality in version 8. 64-bit (addressing and data) were added to
8900-539: The supercomputer were Fujitsu, Hitachi , and NEC . The supercomputer was originally envisioned to have a hybrid architecture containing scalar and vector processors . The Fujitsu-designed VIIIfx was to have been the scalar processor, with the vector processor to have been jointly designed by Hitachi and NEC. However, due to the Financial crisis of 2007–2008 , Hitachi and NEC announced in May 2009 that they would leave
9000-767: The version 9 SPARC specification published in 1994. In SPARC version 8, the floating-point register file has 16 double-precision registers. Each of them can be used as two single-precision registers, providing a total of 32 single-precision registers. An odd–even number pair of double-precision registers can be used as a quad-precision register, thus allowing 8 quad-precision registers. SPARC Version 9 added 16 more double-precision registers (which can also be accessed as 8 quad-precision registers), but these additional registers can not be accessed as single-precision registers. No SPARC CPU implements quad-precision operations in hardware as of 2024. Tagged add and subtract instructions perform adds and subtracts on values checking that
9100-517: The write ports are used for results from the integer execution units and the other half by data returned by loads. The floating-point update buffer has six read ports and four write ports. Commit takes place during stage ten at the earliest. The SPARC64 V can commit up to four instructions per cycle. During stage eleven, results are written to the register file, where it becomes visible to software. The SPARC64 V has two-level cache hierarchy. The first level consists of two caches, an instruction cache and
9200-404: Was fabricated in a 0.13 μm , eight-layer copper metallization, complementary metal–oxide–semiconductor (CMOS) silicon on insulator (SOI) process. The die measured 18.14 mm by 15.99 mm for a die area of 290 mm. At 1.3 GHz, the SPARC64 V has a power dissipation of 34.7 W. The Fujitsu PrimePower servers that use the SPARC64 V supply a slightly higher voltage
9300-424: Was added to the beginning of the floating-point execution pipeline to access the larger floating-point register file. The 128-bit SIMD instructions from HPC-ACE were implemented by adding two extra floating-point units for a total of four. SIMD execution can perform up four single- or double-precision fused-multiply-add operations (eight FLOPs) per cycle. The number of load queue entries was increased to 20 from 16, and
9400-466: Was fabricated in a 90 nm CMOS process with ten levels of copper interconnect . The SPARC64 VI , code-named Olympus-C, is a two-core processor (the first multi-core SPARC64 processor) which succeeded the SPARC64 V+ . It is fabricated by Fujitsu in a 90 nm, 10-layer copper, CMOS silicon on insulator (SOI) process, which enabled two cores and an L2 cache to be integrated on a die. Each core
9500-686: Was introduced in the SPARC Enterprise M3000. On 13 October 2009, Fujitsu and Sun introduced new versions of the SPARC64 VII (code-named Jupiter+ ), a 2.53 GHz version with a 5.5 MB L2 cache for the M4000 and M5000, and a 2.88 GHz version with a 6 MB L2 cache for the M8000 and M9000. On 12 January 2010, a 2.75 GHz version with a 5 MB L2 cache was introduced in the M3000. The SPARC64 VII+ ( Jupiter-E ), referred to as
9600-452: Was launched the same year, each of the twelve cores consists of two separate pipelines, and the only resources shared between SPARC64 XII core's pipelines are the TLB , L1 instruction cache and L2 cache, and as a result the single-threaded performance is almost unchanged from SPARC64 X. SPARC64 XII operates at up to 4.25 GHz base frequency and 4.35 GHz boost frequency. The size of the chip
9700-574: Was released by Fujitsu and Sun, describing processor functions which were identically implemented in the CPUs of both companies ("Commonality"). The first CPUs conforming to JPS1 were the UltraSPARC III by Sun and the SPARC64 V by Fujitsu. Functionalities which are not covered by JPS1 are documented for each processor in "Implementation Supplements". At the end of 2003, JPS2 was released to support multicore CPUs. The first CPUs conforming to JPS2 were
9800-587: Was released by SPARC International in 1993. It was developed by the SPARC Architecture Committee consisting of Amdahl Corporation , Fujitsu , ICL , LSI Logic , Matsushita , Philips , Ross Technology , Sun Microsystems , and Texas Instruments . Newer specifications always remain compliant with the full SPARC V9 Level 1 specification. In 2002, the SPARC Joint Programming Specification 1 (JPS1)
9900-412: Was released in 1990. The main differences between V7 and V8 were the addition of integer multiply and divide instructions, and an upgrade from 80-bit "extended-precision" floating-point arithmetic to 128-bit " quad-precision " arithmetic. SPARC V8 served as the basis for IEEE Standard 1754-1994, an IEEE standard for a 32-bit microprocessor architecture. SPARC version 9 , the 64-bit SPARC architecture,
10000-694: Was turned over to the SPARC International trade group in 1989, and since then its architecture has been developed by its members. SPARC International is also responsible for licensing and promoting the SPARC architecture, managing SPARC trademarks (including SPARC, which it owns), and providing conformance testing . SPARC International was intended to grow the SPARC architecture to create a larger ecosystem; SPARC has been licensed to several manufacturers, including Atmel , Bipolar Integrated Technology , Cypress Semiconductor , Fujitsu , Matsushita and Texas Instruments . Due to SPARC International, SPARC
#549450