The UltraSPARC T1 ( codenamed " Niagara ") is a multithreading , multicore CPU released by Sun Microsystems in 2005. Designed to lower the energy consumption of server computers , the CPU typically uses 72 W of power at 1.4 GHz.
87-530: The T1 is a new-from-the-ground-up SPARC microprocessor implementation that conforms to the UltraSPARC Architecture 2005 specification and executes the full SPARC V9 instruction set . Sun has produced two previous multicore processors ( UltraSPARC IV and IV+), but UltraSPARC T1 was its first microprocessor that is both multicore and multithreaded. Security was built-in from the very first release on silicon, with hardware cryptographic units in
174-478: A register–register architecture ); except for the load/store instructions used to access memory , all instructions operate on the registers, in accordance with the RISC design principles. A SPARC processor includes an integer unit (IU) that performs integer load, store, and arithmetic operations. It may include a floating-point unit (FPU) that performs floating-point operations and, for SPARC V8, may include
261-411: A 64-bit result, SDIVX , which divides a 64-bit signed dividend by a 64-bit signed divisor and produces a 64-bit signed quotient, and UDIVX , which divides a 64-bit unsigned dividend by a 64-bit unsigned divisor and produces a 64-bit signed quotient; none of those instructions use the Y register. Conditional branches test condition codes in a status register , as seen in many instruction sets such
348-415: A co-processor (CP) that performs co-processor-specific operations; the architecture does not specify what functions a co-processor would perform, other than load and store operations. The SPARC architecture has an overlapping register window scheme. At any instant, 32 general-purpose registers are visible. A Current Window Pointer ( CWP ) variable in the hardware points to the current set. The total size of
435-512: A finite number of operations it can support – for example, no FPUs directly support arbitrary-precision arithmetic . When a CPU is executing a program that calls for a floating-point operation that is not directly supported by the hardware, the CPU uses a series of simpler floating-point operations. In systems without any floating-point hardware, the CPU emulates it using a series of simpler fixed-point arithmetic operations that run on
522-889: A gate array to interface the ARM2 processor with the WE32206 to support the additional ARM floating-point instructions. Acorn later offered the FPA10 coprocessor, developed by ARM, for various machines fitted with the ARM3 processor. Coprocessors were available for the Motorola 68000 family , the 68881 and 68882 . These were common in Motorola 68020 / 68030 -based workstations , like the Sun-3 series. They were also commonly added to higher-end models of Apple Macintosh and Commodore Amiga series, but unlike IBM PC-compatible systems, sockets for adding
609-452: A large number of separate threads. One of the limitations of the T1 design is that a single floating point unit (FPU) is shared between all 8 cores, making the T1 unsuitable for applications performing a lot of floating point mathematics. However, since the processor's intended markets do not typically make much use of floating-point operations, Sun did not expect this to be a problem. Sun provides
696-422: A long-latency event occurs, such as cache miss, the thread is taken out of rotation while the data is fetched into cache in the background. Once the long-latency event completes, the thread is made available for execution again. Sharing of the pipeline by multiple threads may make each thread slower, but the overall throughput (and utilization) of each core is much higher. It also means that the impact of cache misses
783-591: A new specification, Oracle SPARC Architecture 2011 , which besides the overall update of the reference, adds the VIS 3 instruction set extensions and hyperprivileged mode to the 2007 specification. In October 2015, Oracle released SPARC M7, the first processor based on the new Oracle SPARC Architecture 2015 specification. This revision includes VIS 4 instruction set extensions and hardware-assisted encryption and silicon secured memory (SSM). SPARC architecture has provided continuous application binary compatibility from
870-528: A number of improvements that were part of the SuperSPARC series of processors released in 1992. SPARC V9, released in 1993, introduced a 64-bit architecture and was first released in Sun's UltraSPARC processors in 1995. Later, SPARC processors were used in symmetric multiprocessing (SMP) and non-uniform memory access ( CC-NUMA ) servers produced by Sun, Solbourne , and Fujitsu , among others. The design
957-401: A plus sign separating the operands, instead of using a comma-separated list. Examples: Due to the widespread use of non-32-bit data, such as 16-bit or 8-bit integral data or 8-bit bytes in strings, there are instructions that load and store 16-bit half-words and 8-bit bytes, as well as instructions that load 32-bit words. During a load, those instructions will read only the byte or half-word at
SECTION 10
#17327802095581044-503: A program that calls for a floating-point operation, there are three ways to carry it out: In 1954, the IBM 704 had floating-point arithmetic as a standard feature, one of its major improvements over its predecessor the IBM 701 . This was carried forward to its successors the 709, 7090, and 7094. In 1963, Digital announced the PDP-6 , which had floating point as a standard feature. In 1963,
1131-451: A quad-aligned group of four floating-point registers can hold one quad-precision IEEE 754 floating-point number. A SPARC V9 processor with an FPU includes: The registers are organized as a set of 64 32-bit registers, with the first 32 being used as the 32-bit floating-point registers, even–odd pairs of all 64 registers being used as the 64-bit floating-point registers, and quad-aligned groups of four floating-point registers being used as
1218-559: A range of open source applications, including MySQL , PHP , gzip , and ImageMagick . Proper optimization for CoolThreads systems can result in significant gains: when the Sun Studio compiler is used with the recommended optimization settings, MySQL performance improves by 268% compared to using just the -O3 flag. The "Coolthreads(TM)" architecture, beginning with the UltraSPARC T1 (with its positive and negative aspects),
1305-628: A rate of almost one instruction per clock cycle . This made them similar to the MIPS architecture in many ways, including the lack of instructions such as multiply or divide. Another feature of SPARC influenced by this early RISC movement is the branch delay slot . The SPARC processor usually contains as many as 160 general-purpose registers . According to the "Oracle SPARC Architecture 2015" specification an "implementation may contain from 72 to 640 general-purpose 64-bit" registers. At any point, only 32 of them are immediately visible to software — 8 are
1392-423: A register holds the value 10 and then branch to code that handles it, one would: In a conditional branch instruction, the icc or fcc field specifies the condition being tested. The 22-bit displacement field is the address, relative to the current PC, of the target, in words, so that conditional branches can go forward or backward up to 8 megabytes. The ANNUL (A) bit is used to get rid of some delay slots. If it
1479-523: A register or a 13-bit signed integer constant; the other operands are registers. Any of the register operands may point to G0; pointing the result to G0 discards the results, which can be used for tests. Examples include: The list of mathematical instructions is ADD , SUB , AND , OR , XOR , and negated versions ANDN , ORN , and XNOR . One quirk of the SPARC design is that most arithmetic instructions come in pairs, with one version setting
1566-597: A replacement for the UltraSPARC T1 or T2, but was canceled in the timeframe of Oracle's acquisition of Sun . Formerly known by the codename Niagara 2 , the follow-on to the UltraSPARC T1, the T2 provides eight cores. Unlike the T1, each core supports 8 threads per core, one FPU per core, one enhanced cryptographic unit per core, and CPU embedded 10 Gigabit Ethernet network controllers. In February 2007, Sun announced at its annual analyst summit that its third-generation simultaneous multithreading design, code-named Victoria Falls ,
1653-498: A set of global registers (one of which, g0 , is hard-wired to zero, so only seven of them are usable as registers) and the other 24 are from the stack of registers. These 24 registers form what is called a register window , and at function call/return, this window is moved up and down the register stack. Each window has eight local registers and shares eight registers with each of the adjacent windows. The shared registers are used for passing function parameters and returning values, and
1740-408: A single integrated circuit , an entire circuit board or a cabinet. Where floating-point calculation hardware has not been provided, floating-point calculations are done in software, which takes more processor time, but avoids the cost of the extra hardware. For a particular computer architecture, the floating-point unit instructions may be emulated by a library of software functions; this may permit
1827-508: A single or group of processes and/or threads, while the other cores deal with the rest of the processes on the system. Afara Websystems pioneered a radical thread-heavy SPARC design. The company was purchased by Sun, and the intellectual property became the foundation of the CoolThreads line of processors, starting with the T1. The UltraSPARC T1 was designed from scratch as a multi-threaded, special-purpose processor, and thus introduced
SECTION 20
#17327802095581914-482: A sizable amount of cache . Single-thread processors depend heavily on large caches for their performance because cache misses result in a wait while the data is fetched from main memory. By making the cache larger, the probability of a cache miss is reduced, but the impact of a miss is still the same. The T1 cores largely side-step the issue of cache misses by multithreading. Each core is a barrel processor , meaning it switches between available threads each cycle. When
2001-432: A special FPU named FlexFPU, which uses simultaneous multithreading . Each physical integer core, two per module, is single-threaded, in contrast with Intel's Hyperthreading , where two virtual simultaneous threads share the resources of a single physical core. Some floating-point hardware only supports the simplest operations: addition, subtraction, and multiplication. But even the most complex floating-point hardware has
2088-481: A tool for analysing an application's level of parallelism and use of floating point instructions to determine if it is suitable for use on a T1 or T2 platform. In addition to web and application tier processing, the UltraSPARC T1 may be well suited for smaller database applications which have a large user count. One customer has published results showing that a MySQL application running on an UltraSPARC T1 server ran 13.5 times faster than on an AMD Opteron server. T1
2175-410: A whole new architecture for obtaining performance. Rather than try to make each core as intelligent and optimized as they can, Sun's goal was to run as many concurrent threads as possible, and maximize utilization of each core's pipeline. The T1's cores are less complex than those of competing processors in order to allow 8 cores to fit on the same die. The cores do not feature out-of-order execution , or
2262-512: A year later for their mainframe and end-of-support in 2034 "to promote customer modernization". The SPARC architecture was heavily influenced by the earlier RISC designs, including the RISC I and II from the University of California, Berkeley and the IBM 801 . These original RISC designs were minimalist, including as few features or op-codes as possible and aiming to execute instructions at
2349-725: Is 0 in a conditional branch, the delay slot is executed as usual. If it is 1, the delay slot is only executed if the branch is taken. If it is not taken, the instruction following the conditional branch is skipped. There are a wide variety of conditional branches: BA (branch always, essentially a jmp), BN (branch never), BE (equals), BNE (not equals), BL (less than), BLE (less or equal), BLEU (less or equal, unsigned), BG (greater), BGE (greater or equal), BGU (greater unsigned), BPOS (positive), BNEG (negative), BCC (carry clear), BCS (carry set), BVC (overflow clear), BVS (overflow set). The FPU and CP have sets of condition codes separate from
2436-524: Is 3 MB and there is no L3 cache. The T1 processor can be found in the following products from Sun and Fujitsu Computer Systems : The UltraSPARC T1 microprocessor is unique in its strength and weaknesses, and as such is targeted at specific markets. Rather than being used for high-end number-crunching and ultra-high performance applications, the chip is targeted at network-facing high-demand servers, such as high-traffic web servers , and mid-tier Java, ERP, and CRM application servers, which often utilize
2523-713: Is also a BluePrints article on using the Cryptographic Accelerator Units on the T1 and T2 processors. A wide range of applications were optimized on the CoolThreads platform, including Symantec Brightmail AntiSpam, Oracle's Siebel applications, and the Sun Java System Web Proxy Server . Sun also documented its experience in moving its own online store onto a T2000 server cluster, and have published two articles on web consolidation on CoolThreads using Solaris Containers . Sun had an application performance tuning page for
2610-437: Is created by adding the two address operands to produce an address. The second address operand may be a constant or a register. Loads take the value at the address and place it in the register specified by the third operand, whereas stores take the value in the register specified by the first operand and place it at the address. To make this more obvious, the assembler language indicates address operands using square brackets with
2697-593: Is fully open, non-proprietary and royalty-free. As of 2024, the latest commercial high-end SPARC processors are Fujitsu 's SPARC64 XII (introduced in September 2017 for its SPARC M12 server) and Oracle 's SPARC M8 introduced in September 2017 for its high-end servers. On September 1, 2017, after a round of layoffs that started in Oracle Labs in November 2016, Oracle terminated SPARC design after completing
UltraSPARC T1 - Misplaced Pages Continue
2784-521: Is greatly reduced, and the T1 can maintain high throughput with a smaller amount of cache. The cache no longer needs to be large enough to hold all or most of the "working set", just the recent cache misses of each thread. Benchmarks demonstrate this approach has worked very well on commercial (integer), multithreaded workloads such as Java application servers, Enterprise Resource Planning (ERP) application servers, email (such as Lotus Domino ) servers, and web servers. These benchmarks suggest each core in
2871-453: Is some division of floating-point operations from integer operations. This division varies significantly by architecture; some have dedicated floating-point registers, while some, like Intel x86 , go as far as independent clocking schemes. CORDIC routines have been implemented in Intel x87 coprocessors ( 8087 , 80287, 80387 ) up to the 80486 microprocessor series, as well as in
2958-471: Is the first SPARC processor that supports the Hyper-Privileged execution mode. The SPARC Hypervisor runs in this mode, and it can partition a T1 system into 32 Logical Domains , each of which can run an operating system instance. Currently, Solaris , Linux , NetBSD and OpenBSD are supported. Traditionally, commercial software suites such as Oracle Database charge their customers based on
3045-490: The LD instruction, renamed LDUW , clears the upper 32 bits in the register and loads the 32-bit value into the lower 32 bits, and the ST instruction, renamed STW , discards the upper 32 bits of the register and stores only the lower 32 bits. The new LDSW instruction sets the upper bits in the register to the value of the uppermost bit of the word and loads the 32-bit value into
3132-497: The 80287 , and 80386/80386SX -based machines – for the 80387 and 80387SX respectively, although early ones were socketed for the 80287, since the 80387 did not exist yet. Other companies manufactured co-processors for the Intel x86 series. These included Cyrix and Weitek . Acorn Computers opted for the WE32206 to offer single , double and extended precision to its ARM powered Archimedes range, introducing
3219-536: The GE-235 featured an "Auxiliary Arithmetic Unit" for floating point and double-precision calculations. Historically, some systems implemented floating point with a coprocessor rather than as an integrated unit (but now in addition to the CPU, e.g. GPUs – that are coprocessors not always built into the CPU ;– have FPUs as a rule, while first generations of GPUs did not). This could be
3306-583: The GNU General Public License via the OpenSPARC project. The published information includes: SPARC SPARC ( Scalable Processor ARChitecture ) is a reduced instruction set computer (RISC) instruction set architecture originally developed by Sun Microsystems . Its design was strongly influenced by the experimental Berkeley RISC system developed in the early 1980s. First developed in 1986 and released in 1987, SPARC
3393-511: The IBM System/360 architecture and successors and the x86 architecture. This means that a test and branch is normally performed with two instructions; the first is an ALU instruction that sets the condition codes, followed by a branch instruction that examines one of those flags. The SPARC does not have specialized test instructions; tests are performed using normal ALU instructions with the destination set to %G0. For instance, to test if
3480-495: The Motorola 68881 and 68882 for some kinds of floating-point instructions, mainly as a way to reduce the gate counts (and complexity) of the FPU subsystem. Floating-point operations are often pipelined . In earlier superscalar architectures without general out-of-order execution , floating-point operations were sometimes pipelined separately from integer operations. The modular architecture of Bulldozer microarchitecture uses
3567-635: The UltraSPARC T1 implementation: In 2007, Sun released an updated specification, UltraSPARC Architecture 2007 , to which the UltraSPARC T2 implementation complied. In December 2007, Sun also made the UltraSPARC T2 processor's RTL available via the OpenSPARC project. It was also released under the GNU General public license v2. OpenSPARC T2 is 8 cores, 16 pipelines with 64 threads. In August 2012, Oracle Corporation made available
UltraSPARC T1 - Misplaced Pages Continue
3654-401: The 128-bit floating-point registers. Floating-point registers are not windowed; they are all global registers. All SPARC instructions occupy a full 32-bit word and start on a word boundary. Four formats are used, distinguished by the first two bits. All arithmetic and logical instructions have 2 source operands and 1 destination operand. RD is the "destination register", where the output of
3741-486: The 377mm die." The T4 CPU was released in late 2011. The new T4 CPU will drop from 16 cores (on the T3) back to 8 cores (as used on the T1, T2, and T2+). The new T4 core design (named "S3") feature improved per-thread performance, due to introduction of out-of-order execution, as well as having additional improved performance for single-threaded programs. In 2010, Larry Ellison announced that Oracle will offer Oracle Linux on
3828-611: The M8. Much of the processor core development group in Austin, Texas, was dismissed, as were the teams in Santa Clara, California, and Burlington, Massachusetts. Fujitsu will also discontinue their SPARC production (has already shifted to producing their own ARM -based CPUs), after two "enhanced" versions of Fujitsu's older SPARC M12 server in 2020–22 (formerly planned for 2021) and again in 2026–27, end-of-sale in 2029, of UNIX servers and
3915-453: The NZVC condition code bits in the status register , and the other not setting them, with the default being not to set the codes. This is so that the compiler has a way to move instructions around when trying to fill delay slots. If one wants the condition codes to be set, this is indicated by adding cc to the instruction: add and sub also have another modifier, X, which indicates whether
4002-635: The RESTORE instruction (switching back to the call before returning from the procedure). Trap events ( interrupts , exceptions or TRAP instructions) and RETT instructions (returning from traps) also change the CWP . For SPARC V9, CWP register is decremented during a RESTORE instruction, and incremented during a SAVE instruction. This is the opposite of PSR.CWP's behavior in SPARC V8. This change has no effect on nonprivileged instructions. SPARC registers are shown in
4089-585: The SPARC Enterprise T5140 and T5240. In October 2008, Sun released 4-way UltraSPARC T2 Plus SPARC Enterprise T5440 server. In October 2006, Sun disclosed that Niagara 3 will be built with a 45 nm process. The Register , reported in June 2008 that the microprocessor will have 16 cores, incorrectly suggesting each core would have 16 threads. During the Hot Chips 21 conference Sun revealed
4176-451: The T1 is 30 PVUs (each T2 core is 50 PVUs, and T3 is 70 PVUs) instead of the default value of 100 PVUs per core. The T1 only offered a single floating-point unit to be shared by the 8 cores, limiting usage in HPC environments. This weakness was mitigated with the follow-on UltraSPARC T2 processor, which included 8 floating point units, as well as other additional features. Furthermore, the T1
4263-406: The T1, unlike general purpose processor from competing vendors of the time. The processor is available with four, six or eight CPU cores, each core able to handle four threads concurrently. Thus, the processor is capable of processing up to 32 threads concurrently. The UltraSPARC T1 can be partitioned in a similar way to high-end Sun SMP systems. Thus, several cores can be partitioned for running
4350-528: The UltraSPARC IV by Sun and the SPARC64 VI by Fujitsu. In early 2006, Sun released an extended architecture specification, UltraSPARC Architecture 2005 . This includes not only the non-privileged and most of the privileged portions of SPARC V9, but also all the architectural extensions developed through the processor generations of UltraSPARC III, IV, and IV+, as well as CMT extensions starting with
4437-545: The UltraSPARC T1 is more powerful than the circa 2001, single-core, single-threaded UltraSPARC III, and at a chip to chip comparison, significantly outperforms other processors on multithreaded integer workloads. The UltraSPARC T1 contains 279 million transistors and has an area of 378 mm. It was fabricated by Texas Instruments (TI) in their 90 nm complementary metal–oxide–semiconductor (CMOS) process with nine levels of copper interconnect . Each core has L1 16 KB instruction cache and 8 KB data cache. L2 cache
SECTION 50
#17327802095584524-573: The UltraSPARC platform, and the port was scheduled to be available in the T4 and T5 timeframe. John Fowler, Executive Vice President Systems Oracle, in Openworld 2014 said Linux will be able to run on Sparc at some point. The new T5 CPU features 128 threads over 16 cores and is manufactured with a 28 nanometer technology. On March 21, 2006, Sun made the UltraSPARC T1 processor design available under
4611-400: The accuracy can be low, so some systems prefer to compute these functions in software. In general-purpose computer architectures , one or more FPUs may be integrated as execution units within the central processing unit ; however, many embedded processors do not have hardware support for floating-point operations (while they increasingly have them as standard). When a CPU is executing
4698-415: The application instruction ( load–store ) level or at the memory page level (via an MMU setting). The latter is often used for accessing data from inherently little-endian devices, such as those on PCI buses. There have been three major revisions of the architecture. The first published version was the 32-bit SPARC version 7 (V7) in 1986. SPARC version 8 (V8), an enhanced SPARC architecture definition,
4785-451: The bottom two bits of both operands are 0 and reporting overflow if they are not. This can be useful in the implementation of the run time for ML , Lisp , and similar languages that might use a tagged integer format. The endianness of the 32-bit SPARC V8 architecture is purely big-endian. The 64-bit SPARC V9 architecture uses big-endian instructions, but can access data in either big-endian or little-endian byte order, chosen either at
4872-586: The chip has a total of 16 cores and 128 threads. According to the ISSCC 2010 presentation: "A 16-core SPARC SoC processor enables up to 512 threads in a 4-way glueless system to maximize throughput. The 6 MB L2 cache of 461 GB/s and the 308-pin SerDes I/O of 2.4 Tb/s support the required bandwidth. Six clock and four voltage domains, as well as power management and circuit techniques, optimize performance, power, variability and yield trade-offs across
4959-455: The completion of memory references. For example, all effects of the stores that appear prior to the MEMBAR instruction must be made visible to all processors before any loads following the MEMBAR can be executed. Arithmetic and logical instructions also use a three-operand format, with the first two being the operands and the last being the location to store the result. The middle operand can be
5046-522: The condition codes and versions that do. MULSCC and the multiply instructions use the Y register to hold the upper 32 bits of the product; the divide instructions use it to hold the upper 32 bits of the dividend. The RDY instruction reads the value of the Y register into a general-purpose register; the WRY instruction writes the value of a general-purpose register to the Y register. SPARC V9 added MULX , which multiplies two 64-bit values and produces
5133-562: The execution of those instructions. In the 1980s, it was common in IBM PC /compatible microcomputers for the FPU to be entirely separate from the CPU , and typically sold as an optional add-on. It would only be purchased if needed to speed up or enable math-intensive programs. The IBM PC, XT , and most compatibles based on the 8088 or 8086 had a socket for the optional 8087 coprocessor. The AT and 80286 -based systems were generally socketed for
5220-410: The figure above. There is also a non-windowed Y register, used by the multiply-step, integer multiply, and integer divide instructions. A SPARC V8 processor with an FPU includes 32 32-bit floating-point registers, each of which can hold one single-precision IEEE 754 floating-point number. An even–odd pair of floating-point registers can hold one double-precision IEEE 754 floating-point number, and
5307-552: The first SPARC V7 implementation in 1987 through the Sun UltraSPARC Architecture implementations. Among various implementations of SPARC, Sun's SuperSPARC and UltraSPARC-I were very popular, and were used as reference systems for SPEC CPU95 and CPU2000 benchmarks. The 296 MHz UltraSPARC-II is the reference system for the SPEC CPU2006 benchmark. SPARC is a load–store architecture (also known as
SECTION 60
#17327802095585394-424: The indicated location and then either fill the rest of the target register with zeros (unsigned load) or with the value of the uppermost bit of the byte or half-word (signed load). During a store, those instructions discard the upper bits in the register and store only the lower bits. There are also instructions for loading double-precision values used for floating-point arithmetic , reading or writing eight bytes from
5481-470: The indicated register and the "next" one, so if the destination of a load is L1, L1 and L2 will be set. The complete list of load and store instructions for the general-purpose registers in 32-bit SPARC is LD , ST , LDUB (unsigned byte), LDSB (signed byte), LDUH (unsigned half-word), LDSH (signed half-word), LDD (load double), STB (store byte), STH (store half-word), STD (store double). In SPARC V9, registers are 64-bit, and
5568-459: The integer arithmetic logic unit . The software that lists the necessary series of operations to emulate floating-point operations is often packaged in a floating-point library . In some cases, FPUs may be specialized, and divided between simpler floating-point operations (mainly addition and multiplication) and more complicated operations, like division. In some cases, only the simple operations may be implemented in hardware or microcode , while
5655-446: The integer condition codes and from each other; two additional sets of branch instructions were defined to test those condition codes. Adding an F to the front of the branch instruction in the list above performs the test against the FPU's condition codes, while, in SPARC V8, adding a C tests the flags in the otherwise undefined CP. The CALL (jump to subroutine) instruction uses a 30-bit program counter -relative word offset. As
5742-405: The local registers are used for retaining local values across function calls. The "scalable" in SPARC comes from the fact that the SPARC specification allows implementations to scale from embedded processors up through large server processors, all sharing the same core (non-privileged) instruction set. One of the architectural parameters that can scale is the number of implemented register windows;
5829-722: The lower bits. The new LDX instruction loads a 64-bit value into the register, and the STX instruction stores all 64 bits of the register. The LDF , LDDF , and LDQF instructions load a single-precision, double-precision, or quad-precision value from memory into a floating-point register; the STF , STDF , and STQF instructions store a single-precision, double-precision, or quad-precision floating-point register into memory. The memory barrier instruction, MEMBAR, serves two interrelated purposes: it articulates order constraints among memory references and facilitates explicit control over
5916-554: The massive amount of thread-level parallelism (TLP) available on the CoolThreads platform can require different application development techniques than for traditional server platforms. Using TLP in applications is key to getting good performance. Sun has published a number of Sun BluePrints to assist application programmers in developing and deploying software on T1 or T2-based CoolThreads servers. The main article, Tuning Applications on UltraSPARC T1 Chip Multithreading Systems , addresses issues for general application programmers. There
6003-922: The more complex operations are implemented as software. In some current architectures, the FPU functionality is combined with SIMD units to perform SIMD computation; an example of this is the augmentation of the x87 instructions set with SSE instruction set in the x86-64 architecture used in newer Intel and AMD processors. Several models of the PDP-11 , such as the PDP-11/45, PDP-11/34a, PDP-11/44, and PDP-11/70, supported an add-on floating-point unit to support floating-point instructions. The PDP-11/60, MicroPDP-11/23 and several VAX models could execute floating-point instructions without an add-on FPU (the MicroPDP-11/23 required an add-on microcode option), and offered add-on accelerators to further speed
6090-415: The number of processors the software runs on. In early 2006, Oracle changed the licensing model by introducing the processor factor . With a processor factor of .25 for the T1, an 8-core T2000 requires only a 2-CPU license. The "Oracle Processor Core Factor Table" has since been updated regularly as new CPUs came to market. In Q3 2006, IBM introduced the concept of Value Unit (VU) pricing. Each core of
6177-443: The operation is deposited. The majority of SPARC instructions have at least this register, so it is placed near the "front" of the instruction format. RS1 and RS2 are the "source registers", which may or may not be present, or replaced by a constant. Load and store instructions have a three-operand format, in that they have two operands representing values for the address and one operand for the register to read or write to. The address
6264-530: The operation should set the carry bit: SPARC V7 does not have multiplication or division instructions, but it does have MULSCC , which does one step of a multiplication testing one bit and conditionally adding the multiplicand to the product. This was because MULSCC can complete over one clock cycle in keeping with the RISC philosophy. SPARC V8 added UMUL (unsigned multiply), SMUL (signed multiply), UDIV (unsigned divide), and SDIV (signed divide) instructions, with both versions that do not update
6351-471: The processor, but older applications burdened with single thread bottlenecks occasionally exhibited poor overall performance. Single-threaded application weakness was mitigated with the follow-on SPARC T4 processor. The T4 core count was reduced to 8 (from 16 on the T3), the cores were made more complex, the clock rate was nearly doubled — all contributing to faster single thread performance (300% to 500% increase over previous generations). Additional effort
6438-584: The register file is not part of the architecture, allowing more registers to be added as the technology improves, up to a maximum of 32 windows in SPARC V7 and V8 as CWP is 5 bits and is part of the PSR register. In SPARC V7 and V8 CWP will usually be decremented by the SAVE instruction (used by the SAVE instruction during the procedure call to open a new stack frame and switch the register window), or incremented by
6525-455: The same object code to run on systems with or without floating-point hardware. Emulation can be implemented on any of several levels: in the CPU as microcode , as an operating system function, or in user-space code. When only integer functionality is available, the CORDIC methods are most commonly used for transcendental function evaluation. In most modern computer architectures, there
6612-562: The specification allows from three to 32 windows to be implemented, so the implementation can choose to implement all 32 to provide maximum call stack efficiency, or to implement only three to reduce cost and complexity of the design, or to implement some number between them. Other architectures that include similar register file features include Intel i960 , IA-64 , and AMD 29000 . The architecture has gone through several revisions. It gained hardware multiply and divide functionality in version 8. 64-bit (addressing and data) were added to
6699-710: The target address is specifying the start of a word, not a byte, 30-bits is all that is needed to reach any address in the 4 gigabyte address space. The CALL instruction deposits the return address in register R15, also known as output register O7. Floating point unit A floating-point unit ( FPU ), numeric processing unit ( NPU ), colloquially math coprocessor , is a part of a computer system specially designed to carry out operations on floating-point numbers. Typical operations are addition , subtraction , multiplication , division , and square root . Some FPUs can also perform various transcendental functions such as exponential or trigonometric calculations, but
6786-767: The version 9 SPARC specification published in 1994. In SPARC version 8, the floating-point register file has 16 double-precision registers. Each of them can be used as two single-precision registers, providing a total of 32 single-precision registers. An odd–even number pair of double-precision registers can be used as a quad-precision register, thus allowing 8 quad-precision registers. SPARC Version 9 added 16 more double-precision registers (which can also be accessed as 8 quad-precision registers), but these additional registers can not be accessed as single-precision registers. No SPARC CPU implements quad-precision operations in hardware as of 2024. Tagged add and subtract instructions perform adds and subtracts on values checking that
6873-513: Was taped out in October 2006. A two-socket server (2 RU ) will have 128 threads, 16 cores, and a 65× performance improvement over UltraSPARC III. At the Hot Chips 19 conference, Sun announced that Victoria Falls will be in two-way and four-way servers. Thus, a single 4-way SMP server will support 256 concurrent hardware threads. In April 2008, Sun released 2-way UltraSPARC T2 Plus servers,
6960-429: Was certainly influential in the concurrent and future designs of SPARC processors. The original UltraSPARC T1 was designed for single CPU systems only and is not capable of SMP. "Rock" was a more ambitious project, intended to support multiple-chip server architectures, targeting traditional data-facing workloads such as databases. It was seen as more a follow-on to Sun's SMP processors such as UltraSPARC IV , rather than
7047-461: Was made to add the "critical thread API", where the operating system would detect a bottleneck and would temporarily allocate the resources of an entire core, instead of 1 (of 8) threads, to the targeted application processes exhibiting single threaded CPU bound behavior. This allowed the T4 to uniquely mitigate single threaded bottlenecks, while not having to compromise in the overall architecture to achieve massive multi-threaded throughput. Leveraging
7134-426: Was one of the most successful early commercial RISC systems, and its success led to the introduction of similar RISC designs from many vendors through the 1980s and 1990s. The first implementation of the original 32-bit architecture (SPARC V7) was used in Sun's Sun-4 computer workstation and server systems, replacing their earlier Sun-3 systems based on the Motorola 68000 series of processors. SPARC V8 added
7221-414: Was only available in uniprocessor systems, limiting vertical scalability in large enterprise environments. This weakness was mitigated with the follow-on UltraSPARC T2 Plus , as well as the next generation SPARC T3 and SPARC T4 . The UltraSPARC T2+, SPARC T3, and SPARC T4 all offer single, dual, and quad socket configurations. The T1 had outstanding throughput with massive numbers of threads supported by
7308-629: Was released by Fujitsu and Sun, describing processor functions which were identically implemented in the CPUs of both companies ("Commonality"). The first CPUs conforming to JPS1 were the UltraSPARC III by Sun and the SPARC64 V by Fujitsu. Functionalities which are not covered by JPS1 are documented for each processor in "Implementation Supplements". At the end of 2003, JPS2 was released to support multicore CPUs. The first CPUs conforming to JPS2 were
7395-587: Was released by SPARC International in 1993. It was developed by the SPARC Architecture Committee consisting of Amdahl Corporation , Fujitsu , ICL , LSI Logic , Matsushita , Philips , Ross Technology , Sun Microsystems , and Texas Instruments . Newer specifications always remain compliant with the full SPARC V9 Level 1 specification. In 2002, the SPARC Joint Programming Specification 1 (JPS1)
7482-412: Was released in 1990. The main differences between V7 and V8 were the addition of integer multiply and divide instructions, and an upgrade from 80-bit "extended-precision" floating-point arithmetic to 128-bit " quad-precision " arithmetic. SPARC V8 served as the basis for IEEE Standard 1754-1994, an IEEE standard for a 32-bit microprocessor architecture. SPARC version 9 , the 64-bit SPARC architecture,
7569-694: Was turned over to the SPARC International trade group in 1989, and since then its architecture has been developed by its members. SPARC International is also responsible for licensing and promoting the SPARC architecture, managing SPARC trademarks (including SPARC, which it owns), and providing conformance testing . SPARC International was intended to grow the SPARC architecture to create a larger ecosystem; SPARC has been licensed to several manufacturers, including Atmel , Bipolar Integrated Technology , Cypress Semiconductor , Fujitsu , Matsushita and Texas Instruments . Due to SPARC International, SPARC
#557442