The UltraSPARC is a microprocessor developed by Sun Microsystems and fabricated by Texas Instruments , introduced in mid-1995. It is the first microprocessor from Sun to implement the 64-bit SPARC V9 instruction set architecture (ISA). Marc Tremblay was a co-microarchitect.
61-543: The UltraSPARC is a four-issue superscalar microprocessor that executes instructions in order. It includes a nine-stage integer pipeline. The execution units were simplified relative to the SuperSPARC to achieve higher clock frequencies; for example, the ALUs were not cascaded, unlike in the SuperSPARC, to avoid restricting clock frequency. The integer register file has 32 64-bit entries. As
122-545: A capacity of 512 KB to 4 MB and is direct-mapped. It can return data in a single cycle. The external cache is implemented with synchronous SRAMs clocked at the same frequency as the microprocessor, as ratios were not supported. It is accessed via the data bus. It contained 3.8 million transistors. It was fabricated in Texas Instruments' EPIC-3 process, a 0.5 μm complementary metal–oxide–semiconductor (CMOS) process with four levels of metal. The UltraSPARC
183-417: A clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows more throughput (the number of instructions that can be executed in a unit of time) than would otherwise be possible at a given clock rate . Each execution unit is not a separate processor (or a core if the processor is a multi-core processor ), but an execution resource within
244-439: A customer reported that localized power outages had shut down their computer, but left the cooling system running — so they arrived in the morning to find the machine encased in ice. Cray addressed the problem of skew by ensuring that every signal path in his later computers was the same electrical length, so that values that were to be acted upon at a particular time were indeed all valid values. When required, he would run
305-462: A field, which would you rather use: two strong oxen or 1024 chickens?" By the mid-1990s, this argument was becoming increasingly difficult to justify, and modern compiler technology made developing programs on such machines not much more difficult than their simpler counterparts. Cray set up a new company, SRC Computers , and started the design of his own massively parallel machine. The new design concentrated on communications and memory performance,
366-512: A new laboratory on land Cray owned in his hometown of Chippewa Falls. Part of the reason for the move may also have to do with Cray's worries about an impending nuclear war , which he felt made the Twin Cities a serious safety concern. His house, built a few hundred yards from the new CDC laboratory, included a huge bomb shelter . The new Chippewa Lab was set up during the middle of the 6600 project, although it does not seem to have delayed
427-488: A number of unusual tales about his life away from work, termed "Rollwagenisms", from then-CEO of Cray Research, John A. Rollwagen. Cray enjoyed skiing , windsurfing , tennis , and other sports. Another favorite pastime was digging a tunnel under his home; he attributed the secret of his success to "visits by elves " while he worked in the tunnel: "While I'm digging in the tunnel, the elves will often come to me with solutions to my problem." One story has it that when Cray
488-418: A single CPU such as an arithmetic logic unit . While a superscalar CPU is typically also pipelined , superscalar and pipelining execution are considered different performance enhancement techniques. The former (superscalar) executes multiple instructions in parallel by using multiple execution units, whereas the latter (pipeline) executes multiple instructions in the same execution unit in parallel by dividing
549-423: A single processor. Thus a multicore CPU is possible where each core is an independent processor containing multiple parallel pipelines, each pipeline being superscalar. Some processors also include vector capability. Seymour Cray Seymour Roger Cray (September 28, 1925 – October 5, 1996 ) was an American electrical engineer and supercomputer architect who designed a series of computers that were
610-436: A superscalar CPU the dispatcher reads instructions from memory and decides which ones can be run in parallel, dispatching each to one of the several execution units contained inside a single CPU. Therefore, a superscalar processor can be envisioned as having multiple parallel pipelines, each of which is processing instructions simultaneously from a single instruction thread. Most modern superscalar CPUs also have logic to reorder
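The dispatch step described above can be sketched as a small model. This is a minimal illustration, not any real processor's logic: the `(dest, srcs)` instruction encoding and the function name `issue_groups` are assumptions made for the example, and the grouping rule is deliberately conservative (strictly in-order, and any register conflict within a group ends the group).

```python
def issue_groups(program, width):
    """Greedily pack an in-order instruction stream into issue groups.

    program: list of (dest, srcs) tuples, a hypothetical three-operand form.
    width:   maximum number of instructions dispatched per cycle.
    A group is closed when it is full or when the next instruction
    conflicts with a register used by an instruction already in the group.
    """
    groups = []
    current = []
    for dest, srcs in program:
        written = {d for d, _ in current}
        read = {s for _, group_srcs in current for s in group_srcs}
        conflict = (
            any(s in written for s in srcs)  # needs a result produced in this group
            or dest in written               # rewrites a register written in this group
            or dest in read                  # overwrites an input used in this group
        )
        if current and (conflict or len(current) == width):
            groups.append(current)
            current = []
        current.append((dest, srcs))
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    program = [
        ("a", ("b", "c")),
        ("d", ("e", "f")),
        ("g", ("a", "d")),  # depends on both earlier results
        ("h", ("i", "j")),
    ]
    for cycle, group in enumerate(issue_groups(program, width=2)):
        print(cycle, [dest for dest, _ in group])
```

Each returned group stands for the instructions handed to separate execution units in one cycle. A design with reordering logic could do better than this in-order sketch by looking past a blocked instruction.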
671-426: Is no assurance otherwise and failure to detect a dependency would produce incorrect results. No matter how advanced the semiconductor process or how fast the switching speed, this places a practical limit on how many instructions can be simultaneously dispatched. While process advances will allow ever greater numbers of execution units (e.g. ALUs), the burden of checking instruction dependencies grows rapidly, as does
732-399: Is often misattributed to Herb Grosch as so-called Grosch's law: Computers should obey a square law: when the price doubles, you should get at least four times as much speed. During this period Cray had become increasingly annoyed at what he saw as interference from CDC management. Cray always demanded an absolutely quiet work environment with a minimum of management overhead, but as
793-435: Is packaged in a 521-contact plastic ball grid array (PBGA). Superscalar A superscalar processor (or multiple-issue processor ) is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor , which can execute at most one single instruction per clock cycle, a superscalar processor can execute more than one instruction during
854-454: Is removed and delegated to the compiler . Explicitly parallel instruction computing (EPIC) is like VLIW with extra cache prefetching instructions. Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar processors. SMT permits multiple independent threads of execution to better utilize the resources provided by modern processor architectures. The fact that they are independent means that we know that
915-480: Is the difference between scalar and vector arithmetic. A superscalar processor is a mixture of the two. Each instruction processes one data item, but there are multiple execution units within each CPU thus multiple instructions can be processing separate data items concurrently. Superscalar CPU design emphasizes improving the instruction dispatcher accuracy and allowing it to keep the multiple execution units in use at all times. This has become increasingly important as
976-644: The ALU , integer multiplier , integer shifter, FPU , etc. There may be multiple versions of each execution unit to enable the execution of many instructions in parallel. This differs from a multi-core processor that concurrently processes instructions from multiple threads, one thread per processing unit (called "core"). It also differs from a pipelined processor , where the multiple instructions can concurrently be in various stages of execution, assembly-line fashion. The various alternative techniques are not mutually exclusive—they can be (and frequently are) combined in
1037-591: The United States he earned a B.Sc. in electrical engineering at the University of Minnesota , graduating in 1949, followed by a M.Sc. in applied mathematics in 1951. In 1950, Cray joined Engineering Research Associates (ERA) in Saint Paul, Minnesota . ERA had formed out of a former United States Navy laboratory that had built codebreaking machines, a tradition ERA carried on when such work
1098-481: The Visual Instruction Set (VIS). The floating-point register file contains thirty-two 64-bit registers. It has five read ports and three write ports. The UltraSPARC has two levels of cache, primary and secondary. There are two primary caches, one for instructions and one for data. Both have a capacity of 16 KB. The UltraSPARC required a mandatory external secondary cache. The cache is unified, has
1159-559: The CDC 1604 was starting to ship to customers in 1960, Cray had already moved on to designing other computers. He first worked on the design of an upgraded version (the CDC 3000 series ), but company management wanted these machines targeted toward "business and commercial" data processing for average customers. Cray did not enjoy working on such "mundane" machines, constrained to design for low-cost construction, so CDC could sell many of them. His desire
1220-642: The Cray-3 project from Chippewa Falls to a laboratory in Colorado Springs, Colorado . In 1989, Cray was faced with a repeat of history when the Cray-3 started to run into difficulties. An upgrade of the X-MP using high-speed memory from the Cray-2 was under development and seemed to be making real progress, and once again management was faced with two projects and limited budgets. They eventually decided to take
1281-493: The Cray-3, and the ending of the Cold War made it unlikely anyone would buy enough Cray-4s to offer a return on the development funds. The company ran out of money and filed for Chapter 11 bankruptcy 24 March 1995. Cray had always resisted the massively parallel solution to high-speed computing, offering a variety of reasons that it would never work as well as one very fast processor. He famously quipped "If you were plowing
1342-728: The SPARC ISA uses register windows, of which the UltraSPARC has eight, the actual number of registers is 144. The register file has seven read and three write ports. The integer register file provides registers to two arithmetic logic units and the load/store unit. The two ALUs can both execute arithmetic, logic and shift instructions, but only one can execute multiply and divide instructions. The floating-point unit consists of five functional units. One executes floating-point adds and subtracts, one executes multiplies, and one executes divides and square roots. Two units are for executing SIMD instructions defined by
1403-577: The X-MP, largely due to very fast and large main memory, and thus it sold in much smaller numbers. The Cray-2 ran at 250 MHz with a very deep pipeline , making it harder to write code than for the shorter-pipe X-MP. As the Cray-3 project started, he found himself once again being "bothered" too much with day-to-day tasks. In order to concentrate on design, Cray left the CEO position of Cray Research in 1980 to become an independent contractor. In 1988, he moved
1464-588: The age of ten he was able to build a device out of Erector Set components that converted punched paper tape into Morse code signals. The basement of the family home was given over to the young Cray as a "laboratory". Cray graduated from Chippewa Falls High School in 1943 before being drafted for World War II as a radio operator. He saw action in Europe , and then moved to the Pacific theatre where he worked on breaking Japanese naval codes . On his return to
1525-516: The bottleneck that hampered many parallel designs. Design had just started when Cray was killed in a car accident. SRC Computers carried on development and specialized in reconfigurable computing . Cray frequently cited two important aspects to his design philosophy: remove heat, and ensure that all signals that are supposed to arrive somewhere at the same time do indeed arrive at the same time. His computers were equipped with built-in cooling systems, extending ultimately to coolant channels cast into
1586-584: The company grew he found himself constantly interrupted by middle managers who – according to Cray – did little but gawk and use him as a sales tool by introducing him to prospective customers. Cray decided that in order to continue development he would have to move from St. Paul, far enough that it would be too long a drive for a "quick visit" and long-distance telephone charges would be just enough to deter most calls, yet close enough that real visits or board meetings could be attended without too much difficulty. After some debate, Norris backed him and set up
1647-511: The company while they were being designed. The 8600 was running into similar difficulties and Cray eventually decided that the only solution was to start over fresh. This time Norris was not willing to take the risk, and another project within the company, the CDC STAR-100 , seemed to be progressing more smoothly. Norris said he was willing to keep the project alive at a low level until the STAR
1708-402: The complexity of register renaming circuitry to mitigate some dependencies. Collectively the power consumption , complexity and gate delay costs limit the achievable superscalar speedup. However even given infinitely fast dependency checking logic on an otherwise conventional superscalar CPU, if the instruction stream itself has many dependencies, this would also limit the possible speedup. Thus
1769-419: The degree of intrinsic parallelism in the code stream forms a second limitation. Collectively, these limits drive investigation into alternative architectural changes such as very long instruction word (VLIW), explicitly parallel instruction computing (EPIC), simultaneous multithreading (SMT), and multi-core computing . With VLIW, the burdensome task of dependency checking by hardware logic at run time
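The second limitation, the intrinsic parallelism of the code stream, can be illustrated with a rough estimate. The sketch below assumes idealized conditions that are not from the text: every instruction takes one cycle, only true data dependencies (a later instruction reading an earlier result) are counted, and there are unlimited execution units. The function name `dataflow_limit` and the `(dest, srcs)` encoding are hypothetical.

```python
def dataflow_limit(program):
    """Estimate the parallelism available in a straight-line program.

    program: list of (dest, srcs) tuples in program order.
    Returns (critical_path_length, average_parallelism), assuming unit
    latency, unlimited execution units, and that only read-after-write
    dependencies constrain execution.
    """
    if not program:
        return 0, 0.0
    ready = {}  # register name -> cycle at which its latest value is available
    depth = 0
    for dest, srcs in program:
        start = max((ready.get(s, 0) for s in srcs), default=0)
        finish = start + 1
        ready[dest] = finish
        depth = max(depth, finish)
    return depth, len(program) / depth


if __name__ == "__main__":
    independent = [("a", ("b", "c")), ("d", ("e", "f")),
                   ("g", ("h", "i")), ("j", ("k", "l"))]
    chain = [("a", ("b", "c")), ("d", ("a", "e")),
             ("f", ("d", "g")), ("h", ("f", "i"))]
    print(dataflow_limit(independent))
    print(dataflow_limit(chain))
```

Under these assumptions, four independent instructions could complete in one cycle, while a four-instruction dependency chain needs four cycles however many execution units are available.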
1830-434: The early 1950s. At the newly formed Sperry Rand , ERA became the scientific computing arm of their UNIVAC division. Cray, along with William Norris , later became dissatisfied with ERA, then spun off as Sperry Rand. In 1957, they founded a new company, Control Data Corporation . By 1960 he had completed the design of the CDC 1604 , an improved low-cost ERA 1103 that had impressive performance for its price. Even as
1891-413: The execution unit into different phases. In the "Simple superscalar pipeline" figure, fetching two instructions at the same time is superscaling, and fetching the next two before the first pair has been written back is pipelining. The superscalar technique is traditionally associated with several identifying characteristics (within a given CPU): Seymour Cray 's CDC 6600 from 1964 is often mentioned as
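The distinction between pipelining and superscaling can be made concrete with an idealized cycle count. The following sketch assumes a stall-free pipeline in which every instruction passes through every stage and up to `width` instructions enter per cycle; the function name `ideal_cycles` and the five-stage depth used in the example are illustrative assumptions, not figures from the text.

```python
import math


def ideal_cycles(n_instructions, stages, width):
    """Cycles to finish n instructions on an ideal, stall-free pipeline.

    stages: pipeline depth (pipelining overlaps successive instructions).
    width:  instructions fetched and issued per cycle (superscaling).
    The first group needs `stages` cycles; each later group finishes
    one cycle after the group before it.
    """
    if n_instructions == 0:
        return 0
    groups = math.ceil(n_instructions / width)
    return stages + groups - 1


if __name__ == "__main__":
    for width in (1, 2, 4):
        print(width, ideal_cycles(10, stages=5, width=width))
```

In this model, raising `width` shrinks the number of groups (the superscalar effect), while the `stages - 1` term is the fill latency that pipelining hides by overlapping groups. Real code falls short of these figures because of dependencies and stalls.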
UltraSPARC - Misplaced Pages Continue
1952-635: The farthest edge of credibility when Seymour envisioned them." Larry Smarr , then director of the National Center for Supercomputing Applications at the University of Illinois said that Cray is "the Thomas Edison of the supercomputing industry." Cray was born in 1925 in Chippewa Falls, Wisconsin , to Seymour R. and Lillian Cray. His father was a civil engineer who fostered Cray's interest in science and engineering. As early as
2013-441: The fastest in the world for decades, and founded Cray Research , which built many of these machines. Called "the father of supercomputing", Cray has been credited with creating the supercomputer industry. Joel S. Birnbaum , then chief technology officer of Hewlett-Packard , said of him: "It seems impossible to exaggerate the effect he had on the industry; many of the things that high performance computers now do routinely were at
2074-441: The first commercial supercomputer, outperforming everything then available by a wide margin. While expensive, for those that needed the fastest computer available there was nothing else on the market that could compete. When other companies (namely IBM ) attempted to create machines with similar performance, they stumbled ( IBM 7030 Stretch ). In the 6600, Cray had solved the critical design problem of "imprecise interrupts", which
2135-593: The first superscalar design. The 1967 IBM System/360 Model 91 was another superscalar mainframe. The Intel i960 CA (1989), the AMD 29000 -series 29050 (1990), and the Motorola MC88110 (1991), microprocessors were the first commercial single-chip superscalar microprocessors. RISC microprocessors like these were the first to have superscalar execution, because RISC architectures free transistors and die area which can be used to include multiple execution units and
2196-448: The instruction of one thread can be executed out of order and/or in parallel with the instruction of a different one. Also, one independent thread will not produce a pipeline bubble in the code stream of a different one, for example, due to a branch. Superscalar processors differ from multi-core processors in that the several execution units are not entire processors. A single processor is composed of finer-grained execution units such as
2257-452: The instructions to try to avoid pipeline stalls and increase parallel execution. Available performance improvement from superscalar techniques is limited by three key areas: Existing binary executable programs have varying degrees of intrinsic parallelism. In some cases instructions are not dependent on each other and can be executed simultaneously. In other cases they are inter-dependent: one instruction impacts either resources or results of
2318-399: The machine in an attempt to enable it to run as fast as possible. Unlike most high-end projects, Cray realized that there was considerably more to performance than simple processor speed, that I/O bandwidth had to be maximized as well in order to avoid "starving" the processor of data to crunch. He later noted, "Anyone can build a fast CPU. The trick is to build a fast system." The 6600 was
2379-468: The machine would have to be built using gallium arsenide semiconductors. In the past Cray had always avoided using anything even near the state of the art , preferring to use well-known solutions and designing a fast machine based on them. In this case, Cray was developing every part of the machine, even the chips inside it. Nevertheless, the team were able to get the machine working and delivered their first example to NCAR on 24 May 1993. The machine
2440-408: The mainframes and thermally coupled to metal plates within the circuit boards, and to systems immersed in coolants. In a story he told about himself, he realized early in his career that he should interlock the computers with the cooling systems so that the computers would not operate unless the cooling systems were operational. It did not originally occur to him to interlock in the other direction until
2501-463: The more rigid methods used in the simpler P5 Pentium ; it also simplified speculative execution and allowed higher clock frequencies compared to designs such as the advanced Cyrix 6x86 . The simplest processors are scalar processors. Each instruction executed by a scalar processor typically manipulates one or two data items at a time. By contrast, each instruction executed by a vector processor operates simultaneously on many data items. An analogy
2562-753: The number of units has increased. While early superscalar CPUs would have two ALUs and a single FPU , a later design such as the PowerPC 970 includes four ALUs, two FPUs, and two SIMD units. If the dispatcher is ineffective at keeping all of these units fed with instructions, the performance of the system will be no better than that of a simpler, cheaper design. A superscalar processor usually sustains an execution rate in excess of one instruction per machine cycle . But merely processing multiple instructions concurrently does not make an architecture superscalar, since pipelined , multiprocessor or multi-core architectures also achieve that, but with different methods. In
2623-468: The other. The instructions a = b + c; d = e + f can be run in parallel because none of the results depend on other calculations. However, the instructions a = b + c; b = e + f might not be runnable in parallel, depending on the order in which the instructions complete while they move through the units. Although the instruction stream may contain no inter-instruction dependencies, a superscalar CPU must nonetheless check for that possibility, since there
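The two instruction pairs above can be checked mechanically. This is a minimal sketch of a pairwise check, assuming a hypothetical `Instr` record with one destination and a tuple of sources; the labels RAW, WAR and WAW (read-after-write, write-after-read, write-after-write) are standard hazard terminology rather than terms used in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """A hypothetical instruction of the form dest = src1 op src2."""
    dest: str
    srcs: tuple


def hazards(first, second):
    """Return the hazards `second` has with respect to the earlier `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")  # second reads the value first writes
    if second.dest in first.srcs:
        found.add("WAR")  # second overwrites an input of first
    if first.dest == second.dest:
        found.add("WAW")  # both write the same destination
    return found


def can_issue_together(first, second):
    """True when the pair has no register conflict at all."""
    return not hazards(first, second)


if __name__ == "__main__":
    a_bc = Instr("a", ("b", "c"))  # a = b + c
    d_ef = Instr("d", ("e", "f"))  # d = e + f
    b_ef = Instr("b", ("e", "f"))  # b = e + f
    print(hazards(a_bc, d_ef), can_issue_together(a_bc, d_ef))
    print(hazards(a_bc, b_ef), can_issue_together(a_bc, b_ef))
```

The first pair reports no hazards. The second reports a write-after-read conflict on `b`, which is why its result depends on the order in which the two instructions read and write their registers.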
2684-597: The processor. When it was released it easily beat almost every machine in terms of speed, including the STAR-100 that had beaten the 8600 for funding. The only machine able to perform on the same sort of level was the ILLIAC IV , a specialized one-off machine that rarely operated near its maximum performance, except on very specific tasks. In general, the Cray-1 beat anything on the market by a wide margin. Serial number 001
2745-469: The project. After the 6600 shipped, the successor CDC 7600 system was the next product to be developed in Chippewa Falls, offering peak computational speeds of ten times the 6600. The failed follow-on to the 7600, the CDC 8600 , was the project that finally ended his run of successes at CDC in 1972. Although the 6600 and 7600 had been huge successes in the end, both projects had almost bankrupted
2806-522: The safer route, releasing the new design as the Cray Y-MP . Cray decided to spin off the Colorado Springs laboratory to form Cray Computer Corporation . This new entity took the Cray-3 project with them. The 500 MHz Cray-3 proved to be Cray's second major failure. In order to provide the tenfold increase in performance that he always demanded of his newest machines, Cray decided that
2867-470: The traces back and forth on the circuit boards until the desired length was achieved, and he employed Maxwell's equations in design of the boards to ensure that any radio frequency effects which altered the signal velocity and hence the electrical path length were accounted for. When asked what kind of CAD tools he used to design computers, Cray said that he liked pads of 8½″ × 11″ "faintly-ruled ¼-inch quadrille" paper. Cray
2928-477: The traditional uniformity of the instruction set favors superscalar dispatch (this was why RISC designs were faster than CISC designs through the 1980s and into the 1990s, and it's far more complicated to do multiple dispatch when instructions have variable bit length). Except for CPUs used in low-power applications, embedded systems , and battery -powered devices, essentially all general-purpose CPUs developed since about 1998 are superscalar. The P5 Pentium
2989-408: Was "lent" to Los Alamos National Laboratory in 1976, and that summer the first full system was sold to the National Center for Atmospheric Research (NCAR) for $8.8 million. The company's early estimates had suggested that they might sell a dozen such machines, based on sales of similar machines from the CDC era, so the price was set accordingly. Eventually, well over 80 Cray-1s were sold, the company
3050-473: Was a huge success financially, and Cray's innovations with supercomputers won him the nickname "The Wizard of Chippewa Falls". Follow-up success was not as easy. While he worked on the Cray-2, other teams delivered the two-processor Cray X-MP, which was another huge success, and later the four-processor X-MP. When the Cray-2 was finally released after six years of development it was only marginally faster than
3111-468: Was also in Chippewa Falls. At first there was some question as to what exactly the new company should do. It did not seem that there would be any way for them to afford to develop a new computer, given that the now-large CDC had been unable to support more than one. When the President in charge of financing traveled to Wall Street to look for seed money , he was surprised to find that Cray's reputation
3172-405: Was asked by management to provide detailed one-year and five-year plans for his next machine, he simply wrote, "Five-year goal: Build the biggest computer in the world. One year goal: One-fifth of the above." And another time, when expected to write a multi-page detailed status report for the company executives, Cray's two-sentence report read: "Activity is progressing satisfactorily as outlined under
3233-489: Was available. ERA was introduced to computer technology during one such effort, but in other times had worked on a wide variety of basic engineering as well. Cray quickly came to be regarded as an expert on digital computer technology, especially following his design work on the ERA 1103 , the first commercially successful scientific computer. He remained at ERA when it was bought by Remington Rand and then Sperry Corporation in
3294-525: Was delivered, at which point full funding could be put into the 8600. Cray was unwilling to work under these conditions and left the company. The split was fairly amicable, and when he started Cray Research in a new laboratory on the same Chippewa property a year later, Norris invested $ 250,000 in start-up money. Like CDC's organization, Cray R&D was based in Chippewa Falls and business headquarters were in Minneapolis. Unlike CDC, Cray's manufacturing
3355-498: Was involved in the design of the following computers: Cray married Verene Voll in 1947. They had known each other since childhood. She was the daughter of a Methodist minister, as was Cray's mother, and Verene worked as a nutritionist. They had three children. Cray and Voll divorced around 1978. He later married Geri M. Harrand. Cray was the grandfather of the LGBTQ rights activist Andrew Cray . Cray avoided publicity. There are
3416-447: Was largely responsible for IBM's failure. He did this by replacing I/O interrupts with a polled request issued by one of ten so-called peripheral processors, which were built-in mini-computers that did all transfers in and out of the 6600's central memory. The following CDC 7600 even improved the speed advantage by a factor of five. In 1963, in a Business Week article announcing the CDC 6600, Seymour Cray clearly expressed an idea that
3477-465: Was not fabricated in a BiCMOS process, as Texas Instruments claimed it did not scale well to 0.5 μm processes and offered little performance improvement. The process was perfected on TI's MVP digital signal processor (DSP), in a reduced form with three levels of metal instead of four and a 0.55 μm feature size, before it was used to fabricate the UltraSPARC, to avoid a repeat of the fabrication problems encountered with the SuperSPARC. The UltraSPARC
3538-486: Was still essentially a prototype, and the company was using the installation to debug the design. By this time a number of massively parallel machines were coming into the market at price/performance ratios the Cray-3 could not touch. Cray responded through "brute force", starting design of the Cray-4 , which would run at 1 GHz and outpower these machines, regardless of price. In 1995 there had been no further sales of
3599-459: Was the first superscalar x86 processor; the Nx586, P6 Pentium Pro and AMD K5 were among the first designs which decode x86 instructions asynchronously into dynamic microcode-like micro-op sequences prior to actual execution on a superscalar microarchitecture; this opened the way for dynamic scheduling of buffered partial instructions and enabled more parallelism to be extracted compared to
3660-404: Was to "produce the largest [fastest] computer in the world". So after some basic design work on the CDC 3000 series, he turned that over to others and went on to work on the CDC 6600 . Nonetheless, several special features of the 6600 first started to appear in the 3000 series. Although in terms of hardware the 6600 was not on the leading edge, Cray invested considerable effort into the design of
3721-451: Was very well known. Far from struggling for some role to play in the market, the financial world was more than willing to provide Cray with all the money they would need to develop a new machine. After several years of development, their first product was released in 1976 as the Cray-1 . As with earlier Cray designs, the Cray-1 made sure that the entire computer was fast, as opposed to just