The VP2000 was the second series of vector supercomputers from Fujitsu. Announced in December 1988, they replaced Fujitsu's earlier FACOM VP Model E Series. The VP2000 was succeeded in 1995 by the VPP300, a massively parallel supercomputer with up to 256 vector processors.
The VP2000 was similar in many ways to Fujitsu's earlier designs, and in turn to the Cray-1, using a register-based vector processor for performance. For additional performance, the vector units supported a special multiply-and-add instruction that could retire two results per clock cycle. This instruction "chain" is particularly common in many supercomputer applications. Another difference is that
A 15-bit instruction word containing a 6-bit operation code. There are only 64 machine codes, including a no-operation code, with no fixed-point multiply or divide operations in the central processor. The 7600 has two main core memories. Small core memory holds the instructions currently being executed and the data currently being processed. It has an access time of 10 of the 27.5-ns minor cycles and
A 60-bit word length. Large core memory holds data ready to transfer to small core memory. It has an access time of 60 of the 27.5-ns minor cycles and a word length of 480 bits (512 bits with parity). Accesses are fully pipelined and buffered, so the two have the same sequential transfer rate of 60 bits every 27.5 ns. The two work in parallel, so the sequential transfer rate from one to the other
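As a quick check on the CDC 7600 memory figures, the sketch below derives the clock rate, both core-memory access times and the sequential transfer rate from the 27.5 ns minor cycle. It is plain arithmetic on values stated in the text; the variable names are illustrative only.

```python
# Arithmetic on the CDC 7600 memory figures quoted in the text.
MINOR_CYCLE_NS = 27.5   # one minor cycle
WORD_BITS = 60          # one central-processor word

clock_mhz = 1_000 / MINOR_CYCLE_NS              # cycles per microsecond = MHz
scm_access_ns = 10 * MINOR_CYCLE_NS             # small core memory: 10 minor cycles
lcm_access_ns = 60 * MINOR_CYCLE_NS             # large core memory: 60 minor cycles
words_per_lcm_word = 480 // WORD_BITS           # one 480-bit LCM word holds eight 60-bit words
stream_gbit_per_s = WORD_BITS / MINOR_CYCLE_NS  # bits per ns is the same as Gbit/s

print(f"clock:              {clock_mhz:.1f} MHz")
print(f"SCM access:         {scm_access_ns:.0f} ns")
print(f"LCM access:         {lcm_access_ns:.0f} ns")
print(f"words per LCM word: {words_per_lcm_word}")
print(f"sequential rate:    {stream_gbit_per_s:.2f} Gbit/s")
```

The derived 36.4 MHz matches the clock rate the article gives for the 7600. The 1,650 ns large core memory access is far longer than the small core memory's 275 ns, and the pipelining and buffering described in the text are what let both sustain the same 60 bits per minor cycle once a sequential transfer is running.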
A celebrity and his company a success, lasting until the supercomputer crash in the early 1990s. Based on a recommendation by William Perry's study, the NSA purchased a Cray-1 for theoretical research in cryptanalysis. According to Budiansky, "Though standard histories of Cray Research would persist for decades in stating that the company's first customer was Los Alamos National Laboratory, in fact it
A lineup of investors willing to back Cray; all that was needed was a design. Cray Research spent four years designing its first computer. In 1975 the 80 MHz Cray-1 was announced. The excitement was so high that a bidding war for the first machine broke out between Lawrence Livermore National Laboratory and Los Alamos National Laboratory, the latter eventually winning and receiving serial number 001 in 1976 for
A logical unit, a population count unit, a leading zero count unit and a shift unit. The vector portion consisted of add, logical and shift units. The floating point functional units were shared between the scalar and vector portions, and these consisted of add, multiply and reciprocal approximation units. The system had limited parallelism. It could issue one instruction per clock cycle, for a theoretical performance of 80 MIPS, but with vector floating-point multiplication and addition occurring in parallel, theoretical performance
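The Cray-1's peak figures follow directly from its clock rate: one instruction issued per 12.5 ns clock gives 80 MIPS, and a vector multiply and a vector add each producing one result per clock give 160 MFLOPS. The sketch below, a minimal model with a hypothetical function name, leaves out the reciprocal approximation unit, which on its own does not produce a finished division.

```python
def peak_rates(clock_ns: float, issue_per_clock: int = 1, flops_per_clock: int = 2):
    """Return (MHz, MIPS, MFLOPS) for an idealized machine that issues
    `issue_per_clock` instructions and retires `flops_per_clock`
    floating-point results every clock period."""
    mhz = 1_000 / clock_ns
    return mhz, mhz * issue_per_clock, mhz * flops_per_clock

mhz, mips, mflops = peak_rates(12.5)   # Cray-1: 12.5 ns clock
print(mhz, mips, mflops)               # 80.0 80.0 160.0
```

These are theoretical ceilings. A program reaches them only on clocks where both floating-point units are kept busy.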
A machine that ran several times faster than any similar design. The Cray-1's architect was Seymour Cray; the chief engineer was Cray Research co-founder Lester Davis. They would go on to design several new machines using the same basic concepts, and retained the performance crown into the 1990s. From 1968 to 1972, Seymour Cray of Control Data Corporation (CDC) worked on the CDC 8600, the successor to his earlier CDC 6600 and CDC 7600 designs. The 8600
A set of sixty-four registers each for S and A temporary storage, known as T and B respectively, which could not be seen by the functional units. The vector system added another eight 64-element by 64-bit vector (V) registers, as well as vector length (VL) and vector mask (VM) registers. Finally, the system also included a 64-bit real-time clock register and four instruction buffers, each holding sixty-four 16-bit instruction parcels (sixteen 64-bit words). The hardware
A six-month trial. The National Center for Atmospheric Research (NCAR) was the first official customer of Cray Research in 1977, paying US$8.86 million ($7.9 million plus $1 million for the disks) for serial number 3. The NCAR machine was decommissioned in 1989. The company expected to sell perhaps a dozen of the machines, and set the selling price accordingly, but ultimately over 80 Cray-1s of all types were sold, priced from $5M to $8M. The machine made Seymour Cray
A small core memory read or write. Arithmetic and logic instructions have these registers as sources and destinations. The programmer or compiler tries to fetch data in time to be used and store data before more data needs the same register, but if it is not ready, the processor goes into a wait state until it is. It also waits if one of the four floating-point arithmetic units is not ready when requested, but due to pipelining, this does not usually happen. The CDC 7600 "was designed to be machine code upward compatible with
A small set of data into the vector registers and then running several operations on it, the vector system of the new design had its own separate pipeline. For instance, the multiplication and addition units were implemented as separate hardware, so the results of one could be internally pipelined into the next, the instruction decode having already been handled in the machine's main pipeline. Cray referred to this concept as chaining, as it allowed programmers to "chain together" several instructions and extract higher performance. In 1978,
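Chaining can be illustrated with a simple timing model. Each functional unit is treated as a pipeline that accepts one element per clock. Without chaining, the add cannot start until the whole vector multiply has finished. With chaining, each product enters the adder as soon as it leaves the multiplier. The pipeline depths of 7 and 6 clocks below are illustrative assumptions, not figures from this article.

```python
def finish_time(start: int, depth: int, n: int) -> int:
    """Clock on which the last of n results leaves a pipeline `depth` stages deep,
    when one element enters per clock beginning right after clock `start`."""
    return start + depth + n - 1

def multiply_then_add(n: int, mul_depth: int, add_depth: int, chained: bool) -> int:
    """Total clocks for a vector multiply followed by a dependent vector add."""
    if chained:
        # The first product leaves the multiplier at clock mul_depth and goes
        # straight into the adder; later products follow one per clock.
        return finish_time(mul_depth, add_depth, n)
    # Without chaining, the add waits for the entire multiply to complete.
    mul_done = finish_time(0, mul_depth, n)
    return finish_time(mul_done, add_depth, n)

VL = 64                       # one full vector register
MUL_DEPTH, ADD_DEPTH = 7, 6   # hypothetical pipeline depths
print("unchained:", multiply_then_add(VL, MUL_DEPTH, ADD_DEPTH, chained=False))  # 139
print("chained:  ", multiply_then_add(VL, MUL_DEPTH, ADD_DEPTH, chained=True))   # 76
```

With chaining, the cost is roughly one pass over the vector plus both start-up latencies instead of two full passes. That is why a chained multiply-add sequence approaches two results per clock.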
A team from the Argonne National Laboratory tested a variety of typical workloads on a Cray-1 as part of a proposal to purchase one for their use, replacing their IBM 370/195. They also planned on testing on the CDC STAR-100 and Burroughs Scientific Computer, but such tests, if they were performed, were not published. The tests were run on the Cray-1 at the National Center for Atmospheric Research (NCAR) in Boulder, Colorado. The only other Cray available at
A total of 72 bits per word. Memory was spread across 16 interleaved memory banks, each with a 50 ns cycle time, allowing up to four words to be read per cycle. Smaller configurations could have 0.25 or 0.5 megawords of main memory. Maximum aggregate memory bandwidth was 638 Mbit/s. The main register set consisted of eight 64-bit scalar (S) registers and eight 24-bit address (A) registers. These were backed by
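Interleaving sends consecutive addresses to different banks, so one bank's 50 ns cycle (four 12.5 ns clocks) can overlap with accesses to the other banks. The sketch below makes two simplifying assumptions for illustration: low-order interleaving (bank = address mod 16) and at most one new request per clock. It does not model the wider instruction-fetch path.

```python
BANKS = 16        # interleaved memory banks
BANK_BUSY = 4     # 50 ns bank cycle / 12.5 ns clock

def clocks_for_stride(n: int, stride: int) -> int:
    """Clocks needed to issue n word requests at the given address stride,
    stalling whenever the target bank is still busy."""
    free_at = [0] * BANKS   # first clock at which each bank can accept a request
    clock = 0
    for i in range(n):
        bank = (i * stride) % BANKS        # assumed low-order interleaving
        clock = max(clock, free_at[bank])  # wait for the bank if necessary
        free_at[bank] = clock + BANK_BUSY
        clock += 1                         # at most one request per clock
    return clock

for stride in (1, 8, 16):
    print(stride, clocks_for_stride(64, stride))
# stride 1 -> 64, stride 8 -> 126, stride 16 -> 253
```

Under this model, unit stride streams at one word per clock. A stride equal to the bank count sends every access to the same bank and runs about four times slower, which is why access strides matter on an interleaved memory.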
A very compact, but basically unrepairable module. However, the same dense packing also led to the machine's biggest problem: heat. For the 7600, Cray once again turned to his refrigeration engineer, Dean Roush, formerly of the Amana company. Roush added an aluminum plate to the back of each side of the cordwood stack, which were in turn cooled by a liquid Freon system running through
Is 60 bits per 27.5 ns minor cycle. On an operating system call, the contents of the small core memory are swapped out and replaced from the large core memory by the operating system, and restored afterward. There is a 12-word instruction pipeline, called the instruction word stack in CDC documentation. All addresses in the stack are fetched, without waiting for the instruction field to be processed. Therefore,
Is a physical limit to performance because of the time it takes signals to move between parts of the machine, which in turn is defined by its physical size. As always, Cray's design work spent considerable effort on this problem in order to allow higher operating frequencies. For the 7600, each circuit module actually consisted of up to six printed circuit boards, each one stuffed with subminiature resistors, diodes, and transistors. The six boards were stacked up and then interconnected along their edges, making
The CPU of the computer is built up from a number of separate parts dedicated to a single task, for instance, adding a number, or fetching from memory. Normally, as the instruction flows through the machine, only one part is active at any given time. This means that each sequential step of the entire process must complete before a result can be saved. The addition of an instruction pipeline changes this. In such machines
The National Science Foundation supercomputer centers (for high-energy physics) represented the second largest block with LLL's Cray Time Sharing System (CTSS). CTSS was written in a dynamic memory Fortran, first named LRLTRAN, which ran on CDC 7600s, renamed CVC (pronounced "Civic") when vectorization for the Cray-1 was added. Cray Research attempted to support these sites accordingly. These software choices had influences on later minisupercomputers, also known as "crayettes". NCAR has its own operating system (NCAROS). The National Security Agency developed its own operating system (Folklore) and language (IMP, with ports of Cray Pascal and C and Fortran 90 later). Libraries started with Cray Research's own offerings and Netlib. Other operating systems existed, but most languages tended to be Fortran or Fortran-based. Bell Laboratories, as proof of both portability concept and circuit design, moved
The Unix-compatible UXP/M or the MVS-compatible VSP/S operating systems, both supplied by Amdahl. The latter was used for Fortran programs while the former was typically used for C, and vectorizing compilers were supplied for both languages. Like most companies, Fujitsu turned to massive parallelism for future machines, and the VP2000 family was not on the market for very long. Nevertheless, over 100 were sold, and in July 1993, there were 180 installed. Cray-1 The Cray-1
The supercomputer field into the 1970s. The 7600 ran at 36.4 MHz (27.5 ns clock cycle) and had a 65 Kword primary memory (with a 60-bit word size) using magnetic core and variable-size (up to 512 Kword) secondary memory (depending on site). It was generally about ten times as fast as the CDC 6600 and could deliver about 10 MFLOPS on hand-compiled code, with a peak of 36 MFLOPS. In addition, in benchmark tests in early 1970 it
The 1960s, it was only in the early 1970s that they reached the performance necessary for high-speed applications. The Cray-1 used only four different IC types: an ECL dual 5-4 NOR gate (one 5-input, and one 4-input, each with differential output), another slower MECL 10K 5-4 NOR gate used for address fanout, a 16×4-bit high speed (6 ns) static RAM (SRAM) used for registers and a 1,024×1-bit 48 ns SRAM used for
The 60-bit words, but a 30-bit instruction could not straddle two words, and control could only be transferred to the first instruction in a word. However, the instruction set itself had changed to reflect the new internal memory layout, thereby rendering it incompatible with the earlier 6600. The machines were similar enough to make porting of compilers and operating systems possible without too much trouble. The machine initially did not come with software; sites had to be willing to write their own operating system, like LTSS, NCAROS, and others; and compilers like LRLTRAN (Livermore's version of Fortran with dynamic memory management and other non-standard features). CDC also manufactured two multi-processor computers based on
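The 7600 packing rules can be sketched as a tiny assembler pass: 15- and 30-bit instructions share 60-bit words, no instruction straddles two words, and a branch target must begin a word. Padding the unused parcels with a no-operation instruction is an assumption made for illustration, and the mnemonics (including `NOP`) are hypothetical.

```python
from dataclasses import dataclass

WORD_BITS = 60
PARCEL_BITS = 15
NOOP = "NOP"   # hypothetical mnemonic for the no-operation instruction

@dataclass
class Instr:
    name: str
    bits: int                    # 15 or 30
    branch_target: bool = False  # must begin a new 60-bit word

def pack(program: list[Instr]) -> list[list[str]]:
    """Group instructions into 60-bit words, never letting one straddle two words."""
    words: list[list[str]] = []
    current: list[str] = []
    used = 0

    def close_word() -> None:
        nonlocal current, used
        if current:
            # Fill the remaining 15-bit parcels with no-ops.
            current.extend([NOOP] * ((WORD_BITS - used) // PARCEL_BITS))
            words.append(current)
        current, used = [], 0

    for ins in program:
        if ins.branch_target or used + ins.bits > WORD_BITS:
            close_word()
        current.append(ins.name)
        used += ins.bits
    close_word()
    return words

prog = [Instr("A", 15), Instr("B", 15), Instr("C", 15),
        Instr("D", 30), Instr("E", 15, branch_target=True)]
for word in pack(prog):
    print(word)
```

Here the 30-bit `D` does not fit in the 15 bits left after `A`, `B` and `C`, so it starts a new word. `E` starts a third word because it is a branch target. At four 15-bit instructions per word, a 10-word loop holds at most 40 instructions, which is the figure the text gives for the instruction word stack.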
The 6600 and 7600 left mundane housekeeping tasks, printing output or reading punched cards, for instance, to a series of ten smaller 12-bit machines based on the CDC 160-A known as "Peripheral Processor Units", or PPUs. For any given cycle of the machine one of the PPUs was in control, feeding data into the memory while the main processor was crunching numbers. When the cycle completed, the next PPU
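The time-sliced sharing of the peripheral processors can be pictured as a round-robin over ten processor contexts: on each cycle exactly one PPU has control of the shared hardware and advances its own task by one step. The sketch below is a deliberately simplified model. Task names are hypothetical, and memory timing is ignored entirely.

```python
from collections import deque

NUM_PPUS = 10

def run_round_robin(tasks: list[deque], cycles: int) -> list[tuple[int, str]]:
    """Give each PPU one step in turn; a PPU with no work left reports 'idle'."""
    trace = []
    for cycle in range(cycles):
        ppu = cycle % NUM_PPUS                 # the PPU in control on this cycle
        step = tasks[ppu].popleft() if tasks[ppu] else "idle"
        trace.append((ppu, step))
    return trace

tasks = [deque([f"ppu{i}-read", f"ppu{i}-write"]) for i in range(NUM_PPUS)]
trace = run_round_robin(tasks, 25)
print(trace[0], trace[10], trace[21])
```

Each PPU gets control only once every ten cycles. This is acceptable for slow housekeeping work such as driving card readers and printers, while the central processor keeps computing.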
The 6600, but to provide a substantial increase in performance". One user said: "Most users could run on either system without changes." Although the 7600 shared many features of the 6600, including hardware, instructions, and its 60-bit word size, it was not object-code compatible with the CDC 6600. In addition, it was not entirely source-code (COMPASS) compatible, as some instructions in
The 6600. At the time computer memory could be arranged in blocks with independent access paths, and Cray's designs used this to their advantage. While most machines would use a single CPU to run all the functionality of the system, Cray realized that this meant each memory block spent a considerable amount of time idle while the CPU was processing instructions and accessing other blocks. In order to take advantage of this,
The 7600 did not exist in the 6600, and vice versa. It had originally been named the CDC 6800, but was changed to 7600 when Cray decided that it could not be completely compatible. However, due to the 7600's operating system design, the 6600 and 7600 shared a "uniform software environment" despite the low-level differences. In fact, from a high-level perspective, the 7600 was quite similar to
The 7600, but added hardware and instructions to speed up particularly common supercomputer tasks. By 1972, the 8600 had reached a dead end; the machine was so complex that it was impossible to get one working properly. Even a single faulty component would render the machine non-operational. Cray went to William Norris, Control Data's CEO, saying that a redesign from scratch was needed. At
The 7600, with the model number 7700. They consisted of two 7600 machines in an asymmetric configuration: a central and an adjunct machine. They were used for missile launch and inbound tracking of USSR ICBMs. The radar simulator was a real-time simulator with a CDC 6400 for input/output front-end. These systems were to be used in the Pacific Missile Range. One computer was installed at TRW in Redondo Beach, California (later moved to Kwajalein Atoll, South Pacific), and
The 8 ns 8600 he had given up on, but fast enough to beat the CDC 7600 and the STAR. NCAR estimated that the overall throughput on the system was 4.5 times that of the CDC 7600. The Cray-1 was built as a 64-bit system, a departure from the 7600/6600, which were 60-bit machines (a change was also planned for the 8600). Addressing was 24-bit, with a maximum of 1,048,576 64-bit words (1 megaword) of main memory, where each word also had eight parity bits for
The CPU during use, and optionally as a front-end computer. Most, if not all, Cray-1As were delivered using the follow-on Data General Eclipse as the MCU. The reliability of the Cray-1A was very low by today's standards. At the European Centre for Medium-Range Weather Forecasts, which was one of the first customers, the mean time between hardware faults was reported to be 96 hours in 1979. Seymour Cray deliberately made design decisions that sacrificed reliability for speed, but improved his later designs after being questioned on this matter. Similarly,
The CPU will "look ahead" and begin fetching succeeding instructions while the current instruction is still being processed. In this assembly line fashion any one instruction still requires as long to complete, but as soon as it finishes executing, the next instruction is right behind it, with most of the steps required for its execution already completed. Vector processors use this technique with one additional trick. Because
Fujitsu VP2000 - Misplaced Pages Continue
The Cray Operating System (COS) was fairly rudimentary, hardly tested and updated weekly or even daily in the early days. The Cray-1S, announced in 1979, was an improved Cray-1 that supported a larger main memory of 1, 2 or 4 million words. The larger main memory was made possible through the use of 4,096×1-bit bipolar RAM ICs with a 25 ns access time. The Data General minicomputers were optionally replaced with an in-house 16-bit design running at 80 MIPS. The I/O subsystem
The Cray-1 and X-MP models was therefore made under the name Cray Y-MP and launched in 1988. By comparison, the processor in a typical 2013 smart device, such as a Google Nexus 10 or HTC One, performs at roughly 1 GFLOPS, while the A13 processor in a 2019 iPhone 11 performs at 154.9 GFLOPS, a mark supercomputers succeeding the Cray-1 would not reach until 1994. Typical scientific workloads consist of reading in large data sets, transforming them in some way and then writing them back out again. Normally
The Freon refrigeration system. Configured with 1 million words of main memory, the machine and its power supplies consumed about 115 kW of power; cooling and storage likely more than doubled this figure. A Data General SuperNova S/200 minicomputer served as the maintenance control unit (MCU), which was used to feed the Cray Operating System into the system at boot time, to monitor
The S/4400 with four I/O processors and 4 million words of memory. The Cray-1M, announced in 1982, replaced the Cray-1S. It had a faster 12 ns cycle time and used less expensive MOS RAM in the main memory. The 1M was supplied in only three versions, the M/1200 with 1 million words in 8 banks, or the M/2200 and M/4200 with 2 or 4 million words in 16 banks. All of these machines included two, three or four I/O processors, and
The STAR, the Cray-1 would have to read only a portion of the vector at a time, but it could then run several operations on that data prior to writing the results back to memory. Given typical workloads, Cray felt that the small cost incurred by being required to break large sequential memory accesses into segments was a cost well worth paying. Since the typical vector operation would involve loading
The additional load/store units, adding more scalar units improved performance by increasing memory bandwidth, as well as by allowing several programs to run at the same time, thereby increasing the chance that there was something for the vector unit to process. Each unit is said to increase performance 1.5 times, allowing the VP2400/40 to match the performance of the earlier VP2600/20. The machines were supplied with either
The core of the machine. Since this system was mechanical, and therefore prone to failure, the 7600 was redesigned into a large "C" shape to allow access to the modules on either side of the cooling piping by walking into the inside of the "C" and opening the cabinet. The 7600 was an architectural landmark, and most of its features are still standard parts of computer design. It is a load-store computer with
The data layout is in a known format (a set of numbers arranged sequentially in memory), the pipelines can be tuned to improve the performance of fetches. On the receipt of a vector instruction, special hardware sets up the memory access for the arrays and stuffs the data into the processor as fast as possible. CDC's approach in the STAR used what is today known as a memory-memory architecture. This referred to
The distance that signals needed to travel. As the 6600 neared production quality, Cray lost interest in it and turned to designing its replacement. Making a machine "somewhat" faster would not be too difficult in the late 1960s; the introduction of integrated circuits allowed denser packing of components and, in turn, a higher clock speed. Transistors in general were also getting somewhat faster as
The fetch of the target instruction of a conditional branch precedes evaluation of the branch condition. During the execution of a 10-word (up to 40 instruction) loop, all the needed instructions remain in the stack, so no instructions are fetched, leaving small core memory free for data transfers. There are eight 60-bit registers, each with an address register. Moving an address to an address register starts
The first C compiler to their Cray-1 (non-vectorizing). This act would later give CRI a six-month head start on the Cray-2 Unix port to ETA Systems' detriment, and Lucasfilm's first computer generated test film, The Adventures of André & Wally B. Application software generally tends to be either classified (e.g. nuclear code, cryptanalytic code) or proprietary (e.g. petroleum reservoir modeling). This
The fourth (1983) and fifth (1986) World Computer Chess Championships, as well as the 1983 and 1984 North American Computer Chess Championships. The program Chess, which dominated in the 1970s, ran on Control Data Corporation supercomputers. Cray-1s are on display at the following locations: CDC 7600 The CDC 7600 was designed by Seymour Cray to be the successor to the CDC 6600, extending Control Data's dominance of
The instruction from memory and decodes it, then it collects any additional information it needs, in this case the numbers b and c, and then finally runs the operation and stores the results. The end result is that the computer requires tens or hundreds of millions of cycles to carry out these operations. In the STAR, new instructions essentially wrote the loops for the user. The user told the machine where in memory
The list of numbers was stored, then fed in a single instruction a(1..1000000) = addv b(1..1000000), c(1..1000000). At first glance it appears the savings are limited; in this case the machine fetches and decodes only a single instruction instead of 1,000,000, thereby saving 999,999 fetches and decodes, perhaps one-fourth of the overall time. The real savings are not so obvious. Internally,
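The instruction-count argument can be made concrete with a toy model that counts only the add instructions issued. The scalar version issues one per element, while the STAR-style vector form issues a single instruction for the whole array. Loop-control instructions, which a real scalar loop would also need, are ignored, and the function names are hypothetical.

```python
def scalar_add(b: list[int], c: list[int]) -> tuple[list[int], int]:
    """Loop over the arrays, issuing one 'a = add b, c' per element."""
    a, issued = [], 0
    for x, y in zip(b, c):
        a.append(x + y)
        issued += 1
    return a, issued

def vector_add(b: list[int], c: list[int]) -> tuple[list[int], int]:
    """One 'addv' names the whole arrays; the hardware streams the elements."""
    return [x + y for x, y in zip(b, c)], 1

n = 1_000_000
b = list(range(n))
c = [5] * n                            # "add 5 to every number"
a_scalar, scalar_issued = scalar_add(b, c)
a_vector, vector_issued = vector_add(b, c)
assert a_scalar == a_vector            # same result either way
print(scalar_issued - vector_issued)   # 999999 fetch/decode steps saved
```

The fetch-and-decode savings alone are the modest part. The text estimates them at perhaps a quarter of the overall time.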
The low scalar performance of the machine meant that after the switch had taken place and the machine was running scalar instructions, the performance was quite poor. The result was rather disappointing real-world performance, something that could, perhaps, have been forecast by Amdahl's law. Cray studied the failure of the STAR and learned from it. He decided that in addition to fast vector processing, his design would also require excellent all-around scalar performance. That way when
The machine contained 1,662 modules in 113 varieties. Each cable between the modules was a twisted pair, cut to a specific length in order to guarantee the signals arrived at precisely the right time and minimize electrical reflection. Each signal produced by the ECL circuitry was a differential pair, so the signals were balanced. This tended to make the demand on the power supply more constant and reduce switching noise. The load on
The machine could perform an addition of two numbers while simultaneously multiplying two others. However, any given instruction had to complete its trip through the unit before the next could be fed into it, which caused a bottleneck when the scheduler system ran out of instructions. Adding more functional units would not improve performance unless the scheduler was also greatly improved, especially in terms of allowing it to have more memory, so it could look through more instructions for ones that could be fed into
The machine switched modes, it would still provide superior performance. Additionally he noticed that the workloads could be dramatically improved in most cases through the use of registers. Just as earlier machines had ignored the fact that most operations were being applied to many data points, the STAR ignored the fact that those same data points would be repeatedly operated on. Whereas the STAR would read and process
The machine would break down at least once a day, and often four or five times. Acceptance at installation sites took years while the bugs were worked out, and while the machine generally sold well enough given its "high end" niche, it is unlikely the machine generated any sort of real profits for CDC. The successor CDC 8600 was never completed, and Seymour Cray went on to form his own company, Cray Research. One surviving 7600
The main memory. These integrated circuits were supplied by Fairchild Semiconductor and Motorola. In all, the Cray-1 contained about 200,000 gates. ICs were mounted on large five-layer printed circuit boards, with up to 144 ICs per board. Boards were then mounted back to back for cooling (see below) and placed in twenty-four 28-inch-high (710 mm) racks containing 72 double-boards. The typical module (distinct processing unit) required one or two boards. In all
The main scalar units of the processor ran at half the speed of the vector unit. According to Amdahl's law, overall performance is limited by the portion of a program that cannot use the fastest units; in this case, unless the program spent most of its time in the vector units, the slower scalar performance would pull it toward half the performance of a Cray-1 at the same clock speed. The reason for this seemingly odd "feature" is unclear. One of
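Amdahl's law makes the half-speed scalar penalty explicit. In the sketch below, speeds are relative to a hypothetical reference machine whose scalar and vector units both run at speed 1.0 (standing in for a Cray-1 at the same clock). `vector_fraction` is the share of the reference machine's run time spent in vector code. The numbers are illustrative, not measurements.

```python
def relative_speed(vector_fraction: float, scalar_speed: float,
                   vector_speed: float = 1.0) -> float:
    """Amdahl's law: total time is the sum of the time spent in each part,
    each part scaled by the speed of the unit that executes it."""
    time = vector_fraction / vector_speed + (1 - vector_fraction) / scalar_speed
    return 1 / time

for f in (0.0, 0.5, 0.9, 1.0):
    print(f, round(relative_speed(f, scalar_speed=0.5), 3))
# 0.0 -> 0.5, 0.5 -> 0.667, 0.9 -> 0.909, 1.0 -> 1.0
```

Only a fully scalar program drops all the way to half speed. A program that spends 90% of its time in vector code loses about 9%.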
The major complaints about the earlier VP series was their limited memory bandwidth: while the machines themselves had excellent performance in the processors, they were often starved for data. For the VP2000 series this was addressed by adding a second load/store unit to the scalar units, doubling memory bandwidth. Several versions of the machines were sold at different price points. The low-end VP2100 ran at an 8 ns cycle time and delivered only 0.5 GFLOPS (about 4-8 times
The minimal conversions ran anywhere from roughly the same speed as the 370 to about 2 times its performance (mostly due to a larger exponent range on the Cray), but vectorization led to further increases of between 2.5 and 10 times. In one example program, which performed an internal fast Fourier transform, performance improved from the IBM's 47 milliseconds to 3 milliseconds. The new machine was the first Cray design to use integrated circuits (ICs). Although ICs had been available since
The next addition are already waiting to be added. In this way each functional unit works in "parallel", as well as the machine as a whole. The improvement in performance generally depends on the number of steps the unit takes to complete. For instance, the 6600's multiply unit took 10 cycles to complete an instruction, so by pipelining the units it could be expected to gain about 10 times the speed. Things are never that simple, however. Pipelining requires that
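The "about 10 times" estimate can be checked with the standard pipeline timing model. An unpipelined unit needs the full 10 cycles for every operation. An ideally pipelined unit, split into 10 equal stages, needs 10 cycles to fill and then delivers one result per cycle. The equal-stage split is the idealization the text cautions is rarely achievable, so treat the result as an upper bound.

```python
def unpipelined_cycles(n: int, stages: int) -> int:
    """Each operation occupies the whole unit for `stages` cycles."""
    return n * stages

def pipelined_cycles(n: int, stages: int) -> int:
    """Fill the pipeline once, then retire one result per cycle."""
    return stages + n - 1

def speedup(n: int, stages: int) -> float:
    return unpipelined_cycles(n, stages) / pipelined_cycles(n, stages)

for n in (1, 10, 100, 10_000):
    print(n, round(speedup(n, stages=10), 2))
# 1 -> 1.0, 10 -> 5.26, 100 -> 9.17, 10000 -> 9.99
```

The speedup approaches the stage count only for long runs of independent operations. Short bursts spend most of their time filling the pipeline.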
The parallel units. That appeared to be a major problem. In order to solve this problem, Cray turned to the concept of an instruction pipeline. Each functional unit consisted of several sections that operated in turn, for instance, an addition unit might have circuitry dedicated to retrieving the operands from memory, then the actual math unit, and finally another to send the results back to memory. At any given instance only one part of
The performance of a Cray), while the VP2200 and VP2400 decreased the cycle time to 4 ns and delivered between 1.25 and 2.5 GFLOPS peak. The high-end VP2600 ran at 3.2 ns and delivered 5 GFLOPS. All of the models came in the /10 versions with a single scalar processor, or the /20 with a second, while the 2200 and 2400 also came in a /40 configuration with four. Due to
The power supplies and the cooling system. The Cray-1 was the first supercomputer to successfully implement the vector processor design. These systems improve the performance of math operations by arranging memory and registers to quickly perform a single operation on a large set of data. Previous systems like the CDC STAR-100 and ASC had implemented these concepts but did so in a way that seriously limited their performance. The Cray-1 addressed these problems and produced
The power supply was so evenly balanced that Cray boasted that the power supply was unregulated. To the power supply, the entire computer system looked like a simple resistor. The high-performance ECL circuitry generated considerable heat, and Cray's designers spent as much effort on the design of the refrigeration system as they did on the rest of the mechanical design. In this case, each circuit board
The production processes and quality improved. These sorts of improvements might be expected to make a machine twice as fast, perhaps as much as five times. However, as with the 6600 design, Cray set himself the goal of producing a machine with ten times the performance. One of the reasons the 6600 was so much faster than its contemporaries is that it had multiple functional units that could operate in parallel. For instance,
The same memory five times to apply five vector operations on a set of data, it would be much faster to read the data into the CPU's registers once, and then apply the five operations. However, there were limitations with this approach. Registers were significantly more expensive in terms of circuitry, so only a limited number could be provided. This implied that Cray's design would have less flexibility in terms of vector sizes. Instead of reading any sized vector several times as in
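The register trade-off can be sketched in two parts. The first splits a long loop into segments no longer than a 64-element vector register, a technique often called strip-mining. The second counts memory traffic when five operations are applied to the data. The traffic model is a simplification assumed for illustration: each memory-to-memory operation reads and writes every element once, while the register version loads and stores each element once in total. Function names are hypothetical.

```python
MAX_VL = 64   # elements in one Cray-1 vector register

def strip_mine(n: int, max_vl: int = MAX_VL) -> list[tuple[int, int]]:
    """Split an n-element loop into (start, vector length) segments."""
    return [(start, min(max_vl, n - start)) for start in range(0, n, max_vl)]

def memory_words(n: int, ops: int) -> tuple[int, int]:
    """Words moved to or from memory to apply `ops` unary operations to n elements."""
    memory_to_memory = ops * 2 * n   # every operation reads and writes every element
    register_based = 2 * n           # load once, run all ops in registers, store once
    return memory_to_memory, register_based

def apply_in_registers(data: list[float], ops) -> list[float]:
    """Apply each op to every element, one register-sized segment at a time."""
    out: list[float] = []
    for start, vl in strip_mine(len(data)):
        v = data[start:start + vl]   # "load" a segment (VL = vl)
        for op in ops:
            v = [op(x) for x in v]   # all ops reuse the register copy
        out.extend(v)                # "store" the segment once
    return out

print(strip_mine(200))          # [(0, 64), (64, 64), (128, 64), (192, 8)]
print(memory_words(1_000, 5))   # (10000, 2000)
```

The last segment runs with a shorter vector length, which is the job of the vector length (VL) register. This segmenting is the cost of the register approach that Cray judged well worth paying.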
The seals and eventually coat the boards with oil until they shorted out. New welding techniques had to be used to properly seal the tubing. In order to bring maximum speed out of the machine, the entire chassis was bent into a large C-shape. Speed-dependent portions of the system were placed on the "inside edge" of the chassis, where the wire-lengths were shorter. This allowed the cycle time to be decreased to 12.5 ns (80 MHz), not as fast as
The second one was installed at McDonnell Douglas in Huntington Beach, California. They were actual 7600s connected by chassis 25 to make them a 7600 MP. From about 1969 to 1975, the CDC 7600 was generally regarded as the fastest computer in the world, except for specialized units. However, even with the advanced mechanicals and cooling, the 7600 was prone to failure. Both LLNL and NCAR reported that
The system added an optional second High Speed Data Channel. Users could add a Solid-state Storage Device with 8 to 32 million words of MOS RAM. In 1978, the first standard software package for the Cray-1 was released, consisting of three main products: The United States Department of Energy funded sites from Lawrence Livermore National Laboratory, Los Alamos Scientific Laboratory, Sandia National Laboratories and
The time was the one at Los Alamos, but accessing this machine required Q clearance. The tests were reported in two ways. The first was a minimum conversion needed to get the program running without errors, but making no attempt to take advantage of the Cray's vectorization. The second included a moderate set of updates to the code, often unwinding loops so they could be vectorized. Generally,
The time, the company was in serious financial trouble, and with the STAR in the pipeline as well, Norris could not invest the money. As a result, Cray left CDC and started Cray Research very close to the CDC lab. In the back yard of the land he purchased in Chippewa Falls, Cray and a group of former CDC employees started looking for ideas. At first, the concept of building another supercomputer seemed impossible, but after Cray Research's Chief Technology Officer travelled to Wall Street and found
The transformations being applied are identical across all of the data points in the set. For instance, the program might add 5 to every number in a set of a million numbers. In simple computers the program would loop over all million numbers, adding five, thereby executing a million instructions saying a = add b, c. Internally the computer solves this instruction in several steps. First it reads
5508-400: The unit was active, while the rest waited their turn. A pipeline improves on this by feeding in the next instruction before the first has completed, using up that idle time. For instance, while one instruction is being added together, the operands for the next add instruction can be fetched. That way, as soon as the current instruction completes and moves to the output circuitry, the operands for
5589-432: The unit's internals can be effectively separated to the point where each step of the operation is running on completely separate circuitry. This is rarely achievable in the real world. Nevertheless, the use of pipelining on the 7600 improved performance over the 6600 by a factor of about 3. To achieve the rest of the goal, the machine would have to run at a faster speed, now possible using new transistor designs. However, there
The way the machine gathered data. It set up its pipeline to read from and write to memory directly. This allowed the STAR to use vectors whose length was not limited by the size of its registers, making it highly flexible. Unfortunately, the pipeline had to be very long in order to keep enough instructions in flight to make up for the slow memory. That meant the machine incurred a high cost when switching from processing vectors to performing operations on non-vector operands. Additionally,
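A simple linear timing model illustrates the trade-off of a long memory-to-memory pipeline. In this model, a vector operation pays a fixed start-up cost before delivering one result per cycle. The start-up values below are hypothetical and are not measured STAR figures.

```python
def vector_time(n: int, startup: int) -> int:
    """Cycles for an n-element vector operation: a fixed pipeline start-up
    cost, then one result per cycle (idealized)."""
    return startup + n


def efficiency(n: int, startup: int) -> float:
    """Fraction of the peak one-result-per-cycle rate actually achieved."""
    return n / vector_time(n, startup)


if __name__ == "__main__":
    # Hypothetical start-up costs for a short and a long pipeline.
    for startup in (10, 100):
        for n in (1, 10, 100, 1000):
            print(f"startup={startup:4d} n={n:5d} efficiency={efficiency(n, startup):.3f}")
```

Under this model, efficiency reaches one half when the vector length equals the start-up cost (the half-performance length, often written n_1/2). A longer pipeline therefore needs proportionally longer vectors before it pays off, and short or scalar work suffers most.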
Was 160 MFLOPS. (The reciprocal approximation unit could also operate in parallel, but did not deliver a true floating-point result; two additional multiplications were needed to achieve a full division.) Since the machine was designed to operate on large data sets, the design also dedicated considerable circuitry to I/O. Earlier Cray designs at CDC had included separate computers dedicated to this task, but this
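One standard way to turn an approximate reciprocal into a full-precision quotient is a Newton-Raphson refinement step followed by a multiplication by the dividend. The Python sketch below shows that general technique. It is not a reproduction of the Cray-1's exact instruction sequence. The 12-bit precision of the stand-in approximation unit is an assumption chosen for illustration.

```python
import math


def approx_reciprocal(b: float, bits: int = 12) -> float:
    """Stand-in for a hardware reciprocal-approximation unit: 1/b with its
    mantissa rounded to `bits` bits (assumed precision, illustrative only)."""
    m, e = math.frexp(1.0 / b)  # 1/b == m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** bits
    return math.ldexp(round(m * scale) / scale, e)


def divide(a: float, b: float) -> float:
    """Compute a / b from an approximate reciprocal plus one Newton-Raphson step."""
    x0 = approx_reciprocal(b)
    correction = 2.0 - b * x0  # Newton-Raphson correction factor for 1/b
    x1 = x0 * correction       # refined reciprocal: roughly doubles the correct bits
    return a * x1              # final multiplication by the dividend
```

Each Newton-Raphson step roughly doubles the number of correct bits. This is why a coarse hardware approximation plus a few multiplications can stand in for a dedicated divider.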
Was NSA..." The 160 MFLOPS Cray-1 was succeeded in 1982 by the 800 MFLOPS Cray X-MP, the first Cray multiprocessing computer. In 1985, the very advanced Cray-2, capable of 1.9 GFLOPS peak performance, succeeded the first two models but met with only limited commercial success because of problems achieving sustained performance in real-world applications. A more conservatively designed evolutionary successor of
Was a supercomputer designed, manufactured and marketed by Cray Research. Announced in 1975, the first Cray-1 system was installed at Los Alamos National Laboratory in 1976. Eventually, eighty Cray-1s were sold, making it one of the most successful supercomputers in history. It is perhaps best known for its unique shape, a relatively small C-shaped cabinet with a ring of benches around the outside covering
Was because little software was shared between customers and university customers. The few exceptions were climatological and meteorological programs until the NSF responded to the Japanese Fifth Generation Computer Systems project and created its supercomputer centers. Even then, little code was shared. Partly because Cray were interested in the publicity, they supported the development of Cray Blitz which won
Was essentially made up of four 7600s in a box with an additional special mode that allowed them to operate in lock-step in a SIMD fashion. Jim Thornton, formerly Cray's engineering partner on earlier designs, had started a more radical project known as the CDC STAR-100. Unlike the 8600's brute-force approach to performance, the STAR took an entirely different route. The main processor of the STAR had lower performance than
Was given control. In this way the memory always held up-to-date information for the main processor to work on (barring delays in the external devices themselves), eliminating delays on data, as well as allowing the CPU to be built for mathematical performance and nothing else. The PPU could have been called a very smart "communications channel". Like the 6600, the 7600 used 60-bit words with instructions that were generally 15 bits in length, although there were also 30-bit instructions. The instructions were packed into
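The packing of 15-bit and 30-bit instructions into 60-bit words can be sketched as follows. Several details are assumptions made for illustration. Instructions fill a word from its high-order bits downward, a 30-bit instruction may not straddle a word boundary, and the no-op filler value is an assumed constant rather than a documented encoding.

```python
WORD_BITS = 60
PARCEL_BITS = 15
PARCELS_PER_WORD = WORD_BITS // PARCEL_BITS  # four 15-bit parcels per word
NOP = 0o46000  # assumed 15-bit no-op filler; illustrative value only


def _pad(word: int, used: int) -> int:
    """Fill the remaining parcels of a partial word with no-ops."""
    for _ in range(PARCELS_PER_WORD - used):
        word = (word << PARCEL_BITS) | NOP
    return word


def pack(instructions):
    """Pack (value, width) pairs, width 15 or 30, into 60-bit words.

    Assumes a 30-bit instruction may not straddle two words, so the current
    word is padded with no-ops when it has only one parcel left.
    """
    words = []
    word, used = 0, 0
    for value, width in instructions:
        if width not in (15, 30) or not 0 <= value < (1 << width):
            raise ValueError(f"bad instruction {value!r} of width {width}")
        parcels = width // PARCEL_BITS
        if used + parcels > PARCELS_PER_WORD:
            # Not enough room: finish this word with no-ops and start a new one.
            words.append(_pad(word, used))
            word, used = 0, 0
        word = (word << width) | value
        used += parcels
        if used == PARCELS_PER_WORD:
            words.append(word)
            word, used = 0, 0
    if used:
        words.append(_pad(word, used))
    return words
```

For example, three 15-bit instructions followed by a 30-bit one produce two words. The 30-bit instruction does not fit in the single parcel left in the first word.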
Was no longer needed. Instead the Cray-1 included four six-channel controllers, each of which was given access to main memory once every four cycles. The channels were 16 bits wide and included three control bits and four bits for error correction, so the maximum transfer speed was one word per 100 ns, or 500 thousand words per second for the entire machine. The initial model, the Cray-1A, weighed 10,500 pounds (4,800 kg) including
Was paired with a second, placed back to back with a sheet of copper between them. The copper sheet conducted heat to the edges of the cage, where liquid Freon running in stainless steel pipes drew it away to the cooling unit below the machine. The first Cray-1 was delayed six months due to problems in the cooling system; lubricant that is normally mixed with the Freon to keep the compressor running would leak through
Was separated from the main machine, connected to the main system via a 6 Mbit/s control channel and a 100 Mbit/s High Speed Data Channel. This separation made the 1S look like two "half Crays" separated by a few feet, which allowed the I/O system to be expanded as needed. Systems could be bought in a variety of configurations from the S/500 with no I/O and 0.5 million words of memory to
Was set up to allow the vector registers to be fed at one word per cycle, while the address and scalar registers required two cycles. In contrast, the entire 16-word instruction buffer could be filled in four cycles. The Cray-1 had twelve pipelined functional units. The 24-bit address arithmetic was performed in an add unit and a multiply unit. The scalar portion of the system consisted of an add unit,
Was shown to be slightly faster than its IBM rival, the IBM System/360 Model 195. When the system was released in 1967, it sold for around $5 million in base configurations, and considerably more as options and features were added. Among the 7600's notable state-of-the-art contributions, beyond extensive pipelining, was the physical C-shape, which both reduced floor space and dramatically increased performance by reducing