BBN Butterfly - Misplaced Pages

The BBN Butterfly was a massively parallel computer built by Bolt, Beranek and Newman in the 1980s. It was named for the "butterfly" multi-stage switching network around which it was built. Each machine had up to 512 CPUs , each with local memory, which could be connected to allow every CPU access to every other CPU's memory, although with a substantially greater latency (roughly 15:1) than for its own. The CPUs were commodity microprocessors. The memory address space was shared.

#153846

51-457: The first generation used Motorola 68000 processors, followed by a 68010 version. The Butterfly connect was developed specifically for this computer. The second or third generation, GP-1000 models used Motorola 68020 's and scaled to 256 CPUs. The later, TC-2000 models used Motorola MC88100 's, and scaled to 512 CPUs. The Butterfly was initially developed as the Voice Funnel , a router for

102-512: A finite number of operations it can support – for example, no FPUs directly support arbitrary-precision arithmetic . When a CPU is executing a program that calls for a floating-point operation that is not directly supported by the hardware, the CPU uses a series of simpler floating-point operations. In systems without any floating-point hardware, the CPU emulates it using a series of simpler fixed-point arithmetic operations that run on

153-889: A gate array to interface the ARM2 processor with the WE32206 to support the additional ARM floating-point instructions. Acorn later offered the FPA10 coprocessor, developed by ARM, for various machines fitted with the ARM3 processor. Coprocessors were available for the Motorola 68000 family , the 68881 and 68882 . These were common in Motorola 68020 / 68030 -based workstations , like the Sun-3 series. They were also commonly added to higher-end models of Apple Macintosh and Commodore Amiga series, but unlike IBM PC-compatible systems, sockets for adding

204-483: A much higher total pin count. By the early 1980s, similar limitations on all modern CPU designs led to the introduction of the pin grid array that replaced the DIP. For the new project, Motorola selected a 169-pin layout, giving them plenty of room to work with. The design ultimately used only 114 of them. A great debate broke out about how to refer to the underlying design of the new chip in marketing materials. Technically,

255-402: A multiprocessing system (which were removed in the 68060), some support for high-level languages which did not get used much (and was removed from future 680x0 processors), bigger multiply (32×32→64 bits) and divide (64÷32→32 bits quotient and 32 bits remainder) instructions, and bit field manipulations. The new addressing modes add scaled indexing and another level of indirection to many of

306-440: A multiprocessor system, coprocessors could not be shared between CPUs. To avoid problems with returns from coprocessor, bus error, and address error exceptions, it was generally necessary in a multiprocessor system for all CPUs to be the same model, and for all FPUs to be the same model as well. The new instructions include some minor improvements and extensions to the supervisor state, several instructions for software management of

357-402: A new fab, MOS-8, using 5-inch wafers and the latest HMOS process licensed from Intel . This line was capable of building all of the new techniques, but the 68000 went ahead with the older design as they were sure it would work. Moving to new design techniques would wait until the design was in the market. The conversion to the new design techniques took place during the Motorola 68010 effort,

408-486: A new watch band to commemorate the event. Meanwhile, Walker instituted a new policy at MOS-8 to improve the plant itself. He normally called meetings at 6:30 AM. If things were not going well, he would move that up to 5:30, and even 4:30. This provided a strong incentive to get the plant running. The production problems were soon ironed out, and volume deliveries began late that year. By this point, their workstation customers had already developed complete systems ready to use

459-686: A proper three-stage pipeline. Though the 68010 had a "loop mode", which sped loops through what was effectively a tiny instruction cache, it held only two short instructions and was thus little used. The 68020 replaced this with a proper instruction cache of 256 bytes, the first 68k series processor to feature true on-chip cache memory. The previous 68000 and 68010 processors could only access word (16-bit) and long word (32-bit) data in memory if it were word-aligned (located at an even address). The 68020 has no alignment restrictions on data access. Naturally, unaligned accesses are slower than aligned accesses because they required an extra memory access. The 68020 has

510-473: A relatively minor upgrade to the original design that added basic virtual memory support for the emerging Unix workstation market. As this effort was ongoing, Motorola was canvassing their customers for their desires for future developments in the line. These all pointed to a fully 32-bit implementation. Those using the 68k in Unix systems also stated they would purchase a floating point unit for every one of

561-761: A shared broadcast satellite channel and the BSAT became the Wideband Packet Switch (WPS). Another DARPA sponsored project at BBN produced the Butterfly Multiprocessor Internet Gateway (Internet Router) to interconnect different types of networks at the IP layer. Like the BSAT, the Butterfly Gateway broke the contention of a shared bus minicomputer architecture that had been in use for Internet Gateways by combining

SECTION 10

#1732791129154

612-462: A small 256-byte direct-mapped instruction cache, arranged as 64 four-byte entries. Although small, it still made a significant difference in the performance of many applications. The resulting decrease in bus traffic was particularly important in systems relying heavily on DMA . The 68020 has a coprocessor interface supporting up to eight coprocessors. The main CPU recognizes "F-line" instructions (with

663-432: A special FPU named FlexFPU, which uses simultaneous multithreading . Each physical integer core, two per module, is single-threaded, in contrast with Intel's Hyperthreading , where two virtual simultaneous threads share the resources of a single physical core. Some floating-point hardware only supports the simplest operations: addition, subtraction, and multiplication. But even the most complex floating-point hardware has

714-521: A symmetric multiprocessor. The largest configured system with 128 processors was at the University of Rochester Computer Science Department. Most delivered systems had about 16 processors. No known configurations appear to be in museums. At least one system is thought to be sitting within a DARPA autonomous vehicle. TotalView , the parallel program debugger developed for the Butterfly, outlived

765-639: Is a lower cost version of the Motorola 68020. The main difference is that the 68EC020 only has a 24-bit address bus, rather than the 32-bit address bus of the full 68020, and thus is only able to address 16 MB of memory. The Amiga 1200 computer and the Amiga CD32 game console use the cost-reduced 68EC020; the Namco System 22 , Taito F3 and Konami GX arcade boards also used this processor. The Atari Jaguar II prototype featured this to replace

816-525: Is a part of a computer system specially designed to carry out operations on floating-point numbers. Typical operations are addition , subtraction , multiplication , division , and square root . Some FPUs can also perform various transcendental functions such as exponential or trigonometric calculations, but the accuracy can be low, so some systems prefer to compute these functions in software. In general-purpose computer architectures , one or more FPUs may be integrated as execution units within

867-467: Is asynchronous, so it is possible to run the coprocessors at a different clock rate than the CPU. Multiprocessing support is implemented externally by the use of an RMC pin to indicate an indivisible read-modify-write cycle in progress. All other processors have to hold off memory accesses until the cycle is complete. Software support for multiprocessing includes the TAS , CAS and CAS2 instructions. In

918-603: Is available, the CORDIC methods are most commonly used for transcendental function evaluation. In most modern computer architectures, there is some division of floating-point operations from integer operations. This division varies significantly by architecture; some have dedicated floating-point registers, while some, like Intel x86 , go as far as independent clocking schemes. CORDIC routines have been implemented in Intel x87 coprocessors ( 8087 , 80287, 80387 ) up to

969-497: The 80287 , and 80386/80386SX -based machines – for the 80387 and 80387SX respectively, although early ones were socketed for the 80287, since the 80387 did not exist yet. Other companies manufactured co-processors for the Intel x86 series. These included Cyrix and Weitek . Acorn Computers opted for the WE32206 to offer single , double and extended precision to its ARM powered Archimedes range, introducing

1020-545: The 80486 microprocessor series, as well as in the Motorola 68881 and 68882 for some kinds of floating-point instructions, mainly as a way to reduce the gate counts (and complexity) of the FPU subsystem. Floating-point operations are often pipelined . In earlier superscalar architectures without general out-of-order execution , floating-point operations were sometimes pipelined separately from integer operations. The modular architecture of Bulldozer microarchitecture uses

1071-677: The Alpha Microsystems AM-2000. The 68020 was an alternative upgrade to the Sinclair QL 's 68008 in the Super Gold Card interface by Miracle Systems . The Amiga 2500 and A2500UX optionally shipped with the A2620 Accelerator using a 68020, 68881 FPU and 68851 MMU. The 2500UX shipped with Amiga Unix , requiring an '020 or '030 processor. A number of digital oscilloscopes from the mid-80s to

SECTION 20

#1732791129154

1122-580: The IBM 701 . This was carried forward to its successors the 709, 7090, and 7094. In 1963, Digital announced the PDP-6 , which had floating point as a standard feature. In 1963, the GE-235 featured an "Auxiliary Arithmetic Unit" for floating point and double-precision calculations. Historically, some systems implemented floating point with a coprocessor rather than as an integrated unit (but now in addition to

1173-567: The Macintosh IIx replaced it, using the 030. Although it ran at the same 16 MHz clock speed, the IIx offered 3.9 MIPS compared to the II's 2.6. The 68020 has 32-bit internal and external data and address buses, compared to the early 680x0 models with 16-bit data and 24-bit address buses. The 68020's ALU is also natively 32-bit, so can perform 32-bit operations in one clock cycle, whereas

1224-420: The central processing unit ; however, many embedded processors do not have hardware support for floating-point operations (while they increasingly have them as standard). When a CPU is executing a program that calls for a floating-point operation, there are three ways to carry it out: In 1954, the IBM 704 had floating-point arithmetic as a standard feature, one of its major improvements over its predecessor

1275-526: The 020 and 030, the latter of which could be used as a drop-in replacement in many roles. For this reason, designs using the 030 appeared much more quickly after its release than the 020. The first Macintosh with the 020 was the Macintosh II released in March 1987, two years after the 020 had become widely available, with low-volume initial shipments starting two months later. Only eighteen months later,

1326-426: The 020 and the new floating point unit, the Motorola 68881 . Systems were in the market only five or six months after the 020 had been announced. Design of the 020's follow-on began almost immediately. As part of their ongoing work with Hitachi, Motorola's fabrication system was finally catching up with the competition, as was their internal design workflow. This gave them considerably more room to work with, allowing

1377-460: The 020 was moving from the long-established NMOS logic design to a CMOS layout, which requires two transistors per gate. Common knowledge of the era suggested that CMOS cost four times as much as NMOS, and there was a significant amount of the market that believed "CMOS equals bad." The design was completed in the summer of 1983 and announced in June 1984. This "super chip" was significant news at

1428-650: The 68000 of the original Atari Jaguar console. It also found use in laser printers. Apple used it in the LaserWriter IIɴᴛx. Kodak used it in the Ektaplus 7016PS, and Dataproducts used it in the LZR 1260. In 2014, Rochester Electronics re-established manufacturing capability for the 68020 microprocessor and it is still available today. Floating point unit A floating-point unit ( FPU ), numeric processing unit ( NPU ), colloquially math coprocessor ,

1479-530: The 68000 took a minimum of two clock cycles due to its 16-bit ALU. Newer packaging methods allowed the '020 to feature more external pins without the large size that the earlier dual in-line package method required. The 68EC020 lowered cost through a 24-bit address bus. The 68020 was produced at speeds ranging from 12 MHz to 33 MHz. The 68020 has a 32-bit arithmetic logic unit (ALU), 32-bit external data and address buses. It adds extra instructions and additional addressing modes. The 68020 (and 68030) has

1530-543: The 68020, together with a 68882 FPU. It is also the processor used on TGV trains to decode signalling information sent to the trains through the rails. It is used in the flight control and radar systems of the Eurofighter Typhoon combat aircraft. The Nortel Networks DMS-100 telephone central office switch also used the 68020 as the first microprocessor of the SuperNode computing core. The 68EC020

1581-469: The CPU, e.g. GPUs – that are coprocessors not always built into the CPU ;– have FPUs as a rule, while first generations of GPUs did not). This could be a single integrated circuit , an entire circuit board or a cabinet. Where floating-point calculation hardware has not been provided, floating-point calculations are done in software, which takes more processor time, but avoids

BBN Butterfly - Misplaced Pages Continue

1632-691: The ST-II protocol intended for carrying voice and video over IP networks. The Butterfly hardware was later used for the Butterfly Satellite IMP (BSAT) packet switch of DARPA's Wideband Packet Satellite Network which operated at multiple sites around the US over a shared 3 Mbit/s broadcast satellite channel. In the late 1980s, this network became the Terrestrial Wideband Network , based on terrestrial T1 circuits instead of

1683-598: The addition of larger processor caches , a built-in memory management unit (MMU) and other features. The Motorola 68030 was announced in September 1986, with deliveries to begin the next summer. Due to the changes in the production lines, the new 030 would have a lower launch price than the 020. There were significant differences between the 68000 and 020, especially the 32-bit memory interface. This required computer designs using it to be considerably different from earlier models. In contrast, there were few changes between

1734-434: The cost of the extra hardware. For a particular computer architecture, the floating-point unit instructions may be emulated by a library of software functions; this may permit the same object code to run on systems with or without floating-point hardware. Emulation can be implemented on any of several levels: in the CPU as microcode , as an operating system function, or in user-space code. When only integer functionality

1785-514: The execution of those instructions. In the 1980s, it was common in IBM PC /compatible microcomputers for the FPU to be entirely separate from the CPU , and typically sold as an optional add-on. It would only be purchased if needed to speed up or enable math-intensive programs. The IBM PC, XT , and most compatibles based on the 8088 or 8086 had a socket for the optional 8087 coprocessor. The AT and 80286 -based systems were generally socketed for

1836-431: The four most significant opcode bits all one), and uses special bus cycles to interact with a coprocessor to execute these instructions. Two types of coprocessors were defined: floating point units ( MC68881 or MC68882 FPUs ) and the paged memory management unit ( MC68851 PMMU). Only one PMMU can be used with a CPU. In principle, multiple FPUs could be used with a CPU, but it was not commonly done. The coprocessor interface

1887-459: The integer arithmetic logic unit . The software that lists the necessary series of operations to emulate floating-point operations is often packaged in a floating-point library . In some cases, FPUs may be specialized, and divided between simpler floating-point operations (mainly addition and multiplication) and more complicated operations, like division. In some cases, only the simple operations may be implemented in hardware or microcode , while

1938-573: The late-90s used the 68020, including the LeCroy 9300 Series (higher end models including "C" suffix models used the more powerful 68EC030 ; the 9300 models with a 68020 processor can be upgraded to the 68EC030 with a change of the CPU board ) and the earlier LeCroy 9400 series (all models excluding the 9400/9400A which used the 68000 ), along with certain Tektronix TDS Series models. The HP 54520, 54522, 54540 and 54542 also use

1989-487: The machines if one was available. The original 68000 had been designed as a hybrid 16/32-bit system largely because the maximum number of pins available on dual inline packages (DIPs) was 64, and even at that size, packaging of this size was highly problematic. By reducing the number of address pins to 24, and the data pins to only 16, there were enough free pins to implement all the other needed lines, like interrupts and power supplies. The 24-pin address bus meant that

2040-531: The memory could only be 16 MB in total, which was at this point becoming a limitation. The 16-bit data bus meant reading a 32-bit word from that memory required two bus cycles. A design that had 32 pins for both the address and data busses would access data twice as fast, making the machine that much faster even with no other changes. Moving to 32 bit addressing would also make the implementation of virtual memory easier, and allow for more than 16 MB of random access memory . But doing so would also demand

2091-922: The more complex operations are implemented as software. In some current architectures, the FPU functionality is combined with SIMD units to perform SIMD computation; an example of this is the augmentation of the x87 instructions set with SSE instruction set in the x86-64 architecture used in newer Intel and AMD processors. Several models of the PDP-11 , such as the PDP-11/45, PDP-11/34a, PDP-11/44, and PDP-11/70, supported an add-on floating-point unit to support floating-point instructions. The PDP-11/60, MicroPDP-11/23 and several VAX models could execute floating-point instructions without an add-on FPU (the MicroPDP-11/23 required an add-on microcode option), and offered add-on accelerators to further speed

BBN Butterfly - Misplaced Pages Continue

2142-409: The platform and was ported to a number of other massively parallel machines. 68020 The Motorola 68020 is a 32-bit microprocessor from Motorola , released in 1984. A lower-cost version was also made available, known as the 68EC020 . In keeping with naming practices common to Motorola designs, the 68020 is usually referred to as the "020", pronounced "oh-two-oh" or "oh-twenty". The 020

2193-460: The pre-existing modes. With full 32-bit internal and external address buses, the address registers (A0 through A7) could utilize their full 32-bit width, and were capable of addressing the entire 4 GB address space. The larger effective widths of the address registers presented some problem for earlier software that was not considered " 32-bit clean ". Some programs used the high 8 bits (bits 24-31) of addresses to contain various flag bits, with

2244-437: The routing computations and I/O at the network interfaces and using the Butterfly's switch fabric to provide the network interconnections. This resulted in significantly higher link throughputs. The Butterfly began with a proprietary operating system called Chrysalis, but moved to a Mach kernel operating system in 1989. While the memory access time was non-uniform, the machine had SMP memory semantics, and could be operated as

2295-411: The same MOS-8 factory as the 68000, although several new pieces of equipment were introduced to support it. By the time of the public release, the yield for the new chip was zero. That is, for every wafer sent through the multi-step process, zero working chips would be produced. Gary Johnson concluded the problem was the floor manager of MOS-8, Tom Felesi, and decided to replace him with Bill Walker, who

2346-667: The time, with the New York Times making it a lead story in their business section. The launch price was quoted at $ 487 each, about the same as the 68000 when it was launched in 1980, but the 68000 was now available for about $ 15. However, it was understood that it would be some time before computers using the new chip would be available, as existing designs would have to be heavily modified to take advantage of its performance. The announcement led to Motorola's customers clamouring for supply. At this point, serious supply problems became evident. The design had been laid out to be built in

2397-620: The understanding that the earlier 680x0 CPUs would safely ignore these high bits. Such software had to be rewritten to adjust to the larger physical address space available to the 68020 and later CPUs. The 68020 was used in the Apple Macintosh II and Macintosh LC personal computers , Sun-3 workstations, Amiga 1200 (68EC020 variant), the Hewlett-Packard 8711 Series Network Analyzers, HP 9000 /320, HP 9000/330, Apollo Computer 's DN3000 and DN4000 workstations, and

2448-483: Was a new piece of equipment from a new vendor, Genius, which produced silicide . The machine simply didn't work. Walker flew to California to meet with the CEO of Genius, who offered up nothing but excuses. Walker eventually slammed his hand down on the desk, breaking his watch band, and stated "No more excuses! I want this thing fixed now, today!" Genius took the demand seriously and fixed the machine. The CEO later sent Walker

2499-456: Was at that time running the older MOS-2 factory. Walker arrived at the plant on 5 July 1985 to find Johnson had not bothered to tell Felesi of the change, and arguments followed. Johnson eventually told Felesi this was indeed happening. Walker then toured the plant and found it had been turned into what was essentially a research and development lab, not a production line, with numerous bits of machinery in use nowhere else. One significant issue

2550-535: Was in the market for a relatively short time. The Motorola 68030 was announced in September 1986 and began deliveries in the summer of 1987. Priced about the same as the 020 of the time, the 030 was significantly faster and quickly replaced in 020 in almost every use. At the time the Motorola 68000 was designed, Motorola's design and fabrication services were outdated. Although even small companies like MOS Technology and Zilog had moved on to silicon gate depletion mode NMOS logic on ever-larger wafers , Motorola

2601-467: Was still using metal gates and enhancement mode and their largest fab worked on 4-inch wafers long after most lines had moved to 5-inch. Although the 68000 met the goal of being the fastest CPU available when it was introduced, it was not nearly as powerful as it could be if it had been designed with more modern techniques. During the period of the 68000 design, the company was working with Hitachi on their process technology and as part of this they opened

SECTION 50

#1732791129154
#153846