AMD Am29000 - Misplaced Pages

The AMD Am29000 , commonly shortened to 29k , is a family of 32-bit RISC microprocessors and microcontrollers developed and fabricated by Advanced Micro Devices (AMD). Based on the seminal Berkeley RISC , the 29k added a number of significant improvements. They were, for a time, the most popular RISC chips on the market, widely used in laser printers from a variety of manufacturers.

#626373

90-566: Developed since 1984–1985, announced in March 1987 and released in May 1988, the initial Am29000 was followed by several versions, ending with the Am29040 in 1995. The 29050 was notable for being early to feature a floating point unit capable of executing one multiply–add operation per cycle. AMD was designing a superscalar version until late 1995, when AMD dropped the development of the 29k because

180-441: A 1 KB tiny page. ARM uses a two-level page table if using 4 KB and 64 KB pages, or just a one-level page table for 1 MB sections and 16 MB sections. TLB updates are performed automatically by page table walking hardware. PTEs include read/write access permission based on privilege, cacheability information, an NX bit , and a non-secure bit. DEC Alpha processors divide memory into 8 KB , 16 KB , 32 KB , or 64 KB ;

270-424: A page table , containing one page table entry (PTE) per virtual page, to map virtual page numbers to physical page numbers in main memory. Multi-level page tables are often used to reduce the size of the page table. An associative cache of PTEs is called a translation lookaside buffer (TLB) and is used to avoid the necessity of accessing the main memory every time a virtual address is mapped. Other MMUs may have

360-616: A 0.7-micron process. A superscalar version of 29K was being designed, but canceled in favor of x86. It was codenamed Jaguar , and was described in November 1994 and August 1995. It was an advanced design, capable of four-way dispatch into six reservation stations and speculative out-of-order execution of instructions, with four-way retire. The register file permitted four reads and two writes at once. The caches for instructions and data were 8 KB each. Loads from cache could bypass stores . It had no on-chip FPU due to cost reasons and

450-426: A 16-bit address that made it too small as memory sizes increased in the 1970s. This was addressed by expanding the physical memory bus to 18-bits, and using an MMU to add two more bits based on other pins on the processor bus to indicate which program was accessing memory. Another use of this same technique, although not referred to as paging but bank switching , was widely used by early 8-bit microprocessors like

540-408: A TLB exception occurs when processing a TLB exception, a double fault TLB exception, it is dispatched to its own exception handler . MIPS32 and MIPS32r2 support 32 bits of virtual address space and up to 36 bits of physical address space. MIPS64 supports up to 64 bits of virtual address space and up to 59 bits of physical address space. The original Sun-1 is a single-board computer built around

630-646: A context is 1024 pages or 2 MB. The maximum physical address that can be mapped simultaneously is also 2 MB. The context register is important in a multitasking operating system because it allows the CPU to switch between processes without reloading all the translation state information. The 4-bit context register can switch between 16 sections of the segment map under supervisor control, which allows 16 contexts to be mapped concurrently. Each context has its own virtual address space. Sharing of virtual address space and inter-context communications can be provided by writing

720-462: A contiguous series of fixed-sized blocks. This is similar to the modern demand paging system in that the result is a series of pages, but in these earlier systems the list of pages is fixed in size and normally stored in some form of fast memory like static RAM to improve performance. In this case, the two parts of the address stored by the MMU are known as the segment number and page index . Consider

810-612: A degree of flexibility in the choice of memory technologies used to retain program instructions and data. The 29k saw some use as a computational accelerator or coprocessor, particularly on the Macintosh and IBM PC-compatible platforms. For instance, Yarc Systems Corporation produced 29k-based "RISC coprocessor" cards for Macintosh II and PC AT systems, alongside other "CISC coprocessor" cards featuring Motorola 68020 and 68030 processors, and "parallel coprocessor" cards featuring T800 transputer processors. Its NuSuper (originally named

900-461: A fault on write bit. The MIPS architecture supports one to 64 entries in the TLB. The number of TLB entries is configurable at CPU configuration before synthesis. TLB entries are dual. Each TLB entry maps a virtual page number (VPN2) to either one of two page frame numbers (PFN0 or PFN1), depending on the least significant bit of the virtual address that is not part of the page mask . This bit and

990-463: A finite number of operations it can support – for example, no FPUs directly support arbitrary-precision arithmetic . When a CPU is executing a program that calls for a floating-point operation that is not directly supported by the hardware, the CPU uses a series of simpler floating-point operations. In systems without any floating-point hardware, the CPU emulates it using a series of simpler fixed-point arithmetic operations that run on

SECTION 10

#1732773043627

1080-442: A fixed set of blocks instead of loading them on demand. The difference between these two approaches is the size of the contiguous block of memory; paged systems break up main memory into a series of equal sized blocks, while segmented systems generally allow for variable sizes. Early memory management systems, often implemented in software, set aside a portion of memory to hold a series of mappings. These consisted of pairs of values,

1170-889: A gate array to interface the ARM2 processor with the WE32206 to support the additional ARM floating-point instructions. Acorn later offered the FPA10 coprocessor, developed by ARM, for various machines fitted with the ARM3 processor. Coprocessors were available for the Motorola 68000 family , the 68881 and 68882 . These were common in Motorola 68020 / 68030 -based workstations , like the Sun-3 series. They were also commonly added to higher-end models of Apple Macintosh and Commodore Amiga series, but unlike IBM PC-compatible systems, sockets for adding

1260-418: A list of page number originally expressed by the program and the actual page number in main memory. When it attempts to access memory, the MMU reads the segment number from the processor's memory bus, finds the corresponding entry for that program in its internal memory, and expresses the mapped version of the value on the memory's bus while the lower bits of the original address are passed through unchanged. Like

1350-433: A page table entry or other per-page information prohibits access to a particular virtual page, perhaps because no physical random-access memory (RAM) has been allocated to that virtual page. In this case, the MMU signals a page fault to the CPU. The operating system (OS) then handles the situation, perhaps by trying to find a spare frame of RAM and set up the page map to map it to the requested virtual address. If no RAM

1440-438: A private array of memory, registers, or static RAM that holds a set of mapping information. The virtual page number may be directly used as an index into the page table or other mapping information, or it may be further divided, with bits at a given level used as an index into a table of lower-level tables into which bits at the next level down are used as an index, with two or more levels of indexing. The physical page number

1530-481: A processor design with 24-bit addressing, like the original Motorola 68000 . In such a system, the MMU splits the virtual address into parts, for instance, the 13 least significant bits for the page index and the remaining 11 most significant bits as the segment number. This results a list of 2048 pages of 8 kB each. In this approach, memory requests result in one or more pages being granted to that program, which may not be contiguous in main memory. The MMU maintains

1620-440: A program to access memory it has not previously requested, which prevents a misbehaving program from using up all memory or malicious code from reading data from another program. They also often manage a processor cache , which stores recently accessed data in a very fast memory and thus reduces the need to talk to the slower main memory. In some implementations, they are also responsible for bus arbitration , controlling access to

1710-507: A request, but this is spread out and cannot be allocated. On systems where programs start and stop over time, this can eventually lead to memory being highly fragmented and no large blocks remaining. A number of algorithms were developed to address this problem. Segmenting was widely used on microcomputer platforms of the 1980s. Among the MMUs that used this concept were the Motorola 68451 and Signetics 68905, but many other examples exist. It

1800-785: A set of bits to index the root level of the tree, a set of bits to index the middle level of the tree, a set of bits to index the leaf level of the tree, and remaining bits that pass through to the physical address without modification, indexing a byte within the page. The sizes of the fields are dependent on the page size; all three tree index fields are the same size. The OpenVMS AXP PALcode supports full read and write permission bits for user, supervisor, executive, and kernel modes, and also supports fault on read/write/execute bits are also supported. The DEC OSF/1 PALcode supports full read and write permission bits for user and kernel modes, and also supports fault on read/write/execute bits are also supported. The Windows NT AXP PALcode can either walk

1890-473: A set of bits to index the root level of the tree, a set of bits to index the top level of the tree, a set of bits to index the leaf level of the tree, and remaining bits that pass through to the physical address without modification, indexing a byte within the page. The sizes of the fields are dependent on the page size. The Windows NT AXP PALcode supports a page being accessible only from kernel mode or being accessible from user and kernel mode, and also supports

SECTION 20

#1732773043627

1980-427: A single-level page table in a virtual address space or a two-level page table in physical address space. The upper 32 bits of an address are ignored. For a single-level page table, addresses are broken down into a set of bits to index the page table and remaining bits that pass through to the physical address without modification, indexing a byte within the page. For a two-level page table, addresses are borken down into

2070-432: A special FPU named FlexFPU, which uses simultaneous multithreading . Each physical integer core, two per module, is single-threaded, in contrast with Intel's Hyperthreading , where two virtual simultaneous threads share the resources of a single physical core. Some floating-point hardware only supports the simplest operations: addition, subtraction, and multiplication. But even the most complex floating-point hardware has

2160-487: A uniform fashion. The MMU is implemented in hardware on the CPU board. The MMU consists of a context register, a segment map and a page map. Virtual addresses from the CPU are translated into intermediate addresses by the segment map, which in turn are translated into physical addresses by the page map. The page size is 2 KB and the segment size is 32 KB which gives 16 pages per segment. Up to 16 contexts can be mapped concurrently. The maximum logical address space for

2250-438: Is a computer hardware unit that examines all memory references on the memory bus , translating these requests, known as virtual memory addresses , into physical addresses in main memory . In modern systems, programs generally have addresses that access the theoretical maximum memory of the computer architecture , 32 or 64 bits. The MMU maps the addresses from each program into separate areas in physical memory, which

2340-525: Is a part of a computer system specially designed to carry out operations on floating-point numbers. Typical operations are addition , subtraction , multiplication , division , and square root . Some FPUs can also perform various transcendental functions such as exponential or trigonometric calculations, but the accuracy can be low, so some systems prefer to compute these functions in software. In general-purpose computer architectures , one or more FPUs may be integrated as execution units within

2430-603: Is available, the CORDIC methods are most commonly used for transcendental function evaluation. In most modern computer architectures, there is some division of floating-point operations from integer operations. This division varies significantly by architecture; some have dedicated floating-point registers, while some, like Intel x86 , go as far as independent clocking schemes. CORDIC routines have been implemented in Intel x87 coprocessors ( 8087 , 80287, 80387 ) up to

2520-454: Is combined with the page offset to give the complete physical address. A page table entry or other per-page information may also include information about whether the page has been written to (the dirty bit ), when it was last used (the accessed bit , for a least recently used (LRU) page replacement algorithm ), what kind of processes ( user mode or supervisor mode ) may read and write it, and whether it should be cached . Sometimes,

2610-552: Is free, it may be necessary to choose an existing page (known as a victim ), using some replacement algorithm , and save it to disk (a process called paging ). With some MMUs, there can also be a shortage of PTEs, in which case the OS will have to free one for the new mapping. The MMU may also generate illegal access error conditions or invalid page faults upon illegal or non-existing memory accesses, respectively, leading to segmentation fault or bus error conditions when handled by

2700-432: Is generally much smaller than the theoretical maximum. This is possible because programs rarely use large amounts of memory at any one time. Most modern operating systems (OS) work in concert with an MMU to provide virtual memory (VM) support. The MMU tracks memory use in fixed-size blocks known as pages , and if a program refers to a location in a page that is not in physical memory, the MMU will cause an interrupt to

2790-446: Is one of the benefits of paging . However, paged mapping causes another problem, internal fragmentation . This occurs when a program requests a block of memory that does not cleanly map into a page, for instance, if a program requests a 1 KB buffer to perform file work. In this case, the request results in an entire page being set aside even though only 1 KB of the page will ever be used; if pages are larger than 1 KB,

AMD Am29000 - Misplaced Pages Continue

2880-399: Is very small. An OS may treat multiple pages as if they were a single larger page. For example, Linux on VAX groups eight pages together. Thus, the system is viewed as having 4 KB pages. The VAX divides memory into four fixed-purpose regions, each 1 GB in size. They are: Page tables are big linear arrays. Normally, this would be very wasteful when addresses are used at both ends of

2970-497: The 80287 , and 80386/80386SX -based machines – for the 80387 and 80387SX respectively, although early ones were socketed for the 80287, since the 80387 did not exist yet. Other companies manufactured co-processors for the Intel x86 series. These included Cyrix and Weitek . Acorn Computers opted for the WE32206 to offer single , double and extended precision to its ARM powered Archimedes range, introducing

3060-545: The 80486 microprocessor series, as well as in the Motorola 68881 and 68882 for some kinds of floating-point instructions, mainly as a way to reduce the gate counts (and complexity) of the FPU subsystem. Floating-point operations are often pipelined . In earlier superscalar architectures without general out-of-order execution , floating-point operations were sometimes pipelined separately from integer operations. The modular architecture of Bulldozer microarchitecture uses

3150-492: The Am29050 had also become available, without on-chip cache but featuring a floating-point unit with fully pipelined multiply–accumulate operations , a larger 1 KB Branch Target Cache with a claimed 80% hit rate, and better-pipelined load operations sped up by a 4-entry TLB -like Physical Address Cache. Though it is not a superscalar processor, it permits a floating-point operation and an integer operation to complete at

3240-580: The IBM 701 . This was carried forward to its successors the 709, 7090, and 7094. In 1963, Digital announced the PDP-6 , which had floating point as a standard feature. In 1963, the GE-235 featured an "Auxiliary Arithmetic Unit" for floating point and double-precision calculations. Historically, some systems implemented floating point with a coprocessor rather than as an integrated unit (but now in addition to

3330-593: The MOS 6502 . For instance, the Atari MMU would express additional bits on the address bus to select among several banks of DRAM memory based on which of the chips was currently active, normally the CPU or ANTIC . This was used to expand the available memory on the Atari 130XE to 128 kB. The Commodore 128 used a similar approach. Most modern systems divide memory into pages that are 4–64 KB in size, often with

3420-745: The McCray ) and AT-Super cards, employing the Am29000 CPU and Am29027 floating-point accelerator, were followed by the MacRageous , upgrading the CPU to the Am29050. Such accelerator cards offered performance several times that of the Macintosh II itself and benchmarked competitively with RISC workstations such as the DECstation 3100. Multiple cards could also be fitted to a system. However,

3510-495: The Motorola 68000 microprocessor and introduced in 1982. It includes the original Sun 1 memory management unit that provides address translation, memory protection, memory sharing and memory allocation for multiple processes running on the CPU. All access of the CPU to private on-board RAM, external Multibus memory, on-board I/O and the Multibus I/O runs through the MMU, where address translation and protection are done in

3600-541: The Zilog Z8000 family of processors. Later microprocessors (such as the Motorola 68030 and the Zilog Z280 ) placed the MMU together with the CPU on the same integrated circuit, as did the Intel 80286 and later x86 microprocessors. While this article concentrates on modern MMUs, commonly based on demand paging, early systems used base and bounds addressing that further developed into segmentation , or used

3690-494: The base and limit , although many other terms have been used. When the operating system requested memory to load a program, or a program requested more memory to hold data from a file for instance, it would call the memory handling library . This examined the mappings to look for an area in main memory large enough to hold the request. If such a block was found, a new entry was entered into the table. From then on, when that program accessed memory, all of its addresses were offset by

AMD Am29000 - Misplaced Pages Continue

3780-420: The central processing unit ; however, many embedded processors do not have hardware support for floating-point operations (while they increasingly have them as standard). When a CPU is executing a program that calls for a floating-point operation, there are three ways to carry it out: In 1954, the IBM 704 had floating-point arithmetic as a standard feature, one of its major improvements over its predecessor

3870-471: The operating system . The OS will then select a lesser-used block in memory, write it to backing storage such as a hard drive if it has been modified since it was read in, read the page from backing storage into that block, and set up the MMU to map the block to the originally requested page so the program can use it. This is known as demand paging . Modern MMUs generally perform additional memory-related tasks as well. Memory protection blocks attempts by

3960-457: The 1980s. This problem can be reduced by making the pages larger, say 64 kB instead of 8. Now the page index uses 16 bits and the resulting page table is 64 kB, which is more tractable. Moving to a larger page size leads to the second problem, increased internal fragmentation. A program that generates a series of requests for small block will be assigned large blocks and thereby waste large amounts of memory. The paged translation approach

4050-554: The 29000 has a single branch delay slot . The buffer mitigated this by storing four or two instructions from the target address of the branch, which could be run instantly while the fetch buffer was re-filled with new instructions from memory. Support for virtual address translation followed a similar approach to that of the MIPS architecture. A 64-entry translation lookaside buffer (TLB) retained mappings from virtual to physical addresses, and upon an untranslated address being encountered,

4140-408: The 29000 was used in a variety of products such as X terminals, laser printer controller cards, graphics accelerator cards, optical character recognition solutions, and network bridges. The memory architecture of the 29000 was a particular attraction for product designers, allowing them to forego external cache memory and to employ dynamic RAM directly while maintaining acceptable performance, permitting

4230-469: The CPU, e.g. GPUs – that are coprocessors not always built into the CPU ;– have FPUs as a rule, while first generations of GPUs did not). This could be a single integrated circuit , an entire circuit board or a cabinet. Where floating-point calculation hardware has not been provided, floating-point calculations are done in software, which takes more processor time, but avoids

4320-558: The CPU, with four processor registers holding base values accessed directly by the program. These mapped only the upper 4 bits of the 20-bit address, and there was no equivalent of a limit, which was simply the lower 16-bits of the address and thus a fixed 64 kB. Later entries in the x86 architecture series used different approaches. Some systems, such as the GE 645 and its successors, used both segmentation and paging. The table of segments, instead of containing per-segment entries giving

4410-464: The MMU when trapping into OS code. The IBM System/360 Model 67 , which was introduced August, 1965, included an MMU called a dynamic address translation (DAT) box. It has the unusual feature of storing accessed and dirty bits outside of the page table (along with the four bit protection key for all S/360 processors). They refer to physical memory rather than virtual memory, and are accessed by special-purpose instructions. This reduces overhead for

4500-520: The OS, which would otherwise need to propagate accessed and dirty bits from the page tables to a more physically oriented data structure. This makes OS-level virtualization , later called paravirtualization , easier. Starting in August, 1972, the IBM System/370 has a similar MMU, although it initially supported only a 24-bit virtual address space rather than the 32-bit virtual address space of

4590-532: The System/360 Model 67. It also stores the accessed and dirty bits outside the page table. In early 1983, the System/370-XA architecture expanded the virtual address space to 31 bits, and in 2000, the 64-bit z/Architecture was introduced, with the address space expanded to 64 bits; those continue to store the accessed and dirty bits outside the page table. VAX pages are 512 bytes, which

SECTION 50

#1732773043627

4680-454: The accessed bit if they are to operate efficiently. Typically, the OS will periodically unmap pages so that page-not-present faults can be used to let the OS set an accessed bit. ARM architecture -based application processors implement an MMU defined by ARM's virtual memory system architecture. The current architecture defines PTEs for describing 4 KB and 64 KB pages, 1 MB sections and 16 MB super-sections; legacy versions also defined

4770-411: The base value. When the program is done with the memory it requested and releases, or the program exits, the entries associated with it are released. This style of access, over time, became common in the mainframe market and was known as segmented translation , although a variety of terms are used here as well. This style has the advantage of simplicity; the memory blocks are continuous and thus only

4860-471: The calls would be pushed off the end of the register stack into memory, restored as required when the routine returned. Generally, the 29000's register usage was considerably more advanced than competing designs based on the Berkeley concepts. Another difference with the Berkeley design is that the 29000 included no special-purpose condition code register. Any register could be used for this purpose, allowing

4950-402: The capability to use so-called huge pages of 2 MB or 1 GB in size (often both variants are possible). Page translations are cached in a translation lookaside buffer (TLB). Some systems, mainly older RISC designs, trap into the OS when a page translation is not found in the TLB. Most systems use a hardware-based tree walker. Most systems allow the MMU to be disabled, but some disable

5040-409: The conditions to be easily saved at the expense of complicating some code. A Branch Target Cache (512 bytes on the 29000 and 1024 bytes on the 29050) stored sets of 4 or 2 sequential instructions found at the branch target address, reducing the instruction fetch latency during taken branches—the 29000 did not include any branch prediction system so there was a delay if a branch was taken. It means

5130-582: The coprocessor were not as common in lower-end systems. There are also add-on FPU coprocessor units for microcontroller units (MCUs/μCs)/ single-board computer (SBCs), which serve to provide floating-point arithmetic capability. These add-on FPUs are host-processor-independent, possess their own programming requirements ( operations , instruction sets , etc.) and are often provided with their own integrated development environments (IDEs). Memory management unit A memory management unit ( MMU ), sometimes called paged memory management unit ( PMMU ),

5220-479: The cost of a Macintosh II system combined with such a card approached that of established RISC workstations running Unix. The AT-Super was priced at around $ 4,600 and was reported as running Unix, competing with similar products employing Intel's i860 processor. One notable product utilising the 29k was Apple's Macintosh Display Card 8·24 GC for its Macintosh IIfx , featuring a 30 MHz Am29000 processor, 64 KB static RAM cache, and 2 MB of video RAM, with

5310-434: The cost of the extra hardware. For a particular computer architecture, the floating-point unit instructions may be emulated by a library of software functions; this may permit the same object code to run on systems with or without floating-point hardware. Emulation can be implemented on any of several levels: in the CPU as microcode , as an operating system function, or in user-space code. When only integer functionality

5400-518: The design team was transferred to support the PC ( x86 ) side of the business. What remained of AMD's embedded business was realigned towards the embedded 186 family of 80186 derivatives. By then the majority of AMD's resources were concentrated on their high-performance x86 processors for desktop PCs, using many of the ideas and individual parts of the 29k designs to produce the AMD K5 . The 29k evolved from

5490-562: The execution of those instructions. In the 1980s, it was common in IBM PC /compatible microcomputers for the FPU to be entirely separate from the CPU , and typically sold as an optional add-on. It would only be purchased if needed to speed up or enable math-intensive programs. The IBM PC, XT , and most compatibles based on the 8088 or 8086 had a socket for the optional 8087 coprocessor. The AT and 80286 -based systems were generally socketed for

SECTION 60

#1732773043627

5580-452: The installed memory. Another common technique, found mostly on larger machines, was segmented translation, which allowed for variable-size blocks of memory that better mapped onto program requests. This was efficient but did not map as well onto virtual memory. Some early systems, especially 8-bit systems, used very simple MMUs to perform bank switching . Modern MMUs typically divide the virtual address space (the range of addresses used by

5670-459: The integer arithmetic logic unit . The software that lists the necessary series of operations to emulate floating-point operations is often packaged in a floating-point library . In some cases, FPUs may be specialized, and divided between simpler floating-point operations (mainly addition and multiplication) and more complicated operations, like division. In some cases, only the simple operations may be implemented in hardware or microcode , while

5760-416: The memory bus among the many parts of the computer that desire access. Prior to VM systems becoming widespread in the 1990s, earlier MMU designs were more varied. Common among these was paged translation, which was similar to modern demand paging in that it used fixed-size blocks, but had a fixed-size list of pages that divided up memory; this meant that the block size was a function of the number of pages and

5850-922: The more complex operations are implemented as software. In some current architectures, the FPU functionality is combined with SIMD units to perform SIMD computation; an example of this is the augmentation of the x87 instructions set with SSE instruction set in the x86-64 architecture used in newer Intel and AMD processors. Several models of the PDP-11 , such as the PDP-11/45, PDP-11/34a, PDP-11/44, and PDP-11/70, supported an add-on floating-point unit to support floating-point instructions. The PDP-11/60, MicroPDP-11/23 and several VAX models could execute floating-point instructions without an add-on FPU (the MicroPDP-11/23 required an add-on microcode option), and offered add-on accelerators to further speed

5940-432: The operating system. In some cases, a page fault may indicate a software bug , which can be prevented by using memory protection as one of key benefits of an MMU: an operating system can use it to protect against errant programs by disallowing access to memory that a particular program should not have access to. Typically, an operating system assigns each program its own virtual address space. A paged MMU also mitigates

6030-475: The option of an additional 2 MB of dynamic RAM for use by the QuickDraw graphical toolkit. The inclusion of the 29k differentiated this particular version of the card from other versions sold by Apple, significantly improving performance when handling 24-bits-per-pixel images. Floating point unit A floating-point unit ( FPU ), numeric processing unit ( NPU ), colloquially math coprocessor ,

6120-455: The page mask bits are not stored in the VPN2. Each TLB entry has its own page size, which can be any value from 1 KB to 256 MB in multiples of four. Each PFN in a TLB entry has a caching attribute, a dirty and a valid status bit. A VPN2 has a global status bit and an OS assigned ID which participates in the virtual address TLB entry match, if the global status bit is set to zero. A PFN stores

6210-400: The page size is dependent on the processor. pages. After a TLB miss, low-level firmware machine code (here called PALcode ) walks a page table. The OpenVMS AXP PALcode and DEC OSF/1 PALcode walk a three-level tree-structured page table. Addresses are broken down into an unused set of bits (containing the same value as the uppermost bit of the index into the root level of the tree),

6300-401: The physical address without the page mask bits. A TLB refill exception is generated when there are no entries in the TLB that match the mapped virtual address. A TLB invalid exception is generated when there is a match but the entry is marked invalid. A TLB modified exception is generated when a store instruction references a mapped address and the matching entry's dirty status is not set. If

6390-414: The physical base address and length of the segment, contains entries giving the physical base address of a page table for the segment, in addition to the length of the segment. Physical memory is divided into fixed-size pages, and the same techniques used for purely page-based demand paging are used for segment-and-page-based demand paging. Another approach to memory handling is to break up main memory into

6480-467: The possible range, but the page tables for P0 and P1 space are stored in the paged S0 space. Thus, there is effectively a two-level tree , allowing applications to have sparse memory layout without wasting a lot of space on unused page table entries. Unlike page table entries in most MMUs, page table entries in the VAX MMU lack an accessed bit . OSes which implement paging must find some way to emulate

6570-582: The predecode information held of the cached instructions. AMD claimed that the superscalar 29K would have only a slightly lower performance than the K5, but much lower cost due to the size difference. The Honeywell 29KII is a CPU based on the AMD 29050, and it was extensively used in real-time avionics. Positioned as a product for "medium- to high-performance embedded applications" with potential for use in Unix workstations,

6660-416: The problem of external fragmentation of memory. After blocks of memory have been allocated and freed, the free memory may become fragmented (discontinuous) so that the largest contiguous block of free memory may be much smaller than the total amount. With virtual memory, a contiguous range of virtual addresses can be mapped to several non-contiguous blocks of physical memory; this non-contiguous allocation

6750-549: The procedure returns. Values being returned from the routines would be placed in the "global page", the top eight registers in the SPARC (for instance). The competing early RISC design from Stanford University , the Stanford MIPS , also looked at this concept but decided that improved compilers could make more efficient use of general purpose registers than a hard-wired window. In the original Berkeley design, SPARC, and i960,

6840-425: The processor) into pages , each having a size which is a power of 2, usually a few kilobytes , but they may be much larger. Programs reference memory using the natural address size of the machine, typically 32 or 64-bits in modern systems. The bottom bits of the address (the offset within a page) are left unchanged. The upper address bits are the virtual page numbers. Most MMUs use an in-memory table of items called

6930-614: The remainder of the page is wasted. If many small allocations of this sort are made, memory can be used up even though much of it remains empty. In some early microprocessor designs, memory management was performed by a separate integrated circuit such as the VLSI Technology VI475 (1986), the Motorola 68851 (1984) used with the Motorola 68020 CPU in the Macintosh II , or the Z8010 and Z8015 (1985) used with

7020-443: The resulting TLB "miss" would cause the processor to trap to a software routine responsible for providing any appropriate mapping to physical memory. In contrast to the MIPS approach which employed a random register to select the TLB entry to be replaced upon a TLB miss event, the 29000 provided a dedicated lru (least recently used) register. Some products in the 29000 family provided only 16 TLB entries to be able to dedicate part of

7110-544: The same Berkeley RISC design that also led to the Sun SPARC , Intel i960 , ARM and RISC-V . One design element used in some of the Berkeley RISC-derived designs is the concept of register windows , a technique used to speed up procedure calls significantly. The idea is to use a large set of registers as a stack, loading local data into a set of registers during a call, and marking them "dead" when

7200-415: The same 128 registers for the procedure stack, but adding another 64 for global access. In comparison, the SPARC had 128 registers in total, and the global set was a standard window of eight. This change resulted in much better register use in the 29000 under a wide variety of workloads. The 29000 also extended the register window stack with an in-memory (and in theory, in-cache) stack. When the window filled

7290-588: The same cycle. The integer and floating-point sides each have an own write port to the registers. It contained 428,000 transistors on a 1-micron process with a 0.8-micron effective channel length and was available at 20, 25, 33, and 40 MHz. Later the Am29040 was released at 33, 40, and 50 MHz, being like the Am29030 except for featuring a 4 KB data cache, a multiplication unit, and a few other enhancements. The 119 mm Am29040 contained 1.2 million transistors on

7380-443: The same values in to the segment or page maps of different contexts. Additional contexts can be handled by treating the segment map as a context cache and replacing out-of-date contexts on a least-recently used basis. The context register makes no distinction between user and supervisor states. Interrupts and traps do not switch contexts, which requires that all valid interrupt vectors always be mapped in page 0 of context, as well as

7470-518: The segmented case, programs see its memory as a single contiguous block. There are two disadvantages to this approach. The first is that as the virtual address space expands, the amount of memory needed to hold the mapping increases as well. For instance, in the 68020 the addresses are 32-bits wide, meaning the segment number for the same 8 kB page size is now the upper 19 bits and the mapping table expands to 512 kB in size, far beyond what could be implemented in hardware for reasonable cost in

7560-707: The silicon to peripherals. To compensate, the maximum page size employed by a mapping was increased from 8 KB to 16 MB. The first Am29000 was released in 1988, including a built-in MMU but floating point support was offloaded to the Am29027 FPU . Units with failed MMU or Branch Target Cache were sold as the Am29005 . In 1991 the line was extended with the Am29030 and Am29035 , which included an 8 KB or 4 KB of instruction cache, respectively. By then

7650-483: The target market. It was expected to attain 100 MHz frequency on a 0.4-micron process. AMD used the unreleased 29K microarchitecture as the basis of the K5 series of x86 -compatible processors. The ALUs were carried over, as was the re-order buffer with a slight modification. The FPU was taken from the 29050, but extended to 80 bits precision . The K5 translated the x86 instructions into "RISC-OPs" upon decoding, aided by

7740-482: The two values, base and limit, need to be stored. Each entry corresponds to a block of memory used by a single program, and the translation is invisible to the program, which sees main memory starting at address zero and extending to some fixed value. The disadvantage of this approach is that it leads to an effect known as external fragmentation . This occurs when memory allocations are released but are non-contiguous. In this case, enough memory may be available to handle

7830-492: The valid supervisor stack. The Sun-2 workstations are similar; they are built around the Motorola 68010 microprocessor and have a similar memory management unit, with 2 KB pages and 32 KB segments. The context register has a 3-bit system context used in supervisor state and a 3-bit user context used in user state. The Sun-3 workstations, except for the Sun-3/80, Sun-3/460, Sun-3/470, and Sun-3/480, are built around

7920-399: The windows were fixed in size. A routine using only one local variable would still use up eight registers on the SPARC, wasting this expensive resource. It was here that the 29000 differed from these earlier designs, using a variable window size. In this example only two registers would be used, one for the local variable, another for the return address . It also added more registers, including

8010-529: Was also supported in software implementations; one example is Apple's MultiFinder , released in 1987 for the Macintosh platform. Each program was allocated an amount of memory that was pre-selected in the Finder and translation from virtual to physical was accomplished within the programs using handles . A more common example is the Intel 8088 used in the IBM PC . This implemented a very simple MMU inside

8100-475: Was widely used by microprocessor MMUs in the 1970s and early 80s, including the Signetics 68905 (which could operate in either mode). Both Signetics and Philips produced a version of the 68000 that combined the 68905 on the same physical chip, the 68070. Another use of this technique is to expand the size of the physical address when the virtual address is too small. For instance, the PDP-11 originally had

#626373