In computer architecture , 64-bit integers , memory addresses , or other data units are those that are 64 bits wide. Also, 64-bit central processing units (CPU) and arithmetic logic units (ALU) are those that are based on processor registers , address buses , or data buses of that size. A computer that uses such a processor is a 64-bit computer.
87-518: The Apple A7 is a 64-bit system on a chip (SoC) designed by Apple Inc. , part of the Apple silicon series. It first appeared in the iPhone 5S , which was announced on September 10, 2013, and the iPad Air and iPad Mini 2 , which were both announced on October 22, 2013. Apple states that it is up to twice as fast and has up to twice the graphics power compared to its predecessor, the Apple A6 . It
174-409: A 32-bit to a 64-bit architecture is a fundamental alteration, as most operating systems must be extensively modified to take advantage of the new architecture, because that software has to manage the actual memory addressing hardware. Other software must also be ported to use the new abilities; older 32-bit software may be supported either by virtue of the 64-bit instruction set being a superset of
261-446: A loop is best predicted with a special loop predictor. A conditional jump in the bottom of a loop that repeats N times will be taken N-1 times and then not taken once. If the conditional jump is placed at the top of the loop, it will be not taken N-1 times and then taken once. A conditional jump that goes many times one way and then the other way once is detected as having loop behavior. Such a conditional jump can be predicted easily with
348-613: A virtual machine of a 16- or 32-bit operating system to run 16-bit applications or use one of the alternatives for NTVDM . Mac OS X 10.4 "Tiger" and Mac OS X 10.5 "Leopard" had only a 32-bit kernel, but they can run 64-bit user-mode code on 64-bit processors. Mac OS X 10.6 "Snow Leopard" had both 32- and 64-bit kernels, and, on most Macs, used the 32-bit kernel even on 64-bit processors. This allowed those Macs to support 64-bit processes while still supporting 32-bit device drivers; although not 64-bit drivers and performance advantages that can come with them. Mac OS X 10.7 "Lion" ran with
435-725: A 16 MiB ( 16 × 1024 bytes ) address space. 32-bit superminicomputers , such as the DEC VAX , became common in the 1970s, and 32-bit microprocessors, such as the Motorola 68000 family and the 32-bit members of the x86 family starting with the Intel 80386 , appeared in the mid-1980s, making 32 bits something of a de facto consensus as a convenient register size. A 32-bit address register meant that 2 addresses, or 4 GB of random-access memory (RAM), could be referenced. When these architectures were devised, 4 GB of memory
522-423: A 32- or 64-bit Java virtual machine with no modification. The lengths and precision of all the built-in types, such as char , short , int , long , float , and double , and the types that can be used as array indices, are specified by the standard and are not dependent on the underlying architecture. Java programs that run on a 64-bit Java virtual machine have access to a larger address space. Speed
609-609: A 64-bit kernel on more Macs, and OS X 10.8 "Mountain Lion" and later macOS releases only have a 64-bit kernel. On systems with 64-bit processors, both the 32- and 64-bit macOS kernels can run 32-bit user-mode code, and all versions of macOS up to macOS Mojave (10.14) include 32-bit versions of libraries that 32-bit applications would use, so 32-bit user-mode software for macOS will run on those systems. The 32-bit versions of libraries have been removed by Apple in macOS Catalina (10.15). Linux and most other Unix-like operating systems, and
696-463: A conditional jump is taken every third time. The branch sequence is 001001001... In this case, entry number 00 in the pattern history table will go to state "strongly taken", indicating that after two zeroes comes a one. Entry number 01 will go to state "strongly not taken", indicating that after 01 comes a zero. The same is the case with entry number 10, while entry number 11 is never used because there are never two consecutive ones. The general rule for
783-408: A conditional jump will not be taken, so they always fetch the next sequential instruction. Only when the branch or jump is evaluated and found to be taken, does the instruction pointer get set to a non-sequential address. Both CPUs evaluate branches in the decode stage and have a single cycle instruction fetch. As a result, the branch target recurrence is two cycles long, and the machine always fetches
870-780: A driver for a 32-bit PCI device asking the device to DMA data into upper areas of a 64-bit machine's memory could not satisfy requests from the operating system to load data from the device to memory above the 4 gigabyte barrier, because the pointers for those addresses would not fit into the DMA registers of the device. This problem is solved by having the OS take the memory restrictions of the device into account when generating requests to drivers for DMA, or by using an input–output memory management unit (IOMMU). As of August 2023 , 64-bit architectures for which processors are being manufactured include: Most architectures of 64 bits that are derived from
957-438: A fall-back. In static prediction, all decisions are made at compile time, before the execution of the program. Dynamic branch prediction uses information about taken or not taken branches gathered at run-time to predict the outcome of a branch. Using a random or pseudorandom bit (a pure guess) would guarantee every branch a 50% correct prediction rate, which cannot be improved (or worsened) by reordering instructions. (With
SECTION 10
#17327810283571044-449: A generation of computers in which 64-bit processors are the norm. 64 bits is a word size that defines certain classes of computer architecture, buses, memory, and CPUs and, by extension, the software that runs on them. 64-bit CPUs have been used in supercomputers since the 1970s ( Cray-1 , 1975) and in reduced instruction set computers (RISC) based workstations and servers since the early 1990s. In 2003, 64-bit CPUs were introduced to
1131-484: A given process and can have implications for efficient processor cache use. Maintaining a partial 32-bit model is one way to handle this, and is in general reasonably effective. For example, the z/OS operating system takes this approach, requiring program code to reside in 31-bit address spaces (the high order bit is not used in address calculation on the underlying hardware platform) while data objects can optionally reside in 64-bit regions. Not all such applications require
1218-751: A large address space or manipulate 64-bit data items, so these applications do not benefit from these features. x86-based 64-bit systems sometimes lack equivalents of software that is written for 32-bit architectures. The most severe problem in Microsoft Windows is incompatible device drivers for obsolete hardware. Most 32-bit application software can run on a 64-bit operating system in a compatibility mode , also termed an emulation mode, e.g., Microsoft WoW64 Technology for IA-64 and AMD64. The 64-bit Windows Native Mode driver environment runs atop 64-bit NTDLL.DLL , which cannot call 32-bit Win32 subsystem code (often devices whose actual hardware function
1305-487: A local 4-bit history and a local pattern history table with 16 entries for each conditional jump. On the SPEC '89 benchmarks, very large local predictors saturate at 97.1% correct. A global branch predictor does not keep a separate history record for each conditional jump. Instead it keeps a shared history of all conditional jumps. The advantage of a shared history is that any correlation between different conditional jumps
1392-417: A more advanced branch predictor. The first time a conditional jump instruction is encountered, there is not much information to base a prediction on. But the branch predictor keeps records of whether branches are taken or not taken. When it encounters a conditional jump that has been seen several times before, then it can base the prediction on the history. The branch predictor may, for example, recognize that
1479-439: A next-line predictor points to aligned groups of 2, 4, or 8 instructions, the branch target will usually not be the first instruction fetched, and so the initial instructions fetched are wasted. Assuming for simplicity, a uniform distribution of branch targets, 0.5, 1.5, and 3.5 instructions fetched are discarded, respectively. Since the branch itself will generally not be the last instruction in an aligned group, instructions after
1566-466: A per-core L1 cache of 64 KB for data and 64 KB for instructions, a L2 cache of 1 MB shared by both CPU cores, and a 4 MB L3 cache that services the entire SoC. The A7 includes a new image processor , a feature originally introduced in the A5 , used for functionality related to the camera such as image stabilizing, color correction, and light balance. The A7 also includes an area called
1653-521: A problem. 64-bit drivers were not provided for many older devices, which could consequently not be used in 64-bit systems. Driver compatibility was less of a problem with open-source drivers, as 32-bit ones could be modified for 64-bit use. Support for hardware made before early 2007, was problematic for open-source platforms, due to the relatively small number of users. 64-bit versions of Windows cannot run 16-bit software . However, most 32-bit applications will work well. 64-bit users are forced to install
1740-413: A processor with 64-bit memory addresses can directly access 2 bytes (16 exabytes or EB) of byte-addressable memory. With no further qualification, a 64-bit computer architecture generally has integer and addressing registers that are 64 bits wide, allowing direct support for 64-bit data types and addresses. However, a CPU might have external data buses or address buses with different sizes from
1827-447: A simple counter. A loop predictor is part of a hybrid predictor where a meta-predictor detects whether the conditional jump has loop behavior. An indirect jump instruction can choose among more than two branches. Some processors have specialized indirect branch predictors. Newer processors from Intel and AMD can predict indirect branches by using a two-level adaptive predictor. This kind of instruction contributes more than one bit to
SECTION 20
#17327810283571914-405: A single integer register can store the memory address to any location in the computer's physical or virtual memory . Therefore, the total number of addresses to memory is often determined by the width of these registers. The IBM System/360 of the 1960s was an early 32-bit computer; it had 32-bit integer registers, although it only used the low order 24 bits of a word for addresses, resulting in
2001-621: A two-level adaptive predictor with an n-bit history is that it can predict any repetitive sequence with any period if all n-bit sub-sequences are different. The advantage of the two-level adaptive predictor is that it can quickly learn to predict an arbitrary repetitive pattern. This method was invented by T.-Y. Yeh and Yale Patt at the University of Michigan . Since the initial publication in 1991, this method has become very popular. Variants of this prediction method are used in most modern microprocessors. A two-level branch predictor where
2088-425: A unique counter. The predictor table is indexed with the instruction address bits, so that the processor can fetch a prediction for every instruction before the instruction is decoded. The Two-Level Branch Predictor, also referred to as Correlation-Based Branch Predictor, uses a two-dimensional table of counters, also called "Pattern History Table". The table entries are two-bit counters. If an if statement
2175-522: Is an abbreviation of "Long, Pointer, 64". Other models are the ILP64 data model in which all three data types are 64 bits wide, and even the SILP64 model where short integers are also 64 bits wide. However, in most cases the modifications required are relatively minor and straightforward, and many well-written programs can simply be recompiled for the new environment with no changes. Another alternative
2262-399: Is beneficial to arrange that each predictor will have different aliasing patterns, so that it is more likely that at least one predictor will have no aliasing. Combined predictors with different indexing functions for the different predictors are called gskew predictors, and are analogous to skewed associative caches used for data and instruction caching. A conditional jump that controls
2349-483: Is emulated in user mode software, like Winprinters). Because 64-bit drivers for most devices were unavailable until early 2007 (Vista x64), using a 64-bit version of Windows was considered a challenge. However, the trend has since moved toward 64-bit computing, more so as memory prices dropped and the use of more than 4 GB of RAM increased. Most manufacturers started to provide both 32-bit and 64-bit drivers for new devices, so unavailability of 64-bit drivers ceased to be
2436-416: Is executed three times, the decision made on the third execution might depend upon whether the previous two were taken or not. In such scenarios, a two-level adaptive predictor works more efficiently than a saturation counter. Conditional jumps that are taken every second time or have some other regularly recurring pattern are not predicted well by the saturating counter. A two-level adaptive predictor remembers
2523-480: Is known definitively. The purpose of the branch predictor is to improve the flow in the instruction pipeline . Branch predictors play a critical role in achieving high performance in many modern pipelined microprocessor architectures. Two-way branching is usually implemented with a conditional jump instruction. A conditional jump can either be "taken" and jump to a different place in program memory, or it can be "not taken" and continue execution immediately after
2610-413: Is not the only factor to consider in comparing 32-bit and 64-bit processors. Applications such as multi-tasking, stress testing, and clustering – for high-performance computing (HPC) – may be more suited to a 64-bit architecture when deployed appropriately. For this reason, 64-bit clusters have been widely deployed in large organizations, such as IBM, HP, and Microsoft. Summary: A common misconception
2697-441: Is often written with implicit assumptions about the widths of data types. C code should prefer ( u ) intptr_t instead of long when casting pointers into integer objects. A programming model is a choice made to suit a given compiler, and several can coexist on the same OS. However, the programming model chosen as the primary model for the OS application programming interface (API) typically dominates. Another consideration
Apple A7 - Misplaced Pages Continue
2784-504: Is often, but not always, based on 64-bit units of data. For example, although the x86 / x87 architecture has instructions able to load and store 64-bit (and 32-bit) floating-point values in memory, the internal floating-point data and register format is 80 bits wide, while the general-purpose registers are 32 bits wide. In contrast, the 64-bit Alpha family uses a 64-bit floating-point data and register format, and 64-bit integer registers. Many computer instruction sets are designed so that
2871-417: Is part of making the predictions. The disadvantage is that the history is diluted by irrelevant information if the different conditional jumps are uncorrelated, and that the history buffer may not include any bits from the same branch if there are many other branches in between. It may use a two-level adaptive predictor. This scheme is better than the saturating counter scheme only for large table sizes, and it
2958-424: Is rarely as good as local prediction. The history buffer must be longer in order to make a good prediction. The size of the pattern history table grows exponentially with the size of the history buffer. Hence, the big pattern history table must be shared among all conditional jumps. A two-level adaptive predictor with globally shared history buffer and pattern history table is called a "gshare" predictor if it xors
3045-418: Is that 64-bit architectures are no better than 32-bit architectures unless the computer has more than 4 GB of random-access memory . This is not entirely true: The main disadvantage of 64-bit architectures is that, relative to 32-bit architectures, the same data occupies more space in memory (due to longer pointers and possibly other types, and alignment padding). This increases the memory requirements of
3132-495: Is the IBM AS/400 , software for which is compiled into a virtual instruction set architecture (ISA) called Technology Independent Machine Interface (TIMI); TIMI code is then translated to native machine code by low-level software before being executed. The translation software is all that must be rewritten to move the full OS and all software to a new platform, as when IBM transitioned the native instruction set for AS/400 from
3219-507: Is the LLP64 model, which maintains compatibility with 32-bit code by leaving both int and long as 32-bit. LL refers to the long long integer type, which is at least 64 bits on all platforms, including 32-bit environments. There are also systems with 64-bit processors using an ILP32 data model, with the addition of 64-bit long long integers; this is also used on many platforms with 32-bit processors. This model reduces code size and
3306-400: Is the data model used for device drivers . Drivers make up the majority of the operating system code in most modern operating systems (although many may not be loaded when the operating system is running). Many drivers use pointers heavily to manipulate data, and in some cases have to load pointers of a certain size into the hardware they support for direct memory access (DMA). As an example,
3393-489: Is the first 64-bit SoC to ship in a consumer smartphone or tablet computer . On March 21, 2017, the iPad mini 2 was discontinued, ending production of A7 chips. The latest software update for systems using this chip was iOS 12.5.7 , released on January 23, 2023, as they were discontinued with the release of iOS 13 and iPadOS 13 in 2019. The A7 features an Apple-designed 64-bit 1.3–1.4 GHz ARMv8-A dual-core CPU , called Cyclone . The 64-bit A64 instruction set in
3480-649: The Apple Watch Series 4 and 5. Many 64-bit platforms today use an LP64 model (including Solaris, AIX , HP-UX , Linux, macOS, BSD, and IBM z/OS). Microsoft Windows uses an LLP64 model. The disadvantage of the LP64 model is that storing a long into an int truncates. On the other hand, converting a pointer to a long will "work" in LP64. In the LLP64 model, the reverse is true. These are not problems which affect fully standard-compliant code, but code
3567-513: The C and C++ toolchains for them, have supported 64-bit processors for many years. Many applications and libraries for those platforms are open-source software , written in C and C++, so that if they are 64-bit-safe, they can be compiled into 64-bit versions. This source-based distribution model, with an emphasis on frequent releases, makes availability of application software for those operating systems less of an issue. In 32-bit programs, pointers and data types such as integers generally have
Apple A7 - Misplaced Pages Continue
3654-464: The Cray-1 , used registers up to 64 bits wide, and supported 64-bit integer arithmetic, although they did not support 64-bit addressing. In the mid-1980s, Intel i860 development began culminating in a 1989 release; the i860 had 32-bit integer registers and 32-bit addressing, so it was not a fully 64-bit processor, although its graphics unit supported 64-bit integer arithmetic. However, 32 bits remained
3741-912: The Nintendo 64 and the PlayStation 2 had 64-bit microprocessors before their introduction in personal computers. High-end printers, network equipment, and industrial computers also used 64-bit microprocessors, such as the Quantum Effect Devices R5000 . 64-bit computing started to trickle down to the personal computer desktop from 2003 onward, when some models in Apple 's Macintosh lines switched to PowerPC 970 processors (termed G5 by Apple), and Advanced Micro Devices (AMD) released its first 64-bit x86-64 processor. Physical memory eventually caught up with 32 bit limits. In 2023, laptop computers were commonly equipped with 16GB and servers up to 64 GB of memory, greatly exceeding
3828-504: The VIA Nano processor may be using this technique. An agree predictor is a two-level adaptive predictor with globally shared history buffer and pattern history table, and an additional local saturating counter. The outputs of the local and the global predictors are XORed with each other to give the final prediction. The purpose is to reduce contentions in the pattern history table where two branches with opposite prediction happen to share
3915-509: The iPad Air . Its die is identical in size and layout to that of the first A7 and is manufactured by Samsung. However, unlike the first version of the A7, the A7 used in the iPad Air is not on a PoP, having no stacked RAM. Instead it uses a chip-on-board mounting, immediately adjacent DRAM, and is covered by a metallic heat spreader, similar to the Apple A5X and A6X . The A7's branch predictor
4002-616: The "Secure Enclave" that stores and protects the data from the Touch ID fingerprint sensor on the iPhone 5S and iPad mini 3 . It has been speculated that the security of the data in the Secure Enclave is enforced by ARM's TrustZone / SecurCore technology. In a change from the Apple A6 , the A7 SoC no longer services the accelerometer, gyroscope and compass. In order to reduce power consumption, this functionality has been moved to
4089-400: The 2 = 4 possible branch histories, and each entry in the table contains a two-bit saturating counter of the same type as in figure 2 for each branch. The branch history register is used for choosing which of the four saturating counters to use. If the history is 00, then the first counter is used; if the history is 11, then the last of the four counters is used. Assume, for example, that
4176-521: The 32-bit instruction set, so that processors that support the 64-bit instruction set can also run code for the 32-bit instruction set, or through software emulation , or by the actual implementation of a 32-bit processor core within the 64-bit processor, as with some Itanium processors from Intel, which included an IA-32 processor core to run 32-bit x86 applications. The operating systems for those 64-bit architectures generally support both 32-bit and 64-bit applications. One significant exception to this
4263-833: The 32-bit limit of 4 GB ( 4 × 1024 bytes ), allowing room for later expansion and incurring no overhead of translating full 64-bit addresses. The Power ISA v3.0 allows 64 bits for an effective address, mapped to a segmented address with between 65 and 78 bits allowed, for virtual memory, and, for any given processor, up to 60 bits for physical memory. The Oracle SPARC Architecture 2015 allows 64 bits for virtual memory and, for any given processor, between 40 and 56 bits for physical memory. The ARM AArch64 Virtual Memory System Architecture allows 48 bits for virtual memory and, for any given processor, from 32 to 48 bits for physical memory. The DEC Alpha specification requires minimum of 43 bits of virtual memory address space (8 TB) to be supported, and hardware need to check and trap if
4350-655: The 4 GB address capacity of 32 bits. In principle, a 64-bit microprocessor can address 16 EB ( 16 × 1024 = 2 = 18,446,744,073,709,551,616 bytes ) of memory. However, not all instruction sets, and not all processors implementing those instruction sets, support a full 64-bit virtual or physical address space. The x86-64 architecture (as of 2016 ) allows 48 bits for virtual memory and, for any given processor, up to 52 bits for physical memory. These limits allow memory sizes of 256 TB ( 256 × 1024 bytes ) and 4 PB ( 4 × 1024 bytes ), respectively. A PC cannot currently contain 4 petabytes of memory (due to
4437-525: The A7 drew 1100 mA during fixed point operations and 520 mA during floating point operations, while its predecessor, the A6 processor in the iPhone 5, drew 485 mA and 320 mA. It is manufactured in a package on package (PoP) together with 1 GB of LPDDR3 DRAM with a 64-bit wide memory interface onto the package. Apple uses the APL5698 variant of the A7 chip, running at 1.4 GHz, in
SECTION 50
#17327810283574524-510: The ARMv8-A architecture doubles the number of registers of the A7 compared to the ARMv7 architecture used in A6. It has 31 general purpose registers that are each 64-bits wide and 32 floating-point/ NEON registers that are each 128-bits wide. The A7 also integrates a graphics processing unit (GPU) which AnandTech believes to be a PowerVR G6430 in a four cluster configuration. The A7 has
4611-507: The SPEC'89 benchmarks, such a predictor is about as good as the local predictor. Predictors like gshare use multiple table entries to track the behavior of any particular branch. This multiplication of entries makes it much more likely that two branches will map to the same table entry (a situation called aliasing), which in turn makes it much more likely that prediction accuracy will suffer for those branches. Once you have multiple predictors, it
4698-572: The code to tell whether the static prediction should be taken or not taken. The Intel Pentium 4 accepts branch prediction hints, but this feature was abandoned in later Intel processors. Static prediction is used as a fall-back technique in some processors with dynamic branch prediction when dynamic predictors do not have sufficient information to use. Both the Motorola MPC7450 (G4e) and the Intel Pentium 4 use this technique as
4785-483: The conditional jump is taken more often than not, or that it is taken every second time. Branch prediction is not the same as branch target prediction . Branch prediction attempts to guess whether a conditional jump will be taken or not. Branch target prediction attempts to guess the target of a taken conditional or unconditional jump before it is computed by decoding and executing the instruction itself. Branch prediction and branch target prediction are often combined into
4872-410: The conditional jump. It is not known for certain whether a conditional jump will be taken or not taken until the condition has been calculated and the conditional jump has passed the execution stage in the instruction pipeline (see fig. 1). Without branch prediction, the processor would have to wait until the conditional jump instruction has passed the execute stage before the next instruction can enter
4959-409: The fetch stage in the pipeline. The branch predictor attempts to avoid this waste of time by trying to guess whether the conditional jump is most likely to be taken or not taken. The branch that is guessed to be the most likely is then fetched and speculatively executed . If it is later detected that the guess was wrong, then the speculatively executed or partially executed instructions are discarded and
5046-479: The global history and branch PC, and "gselect" if it concatenates them. Global branch prediction is used in AMD processors, and in Intel Pentium M , Core , Core 2 , and Silvermont -based Atom processors. An alloyed branch predictor combines the local and global prediction principles by concatenating local and global branch histories, possibly with some bits from the program counter as well. Tests indicate that
5133-411: The history buffer. The zEC12 and later z/Architecture processors from IBM support a BRANCH PREDICTION PRELOAD instruction that can preload the branch predictor entry for a given instruction with a branch target address constructed by adding the contents of a general-purpose register to an immediate displacement value. Processors without this mechanism will simply predict an indirect jump to go to
5220-520: The history of the last n occurrences of the branch and uses one saturating counter for each of the possible 2 history patterns. This method is illustrated in figure 3. Consider the example of n = 2. This means that the last two occurrences of the branch are stored in a two-bit shift register . This branch history register can have four different binary values, 00, 01, 10, and 11, where zero means "not taken" and one means "taken". A pattern history table contains four entries per branch, one for each of
5307-573: The instruction immediately after any taken branch. Both architectures define branch delay slots in order to utilize these fetched instructions. A more advanced form of static prediction presumes that backward branches will be taken and that forward branches will not. A backward branch is one that has a target address that is lower than its own address. This technique can help with prediction accuracy of loops, which are usually backward-pointing branches, and are taken more often than not taken. Some processors allow branch prediction hints to be inserted into
SECTION 60
#17327810283575394-445: The last outcome of the branch. This is the most simple version of dynamic branch predictor possible, although it is not very accurate. A 2-bit saturating counter is a state machine with four states: When a branch is evaluated, the corresponding state machine is updated. Branches evaluated as not taken change the state toward strongly not taken, and branches evaluated as taken change the state toward strongly taken. The advantage of
5481-631: The mainstream PC market in the form of x86-64 processors and the PowerPC G5 . A 64-bit register can hold any of 2 (over 18 quintillion or 1.8×10 ) different values. The range of integer values that can be stored in 64 bits depends on the integer representation used. With the two most common representations, the range is 0 through 18,446,744,073,709,551,615 (equal to 2 − 1) for representation as an ( unsigned ) binary number , and −9,223,372,036,854,775,808 (−2 ) through 9,223,372,036,854,775,807 (2 − 1) for representation as two's complement . Hence,
5568-542: The mid-1990s, HAL Computer Systems , Sun Microsystems , IBM , Silicon Graphics , and Hewlett-Packard had developed 64-bit architectures for their workstation and server systems. A notable exception to this trend were mainframes from IBM, which then used 32-bit data and 31-bit address sizes; the IBM mainframes did not include 64-bit processors until 2000. During the 1990s, several low-cost 64-bit microprocessors were used in consumer electronics and embedded applications. Notably,
5655-495: The new M7 motion coprocessor which appears to be a separate ARM-based microcontroller from NXP Semiconductors . Apple uses the APL0698 variant of the A7 chip, running at 1.3 GHz, in the iPhone 5S , iPad Mini 2 , and iPad Mini 3 . This A7 is manufactured by Samsung on a high-κ metal gate (HKMG) 28 nm process and the chip includes over 1 billion transistors on a die 102 mm in size. According to ABI Research
5742-409: The norm until the early 1990s, when the continual reductions in the cost of memory led to installations with amounts of RAM approaching 4 GB, and the use of virtual memory spaces exceeding the 4 GB ceiling became desirable for handling certain types of problems. In response, MIPS and DEC developed 64-bit microprocessor architectures, initially for high-end workstation and server machines. By
5829-763: The older 32/48-bit IMPI to the newer 64-bit PowerPC-AS , codenamed Amazon . The IMPI instruction set was quite different from even 32-bit PowerPC, so this transition was even bigger than moving a given instruction set from 32 to 64 bits. On 64-bit hardware with x86-64 architecture (AMD64), most 32-bit operating systems and applications can run with no compatibility issues. While the larger address space of 64-bit architectures makes working with large data sets in applications such as digital video , scientific computing, and large databases easier, there has been considerable debate on whether they or their 32-bit compatibility modes will be faster than comparably priced 32-bit systems for other tasks. A compiled Java program can run on
5916-428: The other types of registers cannot. The size of these registers therefore normally limits the amount of directly addressable memory, even if there are registers, such as floating-point registers, that are wider. Most high performance 32-bit and 64-bit processors (some notable exceptions are older or embedded ARM architecture (ARM) and 32-bit MIPS architecture (MIPS) CPUs) have integrated floating point hardware, which
6003-420: The physical size of the memory chips), but AMD envisioned large servers, shared memory clusters, and other uses of physical address space that might approach this in the foreseeable future. Thus the 52-bit physical address provides ample room for expansion while not incurring the cost of implementing full 64-bit physical addresses. Similarly, the 48-bit virtual address space was designed to provide 65,536 (2 ) times
6090-409: The pipeline starts over with the correct branch, incurring a delay. The time that is wasted in case of a branch misprediction is equal to the number of stages in the pipeline from the fetch stage to the execute stage. Modern microprocessors tend to have quite long pipelines so that the misprediction delay is between 10 and 20 clock cycles . As a result, making a pipeline longer increases the need for
6177-578: The registers, even larger (the 32-bit Pentium had a 64-bit data bus, for instance). Processor registers are typically divided into several groups: integer , floating-point , single instruction, multiple data (SIMD), control , and often special registers for address arithmetic which may have various uses and names such as address , index , or base registers . However, in modern designs, these functions are often performed by more general purpose integer registers. In most processors, only integer or address-registers can be used to address data in memory;
6264-547: The remaining unsupported bits are zero (to support compatibility on future processors). Alpha 21064 supported 43 bits of virtual memory address space (8 TB) and 34 bits of physical memory address space (16 GB). Alpha 21164 supported 43 bits of virtual memory address space (8 TB) and 40 bits of physical memory address space (1 TB). Alpha 21264 supported user-configurable 43 or 48 bits of virtual memory address space (8 TB or 256 TB) and 44 bits of physical memory address space (16 TB). A change from
6351-443: The return stack buffer is typically 4–16 entries. The trade-off between fast branch prediction and good branch prediction is sometimes dealt with by having two branch predictors. The first branch predictor is fast and simple. The second branch predictor, which is slower, more complicated, and with bigger tables, will override a possibly wrong prediction made by the first predictor. The Alpha 21264 and Alpha EV8 microprocessors used
6438-421: The same architecture of 32 bits can execute code written for the 32-bit versions natively, with no performance penalty. This kind of support is commonly called bi-arch support or more generally multi-arch support . Branch predictor In computer architecture , a branch predictor is a digital circuit that tries to guess which way a branch (e.g., an if–then–else structure ) will go before this
6525-428: The same circuitry. Static prediction is the simplest branch prediction technique because it does not rely on information about the dynamic history of code executing. Instead, it predicts the outcome of a branch based solely on the branch instruction. The early implementations of SPARC and MIPS (two of the first commercial RISC architectures) used single-direction static branch prediction: they always predict that
6612-442: The same entry in the pattern history table. A hybrid predictor, also called combined predictor, implements more than one prediction mechanism. The final prediction is based either on a meta-predictor that remembers which of the predictors has made the best predictions in the past, or a majority vote function based on an odd number of different predictors. Scott McFarling proposed combined branch prediction in his 1993 paper. On
6699-486: The same length. This is not necessarily true on 64-bit machines. Mixing data types in programming languages such as C and its descendants such as C++ and Objective-C may thus work on 32-bit implementations but not on 64-bit implementations. In many programming environments for C and C-derived languages on 64-bit machines, int variables are still 32 bits wide, but long integers and pointers are 64 bits wide. These are described as having an LP64 data model , which
6786-399: The same target as it did last time. A function will normally return to where it is called from. The return instruction is an indirect jump that reads its target address from the call stack . Many microprocessors have a separate prediction mechanism for return instructions. This mechanism is based on a so-called return stack buffer , which is a local mirror of the call stack. The size of
6873-483: The second level is replaced with a neural network has been proposed. A local branch predictor has a separate history buffer for each conditional jump instruction. It may use a two-level adaptive predictor. The history buffer is separate for each conditional jump instruction, while the pattern history table may be separate as well or it may be shared between all conditional jumps. The Intel Pentium MMX , Pentium II , and Pentium III have local branch predictors with
6960-445: The simplest static prediction of "assume take", compilers can reorder instructions to get better than 50% correct prediction.) Also, it would make timing [much more] nondeterministic. Some superscalar processors (MIPS R8000 , Alpha 21264 , and Alpha 21464 (EV8)) fetch each line of instructions with a pointer to the next line. This next-line predictor handles branch target prediction as well as branch direction prediction. When
7047-448: The size of data structures containing pointers, at the cost of a much smaller address space, a good choice for some embedded systems. For instruction sets such as x86 and ARM in which the 64-bit version of the instruction set has more registers than does the 32-bit version, it provides access to the additional registers without the space penalty. It is common in 64-bit RISC machines, explored in x86 as x32 ABI , and has recently been used in
7134-510: The software perspective, 64-bit computing means the use of machine code with 64-bit virtual memory addresses. However, not all 64-bit instruction sets support full 64-bit virtual memory addresses; x86-64 and AArch64 for example, support only 48 bits of virtual address, with the remaining 16 bits of the virtual address required to be all zeros (000...) or all ones (111...), and several 64-bit instruction sets support fewer than 64 bits of physical memory address. The term 64-bit also describes
7221-405: The taken branch (or its delay slot ) will be discarded. Once again, assuming a uniform distribution of branch instruction placements, 0.5, 1.5, and 3.5 instructions fetched are discarded. The discarded instructions at the branch and destination lines add up to nearly a complete fetch cycle, even for a single-cycle next-line predictor. A 1-bit saturating counter (essentially a flip-flop ) records
7308-484: The two-bit counter scheme over a one-bit scheme is that a conditional jump has to deviate twice from what it has done most in the past before the prediction changes. For example, a loop-closing conditional jump is mispredicted once rather than twice. The original, non-MMX Intel Pentium processor uses a saturating counter, though with an imperfect implementation. On the SPEC '89 benchmarks, very large bimodal predictors saturate at 93.5% correct, once every branch maps to
7395-544: Was claimed to infringe on a 1998 patent. On October 14, 2015, a district judge found Apple guilty of infringing U.S. patent US 5781752 , "Table based data speculation circuit for parallel processing computer", on the Apple A7 and A8 processors. The patent is owned by Wisconsin Alumni Research Foundation (WARF), a firm affiliated with the University of Wisconsin . On July 24, 2017, Apple
7482-692: Was ordered to pay WARF $ 506 million for patent infringement. Apple filed an appellate brief on October 26, 2017, with the U.S. Court of Appeals for the Federal Circuit , that argued that Apple did not infringe on the patent owned by the Wisconsin Alumni Research Foundation. On September 28, 2018, the ruling was overturned on appeal and the award thrown out by the U.S. Federal Circuit Court of Appeals. The patent expired in December 2016. 64-bit computing From
7569-416: Was so far beyond the typical amounts (4 MiB) in installations, that this was considered to be enough headroom for addressing. 4.29 billion addresses were considered an appropriate size to work with for another important reason: 4.29 billion integers are enough to assign unique references to most entities in applications like databases . Some supercomputer architectures of the 1970s and 1980s, such as
#356643