The Platform Controller Hub (PCH) is a family of Intel's single-chip chipsets, first introduced in 2009 with the Intel 5 Series. It is the successor to the Intel Hub Architecture, which used two chips: a northbridge and a southbridge.
The PCH controls certain data paths and support functions used in conjunction with Intel CPUs. These include clocking (the system clock), Flexible Display Interface (FDI) and Direct Media Interface (DMI), although FDI is used only when the chipset is required to support a processor with integrated graphics. As such, I/O functions are reassigned between this new central hub and the CPU compared to
256-516: A communications subsystem to connect, control, direct and interface between these functional modules. An SoC must have at least one processor core , but typically an SoC has more than one core. Processor cores can be a microcontroller , microprocessor (μP), digital signal processor (DSP) or application-specific instruction set processor (ASIP) core. ASIPs have instruction sets that are customized for an application domain and designed to be more efficient than general-purpose instructions for
384-430: A computer program , such as arithmetic , logic, controlling, and input/output (I/O) operations. This role contrasts with that of external components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing units (GPUs). The form, design , and implementation of CPUs have changed over time, but their fundamental operation remains almost unchanged. Principal components of
512-472: A graphics processing unit (GPU) – all on a single substrate or microchip. SoCs may contain digital and also analog , mixed-signal and often radio frequency signal processing functions (otherwise it may be considered on a discrete application processor). High-performance SoCs are often paired with dedicated and physically separate memory and secondary storage (such as LPDDR and eUFS or eMMC , respectively) chips that may be layered on top of
640-447: A memory hierarchy and cache hierarchy . In the mobile computing market, this is common, but in many low-power embedded microcontrollers, this is not necessary. Memory technologies for SoCs include read-only memory (ROM), random-access memory (RAM), Electrically Erasable Programmable ROM ( EEPROM ) and flash memory . As in other computer systems, RAM can be subdivided into relatively faster but more expensive static RAM (SRAM) and
768-660: A netlist describing the design as a physical circuit and its interconnections. These netlists are combined with the glue logic connecting the components to produce the schematic description of the SoC as a circuit which can be printed onto a chip. This process is known as place and route and precedes tape-out in the event that the SoCs are produced as application-specific integrated circuits (ASIC). SoCs must optimize power use , area on die , communication, positioning for locality between modular units and other factors. Optimization
896-399: A semiconductor foundry . This process is called functional verification and it accounts for a significant portion of the time and energy expended in the chip design life cycle , often quoted as 70%. With the growing complexity of chips, hardware verification languages like SystemVerilog , SystemC , e , and OpenVera are being used. Bugs found in the verification stage are reported to
1024-462: A CPU include the arithmetic–logic unit (ALU) that performs arithmetic and logic operations , processor registers that supply operands to the ALU and store the results of ALU operations, and a control unit that orchestrates the fetching (from memory) , decoding and execution (of instructions) by directing the coordinated operations of the ALU, registers, and other components. Modern CPUs devote
1152-486: A CPU may also contain memory , peripheral interfaces, and other components of a computer; such integrated devices are variously called microcontrollers or systems on a chip (SoC). Early computers such as the ENIAC had to be physically rewired to perform different tasks, which caused these machines to be called "fixed-program computers". The "central processing unit" term has been in use since as early as 1955. Since
1280-402: A cache had only one level of cache; unlike later level 1 caches, it was not split into L1d (for data) and L1i (for instructions). Almost all current CPUs with caches have a split L1 cache. They also have L2 caches and, for larger processors, L3 caches as well. The L2 cache is usually not split and acts as a common repository for the already split L1 cache. Every core of a multi-core processor has
1408-515: A certain level of computational performance , but power is limited in most SoC environments. SoC designs are optimized to minimize waste heat output on the chip. As with other integrated circuits , heat generated due to high power density are the bottleneck to further miniaturization of components. The power densities of high speed integrated circuits, particularly microprocessors and including SoCs, have become highly uneven. Too much waste heat can damage circuits and erode reliability of
a chip or system-on-chip (SoC /ˌˈɛsoʊsiː/; pl. SoCs /ˌˈɛsoʊsiːz/) is an integrated circuit that integrates most or all components of a computer or electronic system. These components usually include an on-chip central processing unit (CPU), memory interfaces, input/output devices and interfaces, and secondary storage interfaces, often alongside other components such as radio modems and
1664-527: A chip consists of both the hardware , described in § Structure , and the software controlling the microcontroller, microprocessor or digital signal processor cores, peripherals and interfaces. The design flow for an SoC aims to develop this hardware and software at the same time, also known as architectural co-design. The design flow must also take into account optimizations ( § Optimization goals ) and constraints. Most SoCs are developed from pre-qualified hardware component IP core specifications for
1792-1005: A circuit is the integral of power consumed with respect to time, and the average rate of power consumption is the product of current by voltage . Equivalently, by Ohm's law , power is current squared times resistance or voltage squared divided by resistance : P = I V = V 2 R = I 2 R {\displaystyle P=IV={\frac {V^{2}}{R}}={I^{2}}{R}} SoCs are frequently embedded in portable devices such as smartphones , GPS navigation devices , digital watches (including smartwatches ) and netbooks . Customers want long battery lives for mobile computing devices, another reason that power consumption must be minimized in SoCs. Multimedia applications are often executed on these devices, including video games, video streaming , image processing ; all of which have grown in computational complexity in recent years with user demands and expectations for higher- quality multimedia. Computation
1920-400: A code from the control unit indicating which operation to perform. Depending on the instruction being executed, the operands may come from internal CPU registers , external memory, or constants generated by the ALU itself. When all input signals have settled and propagated through the ALU circuitry, the result of the performed operation appears at the ALU's outputs. The result consists of both
2048-516: A comparable or better level than their synchronous counterparts, it is evident that they do at least excel in simpler math operations. This, combined with their excellent power consumption and heat dissipation properties, makes them very suitable for embedded computers . Many modern CPUs have a die-integrated power managing module which regulates on-demand voltage supply to the CPU circuitry allowing it to keep balance between performance and power consumption. System on chip A system on
2176-412: A data word, which may be stored in a register or memory, and status information that is typically stored in a special, internal CPU register reserved for this purpose. Modern CPUs typically contain more than one ALU to improve performance. The address generation unit (AGU), sometimes also called the address computation unit (ACU), is an execution unit inside the CPU that calculates addresses used by
2304-458: A dedicated L2 cache and is usually not shared between the cores. The L3 cache, and higher-level caches, are shared between the cores and are not split. An L4 cache is currently uncommon, and is generally on dynamic random-access memory (DRAM), rather than on static random-access memory (SRAM), on a separate die or chip. That was also the case historically with L1, while bigger chips have allowed integration of it and generally all cache levels, with
2432-411: A design error had been discovered. Specifically, a transistor in the 3 Gbit/s PLL clocking tree was receiving too high voltage. The projected result was a 5–15% failure rate within three years of 3 Gbit/s SATA ports, commonly used for storage devices such as hard drives and optical drives. The bug was present in revision B2 of the chipsets, and was fixed with B3. Z68 did not have this bug, since
2560-866: A different processor. For further discussion of multi-processing memory issues, see cache coherence and memory latency . SoCs include external interfaces , typically for communication protocols . These are often based upon industry standards such as USB , Ethernet , USART , SPI , HDMI , I²C , CSI , etc. These interfaces will differ according to the intended application. Wireless networking protocols such as Wi-Fi , Bluetooth , 6LoWPAN and near-field communication may also be supported. When needed, SoCs include analog interfaces including analog-to-digital and digital-to-analog converters , often for signal processing . These may be able to interface with different types of sensors or actuators , including smart transducers . They may interface with application-specific modules or shields. Or they may be internal to
2688-554: A few of the remaining northbridge functions (e.g. clocking) in addition to all of the southbridge's functions, replacing it. The system clock was previously a connection to a dedicated chip but is now incorporated into the PCH. Two different connections exist between the PCH and the CPU: Flexible Display Interface (FDI) and Direct Media Interface (DMI). The FDI is used only when the chipset requires supporting
a general trend towards tighter integration of components in the computer hardware industry, in part due to the influence of SoCs and lessons learned from the mobile and embedded computing markets. SoCs are very common in the mobile computing (as in smart devices such as smartphones and tablet computers) and edge computing markets. In general, there are three distinguishable types of SoCs: SoCs can be applied to any computing task. However, they are typically used in mobile computing such as tablets, smartphones, smartwatches, and netbooks as well as embedded systems and in applications where previously microcontrollers would be used. Where previously only microcontrollers could be used, SoCs are rising to prominence in
2944-564: A global clock signal. Two notable examples of this are the ARM compliant AMULET and the MIPS R3000 compatible MiniMIPS. Rather than totally removing the clock signal, some CPU designs allow certain portions of the device to be asynchronous, such as using asynchronous ALUs in conjunction with superscalar pipelining to achieve some arithmetic performance gains. While it is not altogether clear whether totally asynchronous designs can perform at
3072-460: A hundred or more gates, was to build them using a metal–oxide–semiconductor (MOS) semiconductor manufacturing process (either PMOS logic , NMOS logic , or CMOS logic). However, some companies continued to build processors out of bipolar transistor–transistor logic (TTL) chips because bipolar junction transistors were faster than MOS chips up until the 1970s (a few companies such as Datapoint continued to build processors out of TTL chips until
3200-522: A lot of semiconductor area to caches and instruction-level parallelism to increase performance and to CPU modes to support operating systems and virtualization . Most modern CPUs are implemented on integrated circuit (IC) microprocessors , with one or more CPUs on a single IC chip. Microprocessor chips with multiple CPUs are called multi-core processors . The individual physical CPUs, called processor cores , can also be multithreaded to support CPU-level multithreading. An IC that contains
3328-464: A manner independent of time scales, which are typically specified in HDL. Other components can remain software and be compiled and embedded onto soft-core processors included in the SoC as modules in HDL as IP cores . Once the architecture of the SoC has been defined, any new hardware elements are written in an abstract hardware description language termed register transfer level (RTL) which defines
3456-411: A memory management unit, translating logical addresses into physical RAM addresses, providing memory protection and paging abilities, useful for virtual memory . Simpler processors, especially microcontrollers , usually don't include an MMU. A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from
3584-538: A microcontroller integrates a microprocessor with peripheral circuits and memory, an SoC can be seen as integrating a microcontroller with even more advanced peripherals . Compared to a multi-chip architecture, an SoC with equivalent functionality will have reduced power consumption as well as a smaller semiconductor die area. This comes at the cost of reduced replaceability of components. By definition, SoC designs are fully or nearly fully integrated across different component modules . For these reasons, there has been
3712-459: A number that identifies the address of the next instruction to be fetched. After an instruction is fetched, the PC is incremented by the length of the instruction so that it will contain the address of the next instruction in the sequence. Often, the instruction to be fetched must be retrieved from relatively slow memory, causing the CPU to stall while waiting for the instruction to be returned. This issue
3840-486: A power source while needing to maintain autonomous function, and often are limited in power use by a high number of embedded SoCs being networked together in an area. Additionally, energy costs can be high and conserving energy will reduce the total cost of ownership of the SoC. Finally, waste heat from high energy consumption can damage other circuit components if too much heat is dissipated, giving another pragmatic reason to conserve energy. The amount of energy used in
3968-543: A processor with integrated graphics. The Intel Management Engine was also moved to the PCH starting with the Nehalem processors and 5-Series chipsets. AMD's chipsets instead use several PCIe lanes to connect with the CPU while also providing their own PCIe lanes, which are also provided by the processor itself. The chipset also contains the Nonvolatile BIOS memory . With the northbridge functions integrated to
Platform Controller Hub - Misplaced Pages Continue
4096-495: A software integrated development environment . SoCs components are also often designed in high-level programming languages such as C++ , MATLAB or SystemC and converted to RTL designs through high-level synthesis (HLS) tools such as C to HDL or flow to HDL . HLS products called "algorithmic synthesis" allow designers to use C++ to model and synthesize system, circuit, software and verification levels all in one high level language commonly known to computer engineers in
4224-444: A solution to the bottleneck, several functions belonging to the traditional northbridge and southbridge chipsets were rearranged. The northbridge and its functions are now eliminated completely: The memory controller, PCI Express lanes for expansion cards and other northbridge functions are now incorporated into the CPU die as a system agent (Intel) or packaged in the processor on an I/O die (AMD Zen 2). The PCH then incorporates
4352-429: A specific type of workload. Multiprocessor SoCs have more than one processor core by definition. The ARM architecture is a common choice for SoC processor cores because some ARM-architecture cores are soft processors specified as IP cores . SoCs must have semiconductor memory blocks to perform their computation, as do microcontrollers and other embedded systems . Depending on the application, SoC memory may form
4480-554: A time. Some CPU architectures include multiple AGUs so more than one address-calculation operation can be executed simultaneously, which brings further performance improvements due to the superscalar nature of advanced CPU designs. For example, Intel incorporates multiple AGUs into its Sandy Bridge and Haswell microarchitectures , which increase bandwidth of the CPU memory subsystem by allowing multiple memory-access instructions to be executed in parallel. Many microprocessors (in smartphones and desktop, laptop, server computers) have
4608-446: A useful computer requires thousands or tens of thousands of switching devices. The overall speed of a system is dependent on the speed of the switches. Vacuum-tube computers such as EDVAC tended to average eight hours between failures, whereas relay computers—such as the slower but earlier Harvard Mark I —failed very rarely. In the end, tube-based CPUs became dominant because the significant speed advantages afforded generally outweighed
4736-439: A very small number of ICs; usually just one. The overall smaller CPU size, as a result of being implemented on a single die, means faster switching time because of physical factors like decreased gate parasitic capacitance . This has allowed synchronous microprocessors to have clock rates ranging from tens of megahertz to several gigahertz. Additionally, the ability to construct exceedingly small transistors on an IC has increased
4864-400: Is defined by the CPU's instruction set architecture (ISA). Often, one group of bits (that is, a "field") within the instruction, called the opcode, indicates which operation is to be performed, while the remaining fields usually provide supplemental information required for the operation, such as the operands. Those operands may be specified as a constant value (called an immediate value), or as
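The split of an instruction into an opcode field and operand fields can be sketched with shifts and masks. The layout below is an assumption made for illustration (a hypothetical 16-bit word with a 4-bit opcode, a 4-bit register number and an 8-bit immediate value); real instruction set architectures define their own field widths and positions.

```python
def encode(opcode: int, reg: int, imm: int) -> int:
    """Pack the three fields into one 16-bit instruction word."""
    return ((opcode & 0xF) << 12) | ((reg & 0xF) << 8) | (imm & 0xFF)


def decode(word: int):
    """Split a 16-bit instruction word into (opcode, register, immediate)."""
    opcode = (word >> 12) & 0xF   # bits 15..12 select the operation
    reg = (word >> 8) & 0xF       # bits 11..8 name a register operand
    imm = word & 0xFF             # bits 7..0 hold a constant (immediate) operand
    return opcode, reg, imm


fields = decode(0x2A7F)
```

Here `fields` holds opcode 2, register 10 and immediate 127, and `encode` reverses the split. A hardware instruction decoder does the equivalent with wiring rather than arithmetic.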
4992-494: Is generally referred to as the " classic RISC pipeline ", which is quite common among the simple CPUs used in many electronic devices (often called microcontrollers). It largely ignores the important role of CPU cache, and therefore the access stage of the pipeline. Some instructions manipulate the program counter rather than producing result data directly; such instructions are generally called "jumps" and facilitate program behavior like loops , conditional program execution (through
5120-483: Is greater or whether they are equal; one of these flags could then be used by a later jump instruction to determine program flow. Fetch involves retrieving an instruction (which is represented by a number or sequence of numbers) from program memory. The instruction's location (address) in program memory is determined by the program counter (PC; called the "instruction pointer" in Intel x86 microprocessors ), which stores
5248-400: Is largely addressed in modern processors by caches and pipeline architectures (see below). The instruction that the CPU fetches from memory determines what the CPU will do. In the decode step, performed by binary decoder circuitry known as the instruction decoder , the instruction is converted into signals that control other parts of the CPU. The way in which the instruction is interpreted
is more demanding as expectations move towards 3D video at high resolution with multiple standards, so SoCs performing multimedia tasks must be computationally capable platforms while being low power to run off a standard mobile battery. SoCs are optimized to maximize power efficiency in performance per watt: maximize the performance of the SoC given a budget of power usage. Many applications such as edge computing, distributed processing and ambient intelligence require
5504-530: Is most often credited with the design of the stored-program computer because of his design of EDVAC, and the design became known as the von Neumann architecture , others before him, such as Konrad Zuse , had suggested and implemented similar ideas. The so-called Harvard architecture of the Harvard Mark I , which was completed before EDVAC, also used a stored-program design using punched paper tape rather than electronic memory. The key difference between
5632-780: Is necessarily a design goal of SoCs. If optimization was not necessary, the engineers would use a multi-chip module architecture without accounting for the area use, power consumption or performance of the system to the same extent. Common optimization targets for SoC designs follow, with explanations of each. In general, optimizing any of these quantities may be a hard combinatorial optimization problem, and can indeed be NP-hard fairly easily. Therefore, sophisticated optimization algorithms are often required and it may be practical to use approximation algorithms or heuristics in some cases. Additionally, most SoC designs contain multiple variables to optimize simultaneously , so Pareto efficient solutions are sought after in SoC design. Oftentimes
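The notion of Pareto efficient solutions mentioned above can be made concrete with a small filter over candidate designs. This is a minimal sketch, assuming hypothetical design points described by (power in mW, die area in mm²) where both objectives are to be minimized; the numbers are invented and the brute-force comparison is only practical for small candidate sets.

```python
def dominates(a, b) -> bool:
    """True if design a is no worse than b on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the designs that no other design dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: (power_mw, area_mm2), both minimized.
candidates = [(100, 10), (80, 12), (120, 9), (110, 11), (80, 13)]
front = pareto_front(candidates)
```

The designs left in `front` each represent a different trade-off; choosing among them requires a preference between power and area that the filter itself does not supply.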
5760-737: Is the IBM PowerPC -based Xenon used in the Xbox 360 ; this reduces the power requirements of the Xbox 360. Another method of addressing some of the problems with a global clock signal is the removal of the clock signal altogether. While removing the global clock signal makes the design process considerably more complex in many ways, asynchronous (or clockless) designs carry marked advantages in power consumption and heat dissipation in comparison with similar synchronous designs. While somewhat uncommon, entire asynchronous CPUs have been built without using
5888-619: Is the codename for the C620-series PCH, supporting LGA 2066 socketed Skylake-X / Kaby Lake-X processors (" Skylake-W " Xeon). Lewisburg has the following variations: Basin Falls is the codename for the C400-series PCH, supporting Skylake-X / Kaby Lake-X processors (branded Core i9 Extreme and " Skylake-W " Xeon). Generally similar to Wellsburg, Basin Falls consumes only up to 6 W when fully loaded. Basin Falls has
6016-686: Is the codename of a PCH in Intel 7 Series chipsets for server and workstation using the LGA 2011 socket. It was initially launched in 2011 as part of Intel X79 for the desktop enthusiast Sandy Bridge-E processors in Waimea Bay platforms. Patsburg was then used for the Sandy Bridge-EP server platform (the platform was codenamed Romley and the CPUs codenamed Jaketown, and finally branded as Xeon E5-2600 series) launched in early 2012. Launched in
6144-488: The IBM z13 has a 96 KiB L1 instruction cache. Most CPUs are synchronous circuits , which means they employ a clock signal to pace their sequential operations. The clock signal is produced by an external oscillator circuit that generates a consistent number of pulses each second in the form of a periodic square wave . The frequency of the clock pulses determines the rate at which a CPU executes instructions and, consequently,
6272-546: The Manchester Mark 1 ran its first program during the night of 16–17 June 1949. Early CPUs were custom designs used as part of a larger and sometimes distinctive computer. However, this method of designing custom CPUs for a particular application has largely given way to the development of multi-purpose processors produced in large quantities. This standardization began in the era of discrete transistor mainframes and minicomputers , and has rapidly accelerated with
6400-895: The bottlenecks of bus-based networks. Networks-on-chip have advantages including destination- and application-specific routing , greater power efficiency and reduced possibility of bus contention . Network-on-chip architectures take inspiration from communication protocols like TCP and the Internet protocol suite for on-chip communication, although they typically have fewer network layers . Optimal network-on-chip network architectures are an ongoing area of much research interest. NoC architectures range from traditional distributed computing network topologies such as torus , hypercube , meshes and tree networks to genetic algorithm scheduling to randomized algorithms such as random walks with branching and randomized time to live (TTL). Many SoC researchers consider NoC architectures to be
6528-474: The main memory . A cache is a smaller, faster memory, closer to a processor core , which stores copies of the data from frequently used main memory locations . Most CPUs have different independent caches, including instruction and data caches , where the data cache is usually organized as a hierarchy of more cache levels (L1, L2, L3, L4, etc.). All modern (fast) CPUs (with few specialized exceptions ) have multiple levels of CPU caches. The first CPUs that used
the AGU, various address-generation calculations can be offloaded from the rest of the CPU, and can often be executed quickly in a single CPU cycle. Capabilities of an AGU depend on a particular CPU and its architecture. Thus, some AGUs implement and expose more address-calculation operations, while some also include more advanced specialized instructions that can operate on multiple operands at
6784-431: The ALU's output word size), an arithmetic overflow flag will be set, influencing the next operation. Hardwired into a CPU's circuitry is a set of basic operations it can perform, called an instruction set . Such operations may involve, for example, adding or subtracting two numbers, comparing two numbers, or jumping to a different part of a program. Each instruction is represented by a unique combination of bits , known as
6912-591: The B2 revision for it was never released. 6 Gbit/s ports were not affected. This bug was especially a problem with the H61 chipset, which only had 3 Gbit/s SATA ports. Through OEMs , Intel plans to repair or replace all affected products at a cost of $ 700 million. Nearly all produced motherboards using Cougar Point chipsets were designed to handle Sandy Bridge, and later Ivy Bridge, processors. ASRock produced one motherboard for LGA 1156 processors, based on P67 chipset,
7040-468: The CPU can fetch the data from actual memory locations. Those address-generation calculations involve different integer arithmetic operations , such as addition, subtraction, modulo operations , or bit shifts . Often, calculating a memory address involves more than one general-purpose machine instruction, which do not necessarily decode and execute quickly. By incorporating an AGU into a CPU design, together with introducing specialized instructions that use
7168-479: The CPU to access main memory . By having address calculations handled by separate circuitry that operates in parallel with the rest of the CPU, the number of CPU cycles required for executing various machine instructions can be reduced, bringing performance improvements. While performing various operations, CPUs need to calculate memory addresses required for fetching data from the memory; for example, in-memory positions of array elements must be calculated before
7296-422: The CPU to malfunction. Another major issue, as clock rates increase dramatically, is the amount of heat that is dissipated by the CPU . The constantly changing clock causes many components to switch regardless of whether they are being used at that time. In general, a component that is switching uses more energy than an element in a static state. Therefore, as clock rate increases, so does energy consumption, causing
7424-467: The CPU to require more heat dissipation in the form of CPU cooling solutions. One method of dealing with the switching of unneeded components is called clock gating , which involves turning off the clock signal to unneeded components (effectively disabling them). However, this is often regarded as difficult to implement and therefore does not see common usage outside of very low-power designs. One notable recent CPU design that uses extensive clock gating
7552-405: The CPU, much of the bandwidth needed for chipsets is now relieved. This style began in Nehalem and will remain for the foreseeable future, through Cannon Lake . Beginning with ultra-low-power Haswells and continuing with mobile Skylake processors, Intel incorporated the southbridge IO controllers into the CPU package, eliminating the PCH for a system in package (SOP) design with two dies;
7680-457: The FCH is replaced with a PCIe connection. Technically the processor can operate without a chipset; it only continues to be present for interfacing with low speed I/O. AMD server and laptop CPUs adopt a self contained system on chip (SoC) design instead which doesn't require a chipset. The Intel 5 Series chipsets were the first to introduce a PCH. This first PCH is codenamed Ibex Peak . This has
7808-492: The FPGA RTL that make signals available for observation. This is used to debug hardware, firmware and software interactions across multiple FPGAs with capabilities similar to a logic analyzer. In parallel, the hardware elements are grouped and passed through a process of logic synthesis , during which performance constraints, such as operational frequency and expected signal delays, are applied. This generates an output known as
the P67 Transformer. It exclusively supports Lynnfield Core i5/i7 and Xeon processors, using the LGA 1156 socket. After revision B2 of Cougar Point chipsets was recalled, ASRock decided not to update the P67 Transformer motherboard, and it was discontinued. Some small Chinese manufacturers are producing LGA 1156 motherboards with the H61 chipset. Whitney Point is the codename of a PCH in the Oak Trail tablet platform for Atom Lincroft microprocessors. This has
8064-820: The S3 state ( Suspend to RAM ), forcing the USB devices to be reconnected although no data is lost. This issue is corrected in C2 stepping level of the Lynx Point chipset. Wellsburg is the codename for the C610-series PCH, supporting the Haswell-E (Core i7 Extreme), Haswell-EP ( Xeon E5-16xx v3 and Xeon E5-26xx v3 ), and Broadwell-EP (Xeon E5-26xx v4) processors. Generally similar to Patsburg, Wellsburg consumes only up to 7 W when fully loaded. Wellsburg has
8192-401: The SoC in what is known as a package on package (PoP) configuration, or be placed close to the SoC. Additionally, SoCs may use separate wireless modems (especially WWAN modems). An SoC integrates a microcontroller , microprocessor or perhaps several processor cores with peripherals like a GPU , Wi-Fi and cellular network radio modems or one or more coprocessors . Similar to how
8320-737: The SoC, if needed. Popular time sources are crystal oscillators and phase-locked loops . SoC peripherals including counter -timers, real-time timers and power-on reset generators. SoCs also include voltage regulators and power management circuits. SoCs comprise many execution units . These units must often send data and instructions back and forth. Because of this, all but the most trivial SoCs require communications subsystems . Originally, as with other microcomputer technologies, data bus architectures were used, but recently designs based on sparse intercommunication networks known as networks-on-chip (NoC) have risen to prominence and are forecast to overtake bus architectures for SoC design in
8448-1228: The SoC, such as if an analog sensor is built in to the SoC and its readings must be converted to digital signals for mathematical processing. Digital signal processor (DSP) cores are often included on SoCs. They perform signal processing operations in SoCs for sensors , actuators , data collection , data analysis and multimedia processing. DSP cores typically feature very long instruction word (VLIW) and single instruction, multiple data (SIMD) instruction set architectures , and are therefore highly amenable to exploiting instruction-level parallelism through parallel processing and superscalar execution . SP cores most often feature application-specific instructions, and as such are typically application-specific instruction set processors (ASIP). Such application-specific instructions correspond to dedicated hardware functional units that compute those instructions. Typical DSP instructions include multiply-accumulate , Fast Fourier transform , fused multiply-add , and convolutions . As with other computer systems, SoCs require timing sources to generate clock signals , control execution of SoC functions and provide time context to signal processing applications of
8576-431: The advent and eventual success of the ubiquitous personal computer , the term CPU is now applied almost exclusively to microprocessors. Several CPUs (denoted cores ) can be combined in a single processing chip. Previous generations of CPUs were implemented as discrete components and numerous small integrated circuits (ICs) on one or more circuit boards. Microprocessors, on the other hand, are CPUs manufactured on
8704-428: The advent of the transistor . Transistorized CPUs during the 1950s and 1960s no longer had to be built out of bulky, unreliable, and fragile switching elements, like vacuum tubes and relays . With this improvement, more complex and reliable CPUs were built onto one or several printed circuit boards containing discrete (individual) components. In 1964, IBM introduced its IBM System/360 computer architecture that
8832-427: The circuit behavior, or synthesized into RTL from a high level language through high-level synthesis. These elements are connected together in a hardware description language to create the full SoC design. The logic specified to connect these components and convert between possibly different interfaces provided by different vendors is called glue logic . Chips are verified for correctness before being sent to
8960-435: The circuit over time. High temperatures and thermal stress negatively impact reliability through stress migration , decreased mean time between failures , electromigration , wire bonding , metastability and other performance degradation of the SoC over time. In particular, most SoCs are in a small physical area or volume and therefore the effects of waste heat are compounded because there is little room for it to diffuse out of
9088-564: The complexity and number of transistors in a single CPU many fold. This widely observed trend is described by Moore's law , which had proven to be a fairly accurate predictor of the growth of CPU (and other IC) complexity until 2016. While the complexity, size, construction and general form of CPUs have changed enormously since 1950, the basic design and function has not changed much at all. Almost all common CPUs today can be very accurately described as von Neumann stored-program machines. As Moore's law no longer holds, concerns have arisen about
9216-423: The complexity scale, a machine language program is a collection of machine language instructions that the CPU executes. The actual mathematical operation for each instruction is performed by a combinational logic circuit within the CPU's processor known as the arithmetic–logic unit or ALU. In general, a CPU executes an instruction by fetching it from memory, using its ALU to perform an operation, and then storing
9344-486: The control unit as part of the von Neumann architecture . In modern computer designs, the control unit is typically an internal part of the CPU with its overall role and operation unchanged since its introduction. The arithmetic logic unit (ALU) is a digital circuit within the processor that performs integer arithmetic and bitwise logic operations. The inputs to the ALU are the data words to be operated on (called operands ), status information from previous operations, and
9472-532: The data throughput of the SoC. This is similar to some device drivers of peripherals on component-based multi-chip module PC architectures. Wire delay is not scalable due to continued miniaturization , system performance does not scale with the number of cores attached, the SoC's operating frequency must decrease with each additional core attached for power to be sustainable, and long wires consume large amounts of electrical power. These challenges are prohibitive to supporting manycore systems on chip. In
9600-676: The designer. Traditionally, engineers have employed simulation acceleration, emulation or prototyping on reprogrammable hardware to verify and debug hardware and software for SoC designs prior to the finalization of the design, known as tape-out . Field-programmable gate arrays (FPGAs) are favored for prototyping SoCs because FPGA prototypes are reprogrammable, allow debugging and are more flexible than application-specific integrated circuits (ASICs). With high capacity and fast compilation time, simulation acceleration and emulation are powerful technologies that provide wide visibility into systems. Both technologies, however, operate slowly, on
9728-453: The desired operation. The action is then completed, typically in response to a clock pulse. Very often the results are written to an internal CPU register for quick access by subsequent instructions. In other cases results may be written to slower, but less expensive and higher capacity main memory . For example, if an instruction that performs addition is to be executed, registers containing operands (numbers to be summed) are activated, as are
9856-429: The drawbacks of globally synchronous CPUs. For example, a clock signal is subject to the delays of any other electrical signal. Higher clock rates in increasingly complex CPUs make it more difficult to keep the clock signal in phase (synchronized) throughout the entire unit. This has led many modern CPUs to require multiple identical clock signals to be provided to avoid delaying a single signal significantly enough to cause
9984-453: The early 1980s). In the 1960s, MOS ICs were slower and initially considered useful only in applications that required low power. Following the development of silicon-gate MOS technology by Federico Faggin at Fairchild Semiconductor in 1968, MOS ICs largely replaced bipolar TTL as the standard chip technology in the early 1970s. As the microelectronic technology advanced, an increasing number of transistors were placed on ICs, decreasing
10112-804: The embedded systems market. Tighter system integration offers better reliability and mean time between failure , and SoCs offer more advanced functionality and computing power than microcontrollers. Applications include AI acceleration , embedded machine vision , data collection , telemetry , vector processing and ambient intelligence . Often embedded SoCs target the internet of things , multimedia, networking, telecommunications and edge computing markets. Some examples of SoCs for embedded applications include: Mobile computing based SoCs always bundle processors, memories, on-chip caches , wireless networking capabilities and often digital camera hardware and firmware. With increasing memory sizes, high end SoCs will often have no memory and flash storage and instead,
10240-578: The era of specialized supercomputers like those made by Cray Inc and Fujitsu Ltd . During this period, a method of manufacturing many interconnected transistors in a compact space was developed. The integrated circuit (IC) allowed a large number of transistors to be manufactured on a single semiconductor -based die , or "chip". At first, only very basic non-specialized digital circuits such as NOR gates were miniaturized into ICs. CPUs based on these "building block" ICs are generally referred to as "small-scale integration" (SSI) devices. SSI ICs, such as
10368-484: The eventual problematic performance bottleneck between the processor and the motherboard . Under the Hub Architecture, a motherboard would have a two piece chipset consisting of a northbridge chip and a southbridge chip. Over time, the speed of CPUs kept increasing but the bandwidth of the front-side bus (FSB) (connection between the CPU and the motherboard) did not, resulting in a performance bottleneck. As
10496-503: The execution of an instruction, the entire process repeats, with the next instruction cycle normally fetching the next-in-sequence instruction because of the incremented value in the program counter . If a jump instruction was executed, the program counter will be modified to contain the address of the instruction that was jumped to and program execution continues normally. In more complex CPUs, multiple instructions can be fetched, decoded and executed simultaneously. This section describes what
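The instruction cycle described in this passage can be illustrated with a minimal sketch. The instruction names (`LOAD`, `ADD`, `JMP`, `HALT`), the single accumulator and the `run` function are hypothetical and chosen only for illustration; they do not correspond to any real instruction set.

```python
def run(program, max_steps=1000):
    """Run a toy program given as a list of (opcode, argument) pairs."""
    pc = 0    # program counter: index of the next instruction to fetch
    acc = 0   # a single accumulator register (an assumption of this sketch)
    steps = 0
    while pc < len(program) and steps < max_steps:
        opcode, arg = program[pc]  # fetch
        pc += 1                    # default: continue with the next-in-sequence instruction
        if opcode == "LOAD":       # decode and execute
            acc = arg
        elif opcode == "ADD":
            acc += arg
        elif opcode == "JMP":
            pc = arg               # a jump overwrites the program counter
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        steps += 1
    return acc


# The jump skips the instruction at index 2, so only 1 + 2 is accumulated.
program = [("LOAD", 1), ("JMP", 3), ("ADD", 100), ("ADD", 2), ("HALT", 0)]
result = run(program)
```

The `max_steps` guard is only there so that a program with a jump loop cannot run forever in this sketch.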
10624-565: The fall of 2013, the Ivy Bridge-E /EP processors (the latter branded as Xeon E5-2600 v2 series) also work with Patsburg, typically with a BIOS update. Patsburg has the following variations: Coleto Creek is the codename of the PCH most closely associated with Highland Forest platforms and Ivy Bridge-EP processors. Lynx Point is the codename of a PCH in Intel 8 Series chipsets , most closely associated with Haswell processors with LGA 1150 socket. The Lynx Point chipset connects to
10752-401: The faster the clock, the more instructions the CPU will execute each second. To ensure proper operation of the CPU, the clock period is longer than the maximum time needed for all signals to propagate (move) through the CPU. In setting the clock period to a value well above the worst-case propagation delay , it is possible to design the entire CPU and the way it moves data around the "edges" of
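As a rough worked example of the timing constraint in this passage, the sketch below derives an upper bound on clock frequency from a set of signal path delays. The function name `max_clock_hz`, the `margin` parameter and the delay values are assumptions made for illustration, not figures from any real design.

```python
def max_clock_hz(path_delays_s, margin=0.0):
    """Return the highest clock frequency whose period still covers the slowest path.

    path_delays_s: propagation delays in seconds for each signal path (hypothetical).
    margin: fractional safety margin added on top of the worst-case delay.
    """
    worst_case = max(path_delays_s)
    period = worst_case * (1.0 + margin)  # the clock period must exceed the worst-case delay
    return 1.0 / period


# With a slowest path of 2 ns and no margin, the bound is about 500 MHz.
bound = max_clock_hz([1e-9, 2e-9, 0.5e-9])
```

Adding a margin lowers the bound, which reflects the practice of setting the period well above the worst-case propagation delay.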
10880-457: The following variations: Cannon Point is the codename of a PCH in Intel 300 Series chipsets , most closely associated with Coffee Lake processors with LGA 1151 socket. The following variants are available: CPU A central processing unit ( CPU ), also called a central processor , main processor , or just processor , is the most important processor in a given computer . Its electronic circuitry executes instructions of
11008-678: The following variations: Langwell is the codename of a PCH in the Moorestown MID /smartphone platform for Atom Lincroft microprocessors. This has the following variations: Tiger Point is the codename of a PCH in the Pine Trail netbook platform chipset for Atom Pineview microprocessors. This has the following variations: Topcliff is the codename of a PCH in the Queens Bay embedded platform chipset for Atom Tunnel Creek microprocessors. It connects to
11136-442: The following variations: Panther Point is the codename of a PCH in Intel 7 Series chipsets for mobile and desktop. It is most closely associated with Ivy Bridge processors. These chipsets (except PCH HM75) have integrated USB 3.0 . This has the following variations: Cave Creek is the codename of the PCH most closely associated with Crystal Forest platforms and Gladden or Sandy Bridge-EP/EN processors. Patsburg
11264-416: The following variations: Sunrise Point is the codename of a PCH in Intel 100 Series chipsets , most closely associated with Skylake processors with LGA 1151 socket. The following variants are available: Union Point is the codename of a PCH in Intel 200 Series chipsets , most closely associated with Kaby Lake processors with LGA 1151 socket. The following variants are available: Lewisburg
11392-425: The future of SoC design because they have been shown to efficiently meet power and throughput needs of SoC designs. Current NoC architectures are two-dimensional. 2D IC design has limited floorplanning choices as the number of cores in SoCs increases, so as three-dimensional integrated circuits (3DICs) emerge, SoC designers are looking towards building three-dimensional on-chip networks known as 3DNoCs. A system on
11520-512: The goals of optimizing some of these quantities are directly at odds, further adding complexity to design optimization of SoCs and introducing trade-offs in system design. For broader coverage of trade-offs and requirements analysis , see requirements engineering . SoCs are optimized to minimize the electrical power used to perform the SoC's functions. Most SoCs must use low power. SoC systems often require long battery life (such as smartphones ), can potentially spend months or years without
11648-431: The hardware elements and execution units , collectively "blocks", described above, together with software device drivers that may control their operation. Of particular importance are the protocol stacks that drive industry-standard interfaces like USB . The hardware blocks are put together using computer-aided design tools, specifically electronic design automation tools; the software modules are integrated using
11776-559: The individual transistors used by the PDP-8 and PDP-10 to SSI ICs, and their extremely popular PDP-11 line was originally built with SSI ICs, but was eventually implemented with LSI components once these became practical. Lee Boysel published influential articles, including a 1967 "manifesto", which described how to build the equivalent of a 32-bit mainframe computer from a relatively small number of large-scale integration circuits (LSI). The only way to build LSI chips, which are chips with
11904-446: The larger die being the CPU die, the smaller die being the PCH die. Rather than DMI , these SOPs directly expose PCIe lanes, as well as SATA, USB, and HDA lines from integrated controllers, and SPI/ I²C /UART/GPIO lines for sensors. Like PCH-compatible CPUs, they continue to expose DisplayPort, RAM, and SMBus lines. However, a fully integrated voltage regulator will be absent until Cannon Lake. AMD's FCH has been discontinued since
12032-482: The late 2010s, a trend of SoCs implementing communications subsystems in terms of a network-like topology instead of bus-based protocols has emerged. A trend towards more processor cores on SoCs has caused on-chip communication efficiency to become one of the key factors in determining the overall system performance and cost. This has led to the emergence of interconnection networks with router -based packet switching known as " networks on chip " (NoCs) to overcome
12160-439: The limits of integrated circuit transistor technology. Extreme miniaturization of electronic gates is causing the effects of phenomena like electromigration and subthreshold leakage to become much more significant. These newer concerns are among the many factors causing researchers to investigate new methods of computing such as the quantum computer , as well as to expand the use of parallelism and other methods that extend
12288-408: The location of a value that may be a processor register or a memory address, as determined by some addressing mode . In some CPU designs, the instruction decoder is implemented as a hardwired, unchangeable binary decoder circuit. In others, a microprogram is used to translate instructions into sets of CPU configuration signals that are applied sequentially over multiple clock pulses. In some cases
12416-406: The machine language opcode . While processing an instruction, the CPU decodes the opcode (via a binary decoder ) into control signals, which orchestrate the behavior of the CPU. A complete machine language instruction consists of an opcode and, in many cases, additional bits that specify arguments for the operation (for example, the numbers to be summed in the case of an addition operation). Going up
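The split of an instruction into an opcode and argument bits can be sketched with plain bit masking. The 16-bit layout below (4-bit opcode, two 6-bit operand fields) and the opcode table are hypothetical and do not describe any real processor.

```python
# Hypothetical layout: bits 15..12 opcode, bits 11..6 operand A, bits 5..0 operand B.
OPCODES = {0x1: "ADD", 0x2: "SUB", 0x3: "JMP"}


def encode(opcode, a, b):
    """Pack an opcode and two operand fields into one 16-bit instruction word."""
    return ((opcode & 0xF) << 12) | ((a & 0x3F) << 6) | (b & 0x3F)


def decode(word):
    """Split a 16-bit instruction word into (mnemonic, operand A, operand B)."""
    opcode = (word >> 12) & 0xF
    a = (word >> 6) & 0x3F
    b = word & 0x3F
    return OPCODES.get(opcode, "UNKNOWN"), a, b


word = encode(0x1, 5, 9)
decoded = decode(word)
```

In hardware the same selection is done by a binary decoder driving control signals; the dictionary lookup here only stands in for that step.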
12544-598: The memory and flash memory will be placed right next to, or above ( package on package ), the SoC. Some examples of mobile computing SoCs include: In 1992, Acorn Computers produced the A3010, A3020 and A4000 range of personal computers with the ARM250 SoC. It combined the original Acorn ARM2 processor with a memory controller (MEMC), video controller (VIDC), and I/O controller (IOC). In previous Acorn ARM -powered computers, these were four discrete chips. The ARM7500 chip
12672-421: The memory that stores the microprogram is rewritable, making it possible to change the way in which the CPU decodes instructions. After the fetch and decode steps, the execute step is performed. Depending on the CPU architecture, this may consist of a single action or a sequence of actions. During each action, control signals electrically enable or disable various parts of the CPU so they can perform all or part of
12800-425: The near future. Historically, a shared global computer bus typically connected the different components, also called "blocks" of the SoC. A very common bus for SoC communications is ARM's royalty-free Advanced Microcontroller Bus Architecture ( AMBA ) standard. Direct memory access controllers route data directly between external interfaces and SoC memory, bypassing the CPU or control unit , thereby increasing
12928-710: The number of individual ICs needed for a complete CPU. MSI and LSI ICs increased transistor counts to hundreds, and then thousands. By 1968, the number of ICs required to build a complete CPU had been reduced to 24 ICs of eight different types, with each IC containing roughly 1000 MOSFETs. In stark contrast with its SSI and MSI predecessors, the first LSI implementation of the PDP-11 contained a CPU composed of only four LSI integrated circuits. Since microprocessors were first introduced they have almost completely overtaken all other central processing unit implementation methods. The first commercially available microprocessor, made in 1971,
13056-583: The ones used in the Apollo Guidance Computer , usually contained up to a few dozen transistors. To build an entire CPU out of SSI ICs required thousands of individual chips, but still consumed much less space and power than earlier discrete transistor designs. IBM's System/370 , follow-on to the System/360, used SSI ICs rather than Solid Logic Technology discrete-transistor modules. DEC's PDP-8 /I and KI10 PDP-10 also switched from
13184-430: The order of MHz, which may be significantly slower – up to 100 times slower – than the SoC's operating frequency. Acceleration and emulation boxes are also very large and expensive at over US$ 1 million. FPGA prototypes, in contrast, use FPGAs directly to enable engineers to validate and test at, or close to, a system's full operating frequency with real-world stimuli. Tools such as Certus are used to insert probes in
13312-409: The parts of the arithmetic logic unit (ALU) that perform addition. When the clock pulse occurs, the operands flow from the source registers into the ALU, and the sum appears at its output. On subsequent clock pulses, other components are enabled (and disabled) to move the output (the sum of the operation) to storage (e.g., a register or memory). If the resulting sum is too large (i.e., it is larger than
13440-544: The physical wiring of the computer. This overcame a severe limitation of ENIAC, which was the considerable time and effort required to reconfigure the computer to perform a new task. With von Neumann's design, the program that EDVAC ran could be changed simply by changing the contents of the memory. EDVAC was not the first stored-program computer; the Manchester Baby , which was a small-scale experimental stored-program computer, ran its first program on 21 June 1948 and
13568-501: The popularization of the integrated circuit (IC). The IC has allowed increasingly complex CPUs to be designed and manufactured to tolerances on the order of nanometers . Both the miniaturization and standardization of CPUs have increased the presence of digital devices in modern life far beyond the limited application of dedicated computing machines. Modern microprocessors appear in electronic devices ranging from automobiles to cellphones, and sometimes even in toys. While von Neumann
13696-473: The possible exception of the last level. Each extra level of cache tends to be bigger and is optimized differently. Other types of caches exist (that are not counted towards the "cache size" of the most important caches mentioned above), such as the translation lookaside buffer (TLB) that is part of the memory management unit (MMU) that most CPUs have. Caches are generally sized in powers of two: 2, 8, 16 etc. KiB or MiB (for larger non-L1) sizes, although
13824-626: The previous architecture: some northbridge functions, the memory controller and PCIe lanes, were integrated into the CPU while the PCH took over the remaining functions in addition to the traditional roles of the southbridge. AMD has its equivalent for the PCH, known simply as a chipset since the release of the Zen architecture in 2017. AMD no longer uses its equivalent for the PCH, the Fusion controller hub (FCH). The PCH architecture supersedes Intel's previous Hub Architecture , with its design addressing
13952-476: The processor primarily over the Direct Media Interface (DMI). The following variants are available: In addition the following newer variants are available, additionally known as Wildcat Point , which also support Haswell Refresh processors: A design flaw causes devices connected to the Lynx Point's integrated USB 3.0 controller to be disconnected when the system wakes up from
14080-420: The processor via PCIe (vs. DMI as other PCHs do). This has the following variations: Cougar Point is the codename of a PCH in Intel 6 Series chipsets for mobile, desktop, and workstation / server platforms. It is most closely associated with Sandy Bridge processors. This has the following variations: In the first month after Cougar Point's release, January 2011, Intel posted a press release stating
14208-451: The processor. It tells the computer's memory, arithmetic and logic unit and input and output devices how to respond to the instructions that have been sent to the processor. It directs the operation of the other units by providing timing and control signals. Most computer resources are managed by the CU. It directs the flow of data between the CPU and the other devices. John von Neumann included
14336-561: The release of the Carrizo series of CPUs as it has been integrated into the same die as the rest of the CPU. However, since the release of the Zen architecture, there's still a component called a chipset which only handles relatively low speed I/O such as USB and SATA ports and connects to the CPU with a PCIe connection. In these systems all PCIe connections are routed directly to the CPU. The UMI interface previously used by AMD for communicating with
14464-478: The reliability problems. Most of these early synchronous CPUs ran at low clock rates compared to modern microelectronic designs. Clock signal frequencies ranging from 100 kHz to 4 MHz were very common at this time, limited largely by the speed of the switching devices they were built with. The design complexity of CPUs increased as various technologies facilitated the building of smaller and more reliable electronic devices. The first such improvement came with
14592-409: The result to memory. Besides the instructions for integer mathematics and logic operations, various other machine instructions exist, such as those for loading data from memory and storing it back, branching operations, and mathematical operations on floating-point numbers performed by the CPU's floating-point unit (FPU). The control unit (CU) is a component of the CPU that directs the operation of
14720-484: The rising and falling clock signal. This has the advantage of simplifying the CPU significantly, both from a design perspective and a component-count perspective. However, it also carries the disadvantage that the entire CPU must wait on its slowest elements, even though some portions of it are much faster. This limitation has largely been compensated for by various methods of increasing CPU parallelism (see below). However, architectural improvements alone do not solve all of
14848-633: The risk of catastrophic failure . Due to increased transistor densities as length scales get smaller, each process generation produces more heat output than the last. Compounding this problem, SoC architectures are usually heterogeneous, creating spatially inhomogeneous heat fluxes , which cannot be effectively mitigated by uniform passive cooling . SoCs are optimized to maximize computational and communications throughput . SoCs are optimized to minimize latency for some or all of their functions. This can be accomplished by laying out elements with proper proximity and locality to each other to minimize
14976-540: The short switching time of a transistor in comparison to a tube or relay. Thanks to the increased reliability and dramatically increased speed of the switching elements, which were almost exclusively transistors by this time, CPU clock rates in the tens of megahertz were easily obtained during this period. Additionally, while discrete transistor and IC CPUs were in heavy usage, new high-performance designs like single instruction, multiple data (SIMD) vector processors began to appear. These early experimental designs later gave rise to
15104-458: The slower but cheaper dynamic RAM (DRAM). When an SoC has a cache hierarchy, SRAM will usually be used to implement processor registers and cores' built-in caches whereas DRAM will be used for main memory . "Main memory" may be specific to a single processor (which can be multi-core ) when the SoC has multiple processors ; in this case it is distributed memory, and its contents must be sent via on-chip intermodule communication to be accessed by
15232-420: The system. Because of high transistor counts on modern devices, oftentimes a layout of sufficient throughput and high transistor density is physically realizable from fabrication processes but would result in unacceptably high amounts of heat in the circuit's volume. These thermal effects force SoC and other chip designers to apply conservative design margins , creating less performant devices to mitigate
15360-439: The term "CPU" is generally defined as a device for software (computer program) execution, the earliest devices that could rightly be called CPUs came with the advent of the stored-program computer . The idea of a stored-program computer had been already present in the design of John Presper Eckert and John William Mauchly 's ENIAC , but was initially omitted so that it could be finished sooner. On June 30, 1945, before ENIAC
15488-422: The use of a conditional jump), and existence of functions . In some processors, some other instructions change the state of bits in a "flags" register . These flags can be used to influence how a program behaves, since they often indicate the outcome of various operations. For example, in such processors a "compare" instruction evaluates two values and sets or clears bits in the flags register to indicate which one
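How a compare instruction and a later conditional jump cooperate through the flags register can be sketched as follows. The flag names (`zero`, `negative`) and both function names are assumptions for illustration; real processors define their own flag sets and semantics.

```python
def compare(a, b):
    """Model a 'compare' instruction: it records the outcome in flags and produces no other result."""
    return {
        "zero": a == b,      # set when the two values are equal
        "negative": a < b,   # set when the first value is smaller
    }


def branch_if_equal(flags, target, fallthrough):
    """Model a conditional jump that reads the flags left behind by an earlier instruction."""
    return target if flags["zero"] else fallthrough


flags = compare(2, 5)
next_pc = branch_if_equal(flags, target=10, fallthrough=4)
```

The conditional jump never sees the original operands; it only inspects the flags, which is why the order of the two instructions matters.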
15616-431: The usefulness of the classical von Neumann model. The fundamental operation of most CPUs, regardless of the physical form they take, is to execute a sequence of stored instructions that is called a program. The instructions to be executed are kept in some kind of computer memory . Nearly all CPUs follow the fetch, decode and execute steps in their operation, which are collectively known as the instruction cycle . After
15744-616: The von Neumann and Harvard architectures is that the latter separates the storage and treatment of CPU instructions and data, while the former uses the same memory space for both. Most modern CPUs are primarily von Neumann in design, but CPUs with the Harvard architecture are seen as well, especially in embedded applications; for instance, the Atmel AVR microcontrollers are Harvard-architecture processors. Relays and vacuum tubes (thermionic tubes) were commonly used as switching elements;
15872-538: Was made, mathematician John von Neumann distributed a paper entitled First Draft of a Report on the EDVAC . It was the outline of a stored-program computer that would eventually be completed in August 1949. EDVAC was designed to perform a certain number of instructions (or operations) of various types. Significantly, the programs written for EDVAC were to be stored in high-speed computer memory rather than specified by
16000-647: Was so popular that it dominated the mainframe computer market for decades and left a legacy that is continued by similar modern computers like the IBM zSeries . In 1965, Digital Equipment Corporation (DEC) introduced another influential computer aimed at the scientific and research markets—the PDP-8 . Transistor-based computers had several distinct advantages over their predecessors. Aside from facilitating increased reliability and lower power consumption, transistors also allowed CPUs to operate at much higher speeds because of
16128-399: Was the Intel 4004 , and the first widely used microprocessor, made in 1974, was the Intel 8080 . Mainframe and minicomputer manufacturers of the time launched proprietary IC development programs to upgrade their older computer architectures , and eventually produced instruction set compatible microprocessors that were backward-compatible with their older hardware and software. Combined with
16256-713: Was their second-generation SoC, based on the ARM700, VIDC20 and IOMD controllers, and was widely licensed in embedded devices such as set-top-boxes, as well as later Acorn personal computers. Tablet and laptop manufacturers have learned lessons from embedded systems and smartphone markets about reduced power consumption, better performance and reliability from tighter integration of hardware and firmware modules , and LTE and other wireless network communications integrated on chip (integrated network interface controllers ). An SoC consists of hardware functional units , including microprocessors that run software code , as well as
16384-429: Was used in a series of computers capable of running the same programs with different speeds and performances. This was significant at a time when most electronic computers were incompatible with one another, even those made by the same manufacturer. To facilitate this improvement, IBM used the concept of a microprogram (often called "microcode"), which still sees widespread use in modern CPUs. The System/360 architecture
#538461