
MESI protocol

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license. Give it a read and then ask your questions in the chat. We can research this topic together.

The MESI protocol is an invalidate-based cache coherence protocol, and is one of the most common protocols that support write-back caches. It is also known as the Illinois protocol due to its development at the University of Illinois at Urbana-Champaign. Write-back caches can save considerable bandwidth that is generally wasted on a write-through cache. There is always a dirty state present in write-back caches that indicates that the data in the cache is different from that in main memory. The Illinois protocol requires a cache-to-cache transfer on a miss if the block resides in another cache. This protocol reduces the number of main memory transactions with respect to the MSI protocol. This marks a significant improvement in performance.
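To make the write-back behaviour concrete, here is a minimal sketch (not taken from the article; the names are made up for illustration) of a single cached line carrying the dirty flag described above:

```python
class CacheLine:
    def __init__(self, data):
        self.data = data
        self.dirty = False          # the "dirty" state: cache differs from main memory

def store_write_through(line, addr, value, memory):
    line.data = value
    memory[addr] = value            # every store also consumes memory bandwidth

def store_write_back(line, value):
    line.data = value
    line.dirty = True               # main memory is updated only later, on eviction

def evict_write_back(line, addr, memory):
    if line.dirty:                  # one write-back regardless of how many stores occurred
        memory[addr] = line.data
        line.dirty = False
```

Repeated stores to the same line cost one memory write under write-back but one per store under write-through, which is the bandwidth saving referred to above.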


The letters in the acronym MESI represent four exclusive states that a cache line can be marked with (encoded using two additional bits): Modified (M), Exclusive (E), Shared (S) and Invalid (I). For any given pair of caches, the permitted states of a given cache line are as follows: when the block is marked M (Modified) or E (Exclusive), the copies of the block in other caches are marked I (Invalid); see the sketch below. The MESI protocol is defined by a finite-state machine that transitions from one state to another based on two stimuli. The first stimulus
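A small sketch of the four states and the pairwise rule just stated (illustrative helper code, not part of the protocol specification): a Modified or Exclusive copy tolerates only Invalid copies elsewhere, while Shared copies may coexist.

```python
from enum import Enum

class MESI(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

def pair_allowed(a: MESI, b: MESI) -> bool:
    """May two caches hold the same line in states a and b at the same time?"""
    if a in (MESI.M, MESI.E) or b in (MESI.M, MESI.E):
        # M and E imply exclusive ownership: every other copy must be Invalid.
        return a is MESI.I or b is MESI.I
    return True  # S/S, S/I and I/I are all permitted
```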

106-454: A binit as an arbitrary information unit equivalent to some fixed but unspecified number of bits. MSI protocol In computing , the MSI protocol - a basic cache-coherence protocol - operates in multiprocessor systems. As with other cache coherency protocols, the letters of the protocol name identify the possible states in which a cache line can be. In MSI, each block contained inside

A byte or word, is referred to, it is usually specified by a number from 0 upwards corresponding to its position within the byte or word. However, 0 can refer to either the most or least significant bit depending on the context. Similarly to torque and energy in physics, information-theoretic information and data storage size have the same dimensionality of units of measurement, but there

212-487: A CPU can be oblivious to the fact that a cache line in its cache is actually invalid, as the invalidation queue contains invalidations that have been received but haven't yet been applied. Note that, unlike the store buffer, the CPU can't scan the invalidation queue, as that CPU and the invalidation queue are physically located on opposite sides of the cache. As a result, memory barriers are required. A store barrier will flush

A cache can have one of three possible states: Modified, Shared or Invalid. These coherency states are maintained through communication between the caches and the backing store. The caches have different responsibilities when blocks are read or written, or when they learn of other caches issuing reads or writes for a block. When a read request arrives at a cache for a block in the "M" or "S" states, the cache supplies the data. If

A clean state. But this is not a requirement and is just additional overhead caused by using MESI. This challenge was overcome by the MOESI protocol. In the case of S (Shared state), multiple snoopers may respond with FlushOpt with the same data (see the example above). The F state in MESIF addresses this redundancy. Bit: The bit is the most basic unit of information in computing and digital communication. The name

371-613: A conducting path at a certain point of a circuit. In optical discs , a bit is encoded as the presence or absence of a microscopic pit on a reflective surface. In one-dimensional bar codes , bits are encoded as the thickness of alternating black and white lines. The bit is not defined in the International System of Units (SI). However, the International Electrotechnical Commission issued standard IEC 60027 , which specifies that

424-431: A number of bytes which is a low power of two. A string of four bits is usually a nibble . In information theory , one bit is the information entropy of a random binary variable that is 0 or 1 with equal probability, or the information that is gained when the value of such a variable becomes known. As a unit of information , the bit is also known as a shannon , named after Claude E. Shannon . The symbol for

477-403: A particular cache block. If another cache has the block in the "M" state, it must write back the data to the backing store and go to the "S" or "I" states. Once any "M" line is written back, the cache obtains the block from either the backing store, or another cache with the data in the "S" state. The cache can then supply the data to the requester. After supplying the data, the cache block is in

530-468: Is a portmanteau of binary digit . The bit represents a logical state with one of two possible values . These values are most commonly represented as either " 1 " or " 0 " , but other representations such as true / false , yes / no , on / off , or + / − are also widely used. The relation between these values and the physical states of the underlying storage or device is a matter of convention, and different assignments may be used even within

Is a read operation with intent to write to that memory address. Therefore, this operation is exclusive. It brings data to the cache and invalidates all other processor caches that hold this memory line. This is termed "BusRdX" in the tables above. MESI in its naive, straightforward implementation exhibits two particular performance issues. First, when writing to an invalid cache line, there is a long delay while


Is generally the case in bus-based systems.) Snooping Operation: In a snooping system, all caches on a bus monitor all the transactions on that bus. Every cache has a copy of the sharing status of every block of physical memory it has stored. The state of the block is changed according to the state diagram of the protocol used (refer to the MESI state diagram image above). The bus has snoopers on both sides. Explanation: Each cache block has its own four-state finite-state machine (refer to image 1.1). The state transitions and

689-486: Is in general no meaning to adding, subtracting or otherwise combining the units mathematically, although one may act as a bound on the other. Units of information used in information theory include the shannon (Sh), the natural unit of information (nat) and the hartley (Hart). One shannon is the maximum amount of information needed to specify the state of one bit of storage. These are related by 1 Sh ≈ 0.693 nat ≈ 0.301 Hart. Some authors also define
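The conversion factors quoted above follow directly from a change of logarithm base; a two-line check in plain Python (nothing here is from the article itself):

```python
import math

# 1 shannon = log_e(2) nat = log_10(2) hartley
print(math.log(2))    # 0.6931...  nats per shannon
print(math.log10(2))  # 0.3010...  hartleys per shannon
```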

742-425: Is in the "I" state, the cache must notify any other caches that might contain the block in the "S" or "M" states that they must evict the block. If the block is in another cache in the "M" state, that cache must either write the data to the backing store or supply it to the requesting cache. If at this point the cache does not yet have the block locally, the block is read from the backing store before being modified in

795-554: Is more compressed—the same bucket can hold more. For example, it is estimated that the combined technological capacity of the world to store information provides 1,300 exabytes of hardware digits. However, when this storage space is filled and the corresponding content is optimally compressed, this only represents 295 exabytes of information. When optimally compressed, the resulting carrying capacity approaches Shannon information or information entropy . Certain bitwise computer processor instructions (such as bit set ) operate at

848-462: Is requested on the bus. A Read For Ownership (RFO) is an operation in cache coherency protocols that combines a read and an invalidate broadcast. The operation is issued by a processor trying to write into a cache line that is in the shared (S) or invalid (I) states of the MESI protocol. The operation causes all other caches to set the state of such a line to I. A read for ownership transaction
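Below is a sketch of the requesting side of an RFO as just described, reusing the MESI enum from the earlier sketch; the bus helpers broadcast_read_invalidate and broadcast_invalidate are assumptions made for illustration, not a real API.

```python
def write(cache, bus, addr, value):
    line = cache.lookup(addr)
    if line is None or line.state is MESI.I:
        # Write miss: fetch the data and invalidate every other copy (BusRdX / RFO).
        line = bus.broadcast_read_invalidate(addr)   # all other caches move to I
        cache.insert(addr, line)
    elif line.state is MESI.S:
        # Write hit on a Shared line: only an invalidation broadcast is needed (BusUpgr).
        bus.broadcast_invalidate(addr)
    # In E or M the line is already exclusively owned: no bus traffic at all.
    line.data = value
    line.state = MESI.M
```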

901-420: Is the extra "exclusive" state present in the MESI protocol. This extra state was added as it has many advantages. When a processor needs to read a block that none of the other processors have and then write to it, two bus transactions will take place in the case of MSI. First, a BusRdX request is issued to read the block followed by a BusUpgr request before writing to the block. The BusRdX request in this scenario

Is the processor-specific Read and Write request. For example: a processor P1 has a block X in its cache, and there is a request from the processor to read from or write to that block. The second stimulus is given through the bus connecting the processors. In particular, the "bus side requests" come from other processors that don't have the cache block or the updated data in their cache. The bus requests are monitored with

1007-494: Is the unit byte , coined by Werner Buchholz in June 1956, which historically was used to represent the group of bits used to encode a single character of text (until UTF-8 multibyte encoding took over) in a computer and for this reason it was used as the basic addressable element in many computer architectures . The trend in hardware design converged on the most common implementation of using eight bits per byte, as it

1060-427: Is useless as none of the other caches have the same block, but there is no way for one cache to know about this. Thus, MESI protocol overcomes this limitation by adding an Exclusive state, which results in saving a bus request. This makes a huge difference when a sequential application is running. As only one processor works on a piece of data, all the accesses will be exclusive. MSI performs much worse in this case due to

1113-452: Is widely used today. However, because of the ambiguity of relying on the underlying hardware design, the unit octet was defined to explicitly denote a sequence of eight bits. Computers usually manipulate bits in groups of a fixed size, conventionally named " words ". Like the byte, the number of bits in a word also varies with the hardware design, and is typically between 8 and 80 bits, or even more in some specialized computers. In


1166-410: The yottabit (Ybit). When the information capacity of a storage system or a communication channel is presented in bits or bits per second , this often refers to binary digits, which is a computer hardware capacity to store binary data ( 0 or 1 , up or down, current or not, etc.). Information capacity of a storage system is only an upper bound to the quantity of information stored therein. If

1219-408: The "S" state. When a write request arrives at a cache for a block in the "M" state, the cache modifies the data locally. If the block is in the "S" state, the cache must notify any other caches that might contain the block in the "S" state that they must evict the block. This notification may be via bus snooping or a directory, as described above. Then the data may be locally modified. If the block

1272-407: The 1950s and 1960s, these methods were largely supplanted by magnetic storage devices such as magnetic-core memory , magnetic tapes , drums , and disks , where a bit was represented by the polarity of magnetization of a certain area of a ferromagnetic film, or by a change in polarity from one direction to the other. The same principle was later used in the magnetic bubble memory developed in

1325-418: The 1980s, and is still found in various magnetic strip items such as metro tickets and some credit cards . In modern semiconductor memory , such as dynamic random-access memory , the two values of a bit may be represented by two levels of electric charge stored in a capacitor . In certain types of programmable logic arrays and read-only memory , a bit may be represented by the presence or absence of

1378-475: The Exclusive state is an opportunistic optimization: If the CPU wants to modify a cache line in state S, a bus transaction is necessary to invalidate all other cached copies. State E enables modifying a cache line with no bus transaction. Illustration of MESI protocol operations For example, let us assume that the following stream of read/write references. All the references are to the same location and

1431-511: The Modified state must snoop (intercept) all attempted reads (from all the other caches in the system) of the corresponding main memory location and insert the data that it holds. This can be done by forcing the read to back off (i.e. retry later), then writing the data to main memory and changing the cache line to the Shared state. It can also be done by sending data from Modified cache to

1484-424: The average. This principle is the basis of data compression technology. Using an analogy, the hardware binary digits refer to the amount of storage space available (like the number of buckets available to store things), and the information content the filling, which comes in different levels of granularity (fine or coarse, that is, compressed or uncompressed information). When the granularity is finer—when information

1537-679: The binary digit is either "bit", per the IEC 80000-13 :2008 standard, or the lowercase character "b", per the IEEE 1541-2002 standard. Use of the latter may create confusion with the capital "B" which is the international standard symbol for the byte. The encoding of data by discrete bits was used in the punched cards invented by Basile Bouchon and Jean-Baptiste Falcon (1732), developed by Joseph Marie Jacquard (1804), and later adopted by Semyon Korsakov , Charles Babbage , Herman Hollerith , and early computer manufacturers like IBM . A variant of that idea

1590-420: The block is not in the cache (in the "I" state), it must verify that the block is not in the "M" state in any other cache. Different caching architectures handle this differently. For example, bus architectures often perform snooping , where the read request is broadcast to all of the caches. Other architectures include cache directories which have agents (directories) that know which caches last had copies of

1643-434: The cache performing the read. Note, snooping only required for read misses (protocol ensures that Modified cannot exist if any other cache can perform a read hit). A cache that holds a line in the Shared state must listen for invalidate or request-for-ownership broadcasts from other caches, and discard the line (by moving it into Invalid state) on a match. The Modified and Exclusive states are always precise: i.e. they match


1696-517: The cache. After the data is modified, the cache block is in the "M" state. For any given pair of caches, the permitted states of a given cache line are as follows: Processor requests to the cache include: In addition, there are bus side requests. These include: State Transitions: This protocol is similar to the one used in the SGI 4D machine. Modern systems use variants of the MSI protocol to reduce

1749-416: The digit refers to the processor issuing the reference. The stream is : R1, W1, R3, W3, R1, R3, R2. Initially it is assumed that all the caches are empty. Bus Request Note: The term snooping referred to below is a protocol for maintaining cache coherency in symmetric multiprocessing environments. All the caches on the bus monitor (snoop) the bus if they have a copy of the block of data that

1802-415: The early 21st century, retail personal or server computers have a word size of 32 or 64 bits. The International System of Units defines a series of decimal prefixes for multiples of standardized units which are commonly also used with the bit and the byte. The prefixes kilo (10 ) through yotta (10 ) increment by multiples of one thousand, and the corresponding units are the kilobit (kbit) through

1855-457: The electrical state of a flip-flop circuit. For devices using positive logic , a digit value of 1 (or a logical value of true) is represented by a more positive voltage relative to the representation of 0 . Different logic families require different voltages, and variations are allowed to account for component aging and noise immunity. For example, in transistor–transistor logic (TTL) and compatible circuits, digit values 0 and 1 at

1908-452: The extra bus messages. Even in the case of a highly parallel application with minimal sharing of data, MESI is far faster. Adding the Exclusive state also comes at no cost as 3 states and 4 states are both representable with 2 bits. In case continuous read and write operations are performed by various caches on a particular block, the data has to be flushed to the bus every time. Thus, the main memory will pull this on every flush and remain in

1961-421: The help of Snoopers , which monitor all the bus transactions. Following are the different types of Processor requests and Bus side requests: Processor Requests to Cache include the following operations: Bus side requests are the following: ( Such Cache to Cache transfers can reduce the read miss latency if the latency to bring the block from the main memory is more than from Cache to Cache transfers, which

2014-409: The level of manipulating bits rather than manipulating data interpreted as an aggregate of bits. In the 1980s, when bitmapped computer displays became popular, some computers provided specialized bit block transfer instructions to set or copy the bits that corresponded to a given rectangular area on the screen. In most computers and programming languages, when a bit within a group of bits, such as

2067-448: The line is fetched from other CPUs. Second, moving cache lines to the invalid state is time-consuming. To mitigate these delays, CPUs implement store buffers and invalidate queues. A store buffer is used when writing to an invalid cache line. As the write will proceed anyway, the CPU issues a read-invalid message (hence the cache line in question and all other CPUs' cache lines that store that memory address are invalidated) and then pushes

2120-473: The output of a device are represented by no higher than 0.4 V and no lower than 2.6 V, respectively; while TTL inputs are specified to recognize 0.8 V or below as 0 and 2.2 V or above as 1 . Bits are transmitted one at a time in serial transmission , and by a multiple number of bits in parallel transmission . A bitwise operation optionally processes bits one at a time. Data transfer rates are usually measured in decimal SI multiples of

2173-453: The responses at a particular state with respect to different inputs are shown in Table1.1 and Table 1.2 A write may only be performed freely if the cache line is in the Modified or Exclusive state. If it is in the Shared state, all other cached copies must be invalidated first. This is typically done by a broadcast operation known as Request For Ownership (RFO) . A cache that holds a line in


2226-415: The same device or program . It may be physically implemented with a two-state device. A contiguous group of binary digits is commonly called a bit string , a bit vector, or a single-dimensional (or multi-dimensional) bit array . A group of eight bits is called one  byte , but historically the size of the byte is not strictly defined. Frequently, half, full, double and quadruple words consist of

2279-415: The states of electrical relays which could be either "open" or "closed". When relays were replaced by vacuum tubes , starting in the 1940s, computer builders experimented with a variety of storage methods, such as pressure pulses traveling down a mercury delay line , charges stored on the inside surface of a cathode-ray tube , or opaque spots printed on glass discs by photolithographic techniques. In

2332-414: The store buffer, ensuring all writes have been applied to that CPU's cache. A read barrier will flush the invalidation queue, thus ensuring that all writes by other CPUs become visible to the flushing CPU. Furthermore, memory management units do not scan the store buffer, causing similar problems. This effect is visible even in single threaded processors. The most striking difference between MESI and MSI

2385-554: The symbol for binary digit should be 'bit', and this should be used in all multiples, such as 'kbit', for kilobit. However, the lower-case letter 'b' is widely used as well and was recommended by the IEEE 1541 Standard (2002) . In contrast, the upper case letter 'B' is the standard and customary symbol for byte. Multiple bits may be expressed and represented in several ways. For convenience of representing commonly reoccurring groups of bits in information technology, several units of information have traditionally been used. The most common

2438-429: The true cache line ownership situation in the system. The Shared state may be imprecise: if another cache discards a Shared line, this cache may become the sole owner of that cache line, but it will not be promoted to Exclusive state. Other caches do not broadcast notices when they discard cache lines, and this cache could not use such notifications without maintaining a count of the number of shared copies. In that sense

2491-556: The two possible values of one bit of storage are not equally likely, that bit of storage contains less than one bit of information. If the value is completely predictable, then the reading of that value provides no information at all (zero entropic bits, because no resolution of uncertainty occurs and therefore no information is available). If a computer file that uses n  bits of storage contains only m  <  n  bits of information, then that information can in principle be encoded in about m  bits, at least on

2544-462: The two stable states of a flip-flop , two positions of an electrical switch , two distinct voltage or current levels allowed by a circuit , two distinct levels of light intensity , two directions of magnetization or polarization , the orientation of reversible double stranded DNA , etc. Bits can be implemented in several forms. In most modern computing devices, a bit is usually represented by an electrical voltage or current pulse, or by

2597-507: The unit bit per second (bit/s), such as kbit/s. In the earliest non-electronic information processing devices, such as Jacquard's loom or Babbage's Analytical Engine , a bit was often stored as the position of a mechanical lever or gear, or the presence or absence of a hole at a specific point of a paper card or tape . The first electrical devices for discrete logic (such as elevator and traffic light control circuits , telephone switches , and Konrad Zuse's computer) represented bits as

2650-477: The use of a logarithmic measure of information in 1928. Claude E. Shannon first used the word "bit" in his seminal 1948 paper " A Mathematical Theory of Communication ". He attributed its origin to John W. Tukey , who had written a Bell Labs memo on 9 January 1947 in which he contracted "binary information digit" to simply "bit". A bit can be stored by a digital device or other physical system that exists in either of two possible distinct states . These may be

2703-411: The write into the store buffer, to be executed when the cache line finally arrives in the cache. A direct consequence of the store buffer's existence is that when a CPU commits a write, that write is not immediately written in the cache. Therefore, whenever a CPU needs to read a cache line, it first scans its own store buffer for the existence of the same line, as there is a possibility that the same line


2756-469: Was the perforated paper tape . In all those systems, the medium (card or tape) conceptually carried an array of hole positions; each position could be either punched through or not, thus carrying one bit of information. The encoding of text by bits was also used in Morse code (1844) and early digital communications machines such as teletypes and stock ticker machines (1870). Ralph Hartley suggested

2809-668: Was written by the same CPU before but hasn't yet been written in the cache (the preceding write is still waiting in the store buffer). Note that while a CPU can read its own previous writes in its store buffer, other CPUs cannot see those writes until they are flushed to the cache - a CPU cannot scan the store buffer of other CPUs. With regard to invalidation messages, CPUs implement invalidate queues, whereby incoming invalidate requests are instantly acknowledged but not immediately acted upon. Instead, invalidation messages simply enter an invalidation queue and their processing occurs as soon as possible (but not necessarily instantly). Consequently,
