In compiler design, static single assignment form (often abbreviated as SSA form or simply SSA) is a type of intermediate representation (IR) where each variable is assigned exactly once. SSA is used in most high-quality optimizing compilers for imperative languages, including LLVM, the GNU Compiler Collection, and many commercial compilers.
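As a minimal illustration (a hypothetical straight-line fragment, not one of the examples discussed later in this article), the following Python sketch shows the same computation before and after SSA-style renaming, in which every definition gets its own version:

    def original(a, b):
        x = a + b       # first definition of x
        x = x * 2       # x is reassigned, destroying the earlier value
        return x

    def ssa_form(a, b):
        x1 = a + b      # each definition receives its own version
        x2 = x1 * 2     # the use of x is rewritten to the reaching version, x1
        return x2       # the final use refers unambiguously to x2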
There are efficient algorithms for converting programs into SSA form. To convert to SSA, existing variables in the original IR are split into versions, with new variables typically indicated by the original name with a subscript, so that every definition gets its own version. Additional statements that assign to new versions of variables may also need to be introduced at the join point of two control flow paths. Converting from SSA form to machine code
A dead-code elimination problem. Then, a Φ function is live only if any use in the input program will be rewritten to it, or if it will be used as an argument in another Φ function. When entering SSA form, each use is rewritten to the nearest definition that dominates it. A Φ function will then be considered live as long as it is the nearest definition that dominates at least one use, or at least one argument of
A use-definition chain (or UD chain) is a data structure that consists of a use U of a variable and all the definitions D of that variable that can reach that use without any other intervening definitions. A UD chain generally means the assignment of some value to a variable. A counterpart of a UD chain is a definition-use chain (or DU chain), which consists of a definition D of
A Chaitin-style graph-coloring register allocator are: The graph-coloring allocation has three major drawbacks. First, it relies on graph coloring to decide which variables are spilled, and finding a minimal graph coloring is an NP-complete problem. Second, unless live-range splitting is used, evicted variables are spilled everywhere: store instructions are inserted as early as possible, i.e., just after variable definitions, and load instructions are correspondingly inserted as late as possible, just before variable uses. Third,
A block, the block arguments are bound to specified values. MLton, Swift SIL, and LLVM MLIR use block arguments. SSA form is not normally used for direct execution (although it is possible to interpret SSA), and it is frequently used "on top of" another IR with which it remains in direct correspondence. This can be accomplished by "constructing" SSA as a set of functions that map between parts of
A concept called dominance frontiers (see below). Φ functions are not implemented as machine operations on most machines. A compiler can implement a Φ function by inserting "move" operations at the end of every predecessor block. In the example above, the compiler might insert a move from y1 to y3 at the end of the middle-left block and a move from y2 to y3 at the end of
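A minimal sketch of this move-insertion step, using hypothetical instruction tuples rather than any real compiler's API, might look as follows:

    # Lower a Phi by appending a move to the end of each predecessor block.
    def lower_phi(dest, sources, predecessor_blocks):
        # dest: the name defined by the Phi (e.g. "y3")
        # sources: one incoming name per predecessor (e.g. ["y1", "y2"])
        for block, src in zip(predecessor_blocks, sources):
            block.append(("move", dest, src))

    left = [("assign", "y1", 1)]
    right = [("assign", "y2", 2)]
    lower_phi("y3", ["y1", "y2"], [left, right])
    # left now ends with ("move", "y3", "y1"); right ends with ("move", "y3", "y2")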
A copy along each predecessor path that caused a source with a different root symbol than the destination to be put into the Φ. There are multiple algorithms for coming out of SSA with fewer copies; most use interference graphs or some approximation of them to do copy coalescing. Extensions to SSA form can be divided into two categories. Renaming scheme extensions alter the renaming criterion. Recall that SSA form renames each variable when it
A limited number of processor registers. Register allocation can happen over a basic block (local register allocation), over a whole function/procedure (global register allocation), or across function boundaries traversed via a call-graph (interprocedural register allocation). When done per function/procedure, the calling convention may require insertion of save/restore around each call-site. In many programming languages,
A live Φ. Semi-pruned SSA form is an attempt to reduce the number of Φ functions without incurring the relatively high cost of computing live-variable information. It is based on the following observation: if a variable is never live upon entry into a basic block, it never needs a Φ function. During SSA construction, Φ functions for any "block-local" variables are omitted. Computing the set of block-local variables
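A sketch of that computation (hypothetical block representation, roughly the "non-local names" set of the semi-pruned construction): collect every name that is used in some block before being defined in that same block, since only such names can be live on entry anywhere and thus need Φ functions.

    def non_block_local(blocks):
        # blocks: list of blocks; each block is a list of (defined_name, used_names)
        result = set()
        for block in blocks:
            defined_here = set()
            for dest, uses in block:
                result.update(u for u in uses if u not in defined_here)
                defined_here.add(dest)
        return result

    blocks = [[("t", {"a", "b"}), ("x", {"t"})],   # t is block-local
              [("y", {"x"})]]                      # x is used before any local definition
    print(non_block_local(blocks))                 # the set {'a', 'b', 'x'}; t stays block-local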
A node interferes with both nodes being coalesced, the degree of the node is reduced by one, which leads to improving the overall colorability of the interference graph. There are several coalescing heuristics available: Some other register allocation approaches are not limited to a single technique for optimizing register use. Cavazos et al., for instance, proposed a solution where it is possible to use both
A significant overhead, since the graph coloring algorithm used has a quadratic cost. Owing to this feature, linear scan is the approach currently used in several JIT compilers, like the Hotspot client compiler, V8, Jikes RVM, and the Android Runtime (ART). The Hotspot server compiler uses graph coloring because of the superior code it produces. This describes the algorithm as first proposed by Poletto et al., where: However,
A simplification called future-active sets that successfully removed intervals for 80% of instructions. In the context of register allocation, coalescing is the act of merging variable-to-variable move operations by allocating those two variables to the same location. The coalescing operation takes place after the interference graph is built. Once two nodes have been coalesced, they must get the same color and be allocated to
A strong order among statements. For a variable, such as v, its declaration is identified as V (italic capital letter), and, for short, its declaration is identified as s(0). In general, a declaration of a variable can be in an outer scope (e.g., a global variable). When a variable, v, is on the LHS of an assignment statement, such as s(j), then s(j)
A variable and all the uses U reachable from that definition without any other intervening definitions. Both UD and DU chains are created by using a form of static code analysis known as data flow analysis. Knowing the use-def and def-use chains for a program or subprogram is a prerequisite for many compiler optimizations, including constant propagation and common subexpression elimination. Making
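For straight-line code, a simple reaching-definitions scan is enough to build both kinds of chains. The sketch below uses a hypothetical statement format of (defined_name, used_names) and is only meant to convey the idea, not a full data-flow analysis over branching control flow:

    def build_chains(statements):
        reaching, ud, du = {}, {}, {}
        for i, (dest, uses) in enumerate(statements):
            for v in sorted(uses):
                if v in reaching:
                    d = reaching[v]
                    ud.setdefault(i, []).append(d)   # use at i is reached by definition at d
                    du.setdefault(d, []).append(i)   # definition at d reaches use at i
            reaching[dest] = i                       # statement i is now the reaching definition of dest
        return ud, du

    stmts = [("x", set()), ("y", {"x"}), ("x", {"x"}), ("z", {"x", "y"})]
    print(build_chains(stmts))   # uses at 1 and 2 chain to definition 0; the use of x at 3 chains to 2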
A variable is not live, the result of the Φ function cannot be used and the assignment by the Φ function is dead. Construction of pruned SSA form uses live-variable information in the Φ function insertion phase to decide whether a given Φ function is needed. If the original variable name isn't live at the Φ function insertion point, the Φ function isn't inserted. Another possibility is to treat pruning as
A variable that is not spilled is kept in the same register throughout its whole lifetime. On the other hand, a single register name may appear in multiple register classes, where a class is a set of register names that are interchangeable in a particular role. Then, multiple register names may be aliases for a single hardware register. Finally, graph coloring is an aggressive technique for allocating registers, but
Is a definition of v and it has a use at s(j) (or, in short, when a variable, v, is on the RHS of a statement s(j), then v has a use at statement s(j)). Consider the sequential execution of
Is a definition of v. Every variable (v) has at least one definition by its declaration (V) (or initialization). If a variable, v, is on the RHS of statement s(j), there is a statement s(i), with i < j and j − i minimal, such that it
Is a simpler and faster procedure than full live-variable analysis, making semi-pruned SSA form more efficient to compute than pruned SSA form. On the other hand, semi-pruned SSA form will contain more Φ functions. Block arguments are an alternative to Φ functions that is representationally identical but in practice can be more convenient during optimization. Blocks are named and take a list of block arguments, notated as function parameters. When calling
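One way to picture block arguments (a hypothetical sketch that models basic blocks as Python functions rather than showing any real IR syntax) is that each predecessor "calls" the join block with the value it wants bound, instead of a Φ function selecting it:

    def example(p):
        def join(y3):          # the join block takes a block argument y3
            return y3 * 10
        if p:
            y1 = 1
            return join(y1)    # the left predecessor passes y1
        else:
            y2 = 2
            return join(y2)    # the right predecessor passes y2

    print(example(True), example(False))   # 10 20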
Is a technical term that means that more spills and reloads are needed; it is defined by Braun et al. as "the number of simultaneously live variables at an instruction". In addition, some computer designs cache frequently-accessed registers. So, programs can be further optimized by assigning the same register to a source and destination of a move instruction whenever possible. This is especially important if
Is also efficient. SSA makes numerous analyses needed for optimizations easier to perform, such as determining use-define chains, because when looking at a use of a variable there is only one place where that variable may have received a value. Most optimizations can be adapted to preserve SSA form, so that one optimization can be performed after another with no additional analysis. SSA-based optimizations are usually more efficient and more powerful than their prior non-SSA equivalents. In functional language compilers, such as those for Scheme and ML, continuation-passing style (CPS)
Is assigned a value. Alternative schemes include static single use form (which renames each variable at each statement when it is used) and static single information form (which renames each variable when it is assigned a value, and at the post-dominance frontier). Feature-specific extensions retain the single assignment property for variables, but incorporate new semantics to model additional features. Some feature-specific extensions model high-level programming language features like arrays, objects and aliased pointers. Other feature-specific extensions model low-level architectural features like speculation and predication.

Use-define chain

Within computer science,
Is clear which definition each use is referring to, except for one case: both uses of y in the bottom block could be referring to either y1 or y2, depending on which path the control flow took. To resolve this, a special statement is inserted in the last block, called a Φ (Phi) function. This statement will generate a new definition of y called y3 by "choosing" either y1 or y2, depending on
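A minimal sketch of such a join (a hypothetical fragment, with a small helper standing in for the Φ's path-dependent selection rather than any real compiler construct):

    def phi(came_from_left, left_value, right_value):
        # Models a Phi function: picks the operand belonging to the edge actually taken.
        return left_value if came_from_left else right_value

    def ssa_example(p):
        y1 = y2 = None
        if p:
            y1 = 1                # definition of y on the left path
        else:
            y2 = 2                # definition of y on the right path
        y3 = phi(p, y1, y2)       # y3 = Phi(y1, y2) at the join point
        return y3 + y3            # both former uses of y now refer to y3

    print(ssa_example(True), ssa_example(False))   # 2 4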
Is computationally expensive due to its use of the interference graph, which can have a worst-case size that is quadratic in the number of live ranges. The traditional formulation of graph-coloring register allocation implicitly assumes a single bank of non-overlapping general-purpose registers and does not handle irregular architectural features like overlapping register pairs, special-purpose registers and multiple register banks. One later improvement of the Chaitin-style graph-coloring approach
Is generally used. SSA is formally equivalent to a well-behaved subset of CPS excluding non-local control flow, so optimizations and transformations formulated in terms of one generally apply to the other. Using CPS as the intermediate representation is more natural for higher-order functions and interprocedural analysis. CPS also easily encodes call/cc, whereas SSA does not. SSA was developed in
Is ignored in favor of the most used branch. Each trace is then independently processed by the allocator. This approach can be considered hybrid because it is possible to use different register allocation algorithms between the different traces. Split allocation is another register allocation technique that combines different approaches usually considered as opposites. For instance, the hybrid allocation technique can be considered as split because
Is in SSA form, both of these are immediate: Compiler optimization algorithms that are either enabled or strongly enhanced by the use of SSA include: Converting ordinary code into SSA form is primarily a matter of replacing the target of each assignment with a new variable, and replacing each use of a variable with the "version" of the variable reaching that point. For example, consider the following control-flow graph: Changing
Is relevant. Line 7, for example, cannot be reached again. For your understanding, you can imagine two different variables d: As a result, you could get something like this. The variable d1 would be replaced by b. With this algorithm, two things are accomplished:

Register allocation

In compiler optimization, register allocation is the process of assigning local automatic variables and expression results to
Is the set of nodes B where A does not strictly dominate B, but does dominate some immediate predecessor of B. These are the points at which multiple control paths merge back together into a single path. For example, in the following code: Node 1 strictly dominates 2, 3, and 4, and the immediate predecessors of node 4 are nodes 2 and 3. Dominance frontiers define the points at which Φ functions are needed. In
Is thought not to produce code as optimized as the "global" approach, which operates over the whole compilation unit (a method or procedure, for instance). Graph-coloring allocation is the predominant approach to solve register allocation. It was first proposed by Chaitin et al. In this approach, nodes in the graph represent live ranges (variables, temporaries, virtual/symbolic registers) that are candidates for register allocation. Edges connect live ranges that interfere, i.e., live ranges that are simultaneously live at one or more program points. Register allocation then reduces to
Is to determine the duration for which a variable should stay at the same location. A register allocator, disregarding the chosen allocation strategy, can rely on a set of core actions to address these challenges. These actions can be gathered in several different categories: Many register allocation approaches optimize for one or more specific categories of actions. Register allocation raises several problems that can be tackled (or avoided) by different register allocation approaches. Three of
The SSA form: the properties of this intermediate representation simplify the allocation algorithm and allow lifetime holes to be computed directly. First, the time spent in data-flow graph analysis, aimed at building the lifetime intervals, is reduced, namely because variables are unique. It consequently produces shorter live intervals, because each new assignment corresponds to a new live interval. To avoid modeling intervals and liveness holes, Rogers showed
The compiler is translating code to machine language, it must decide how to allocate variables to the limited number of registers in the CPU. Not all variables are in use (or "live") at the same time, so, over the lifetime of a program, a given register may be used to hold different variables. However, two variables in use at the same time cannot be assigned to the same register without corrupting one of
The graph coloring problem, in which colors (registers) are assigned to the nodes such that two nodes connected by an edge do not receive the same color. Using liveness analysis, an interference graph can be built. The interference graph, which is an undirected graph where the nodes are the program's variables, is used to model which variables cannot be allocated to the same register. The main phases in
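A sketch of that construction (hypothetical input: one set of simultaneously live names per program point) simply connects every pair of names that are live at the same point:

    from itertools import combinations

    def interference_graph(live_sets):
        edges = set()
        for live in live_sets:
            for a, b in combinations(sorted(live), 2):
                edges.add((a, b))          # a and b are live together, so they interfere
        return edges

    live_sets = [{"a", "b"}, {"b", "c"}, {"c"}]
    print(interference_graph(live_sets))   # {('a', 'b'), ('b', 'c')}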
The 1980s by several researchers at IBM. Kenneth Zadeck, a key member of the team, moved to Brown University as development continued. A 1986 paper introduced birthpoints, identity assignments, and variable renaming such that variables had a single static assignment. A subsequent 1987 paper by Jeanne Ferrante and Ronald Cytron proved that the renaming done in the previous paper removes all false dependencies for scalars. In 1988, Barry Rosen, Mark N. Wegman, and Kenneth Zadeck replaced
The Control Graph" by Ron Cytron, Jeanne Ferrante, et al. in 1991. Keith D. Cooper, Timothy J. Harvey, and Ken Kennedy of Rice University describe an algorithm in their paper titled A Simple, Fast Dominance Algorithm (a sketch is given below). In that algorithm, idom(b) is the immediate dominator of b, the unique node that strictly dominates b but does not strictly dominate any other node that strictly dominates b. "Minimal" SSA inserts
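The listing itself is not reproduced in this text. The following sketch (hypothetical graph representation; it assumes the immediate dominators idom have already been computed, for example by the fixpoint procedure of that paper) conveys the related dominance-frontier computation in the same spirit:

    def dominance_frontiers(preds, idom):
        # preds: predecessor lists; idom: immediate dominator of each node
        df = {n: set() for n in preds}
        for b in preds:
            if len(preds[b]) >= 2:             # only join points contribute to frontiers
                for p in preds[b]:
                    runner = p
                    while runner != idom[b]:   # walk up the dominator tree from each predecessor
                        df[runner].add(b)
                        runner = idom[runner]
        return df

    # Diamond CFG: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4
    preds = {1: [], 2: [1], 3: [1], 4: [2, 3]}
    idom = {1: 1, 2: 1, 3: 1, 4: 1}
    print(dominance_frontiers(preds, idom))    # node 4 appears in DF(2) and DF(3)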
The above example, when control is passed to node 4, the definition of result used depends on whether control was passed from node 2 or 3. Φ functions are not needed for variables defined in a dominator, as there is only one possible definition that can apply. There is an efficient algorithm for finding dominance frontiers of each node. This algorithm was originally described in "Efficiently Computing Static Single Assignment Form and
The allocator can then choose between one of the two available algorithms. Trace register allocation is a recent approach developed by Eisl et al. This technique handles the allocation locally: it relies on dynamic profiling data to determine which branches will be the most frequently used in a given control flow graph. It then infers a set of "traces" (i.e. code segments) in which the merge point
The compiler is using an intermediate representation such as static single-assignment form (SSA). In particular, when SSA is not fully optimized it can artificially generate additional move instructions. Register allocation consists therefore of choosing where to store the variables at runtime, i.e. inside or outside registers. If the variable is to be stored in registers, then the allocator needs to determine in which register(s) this variable will be stored. Finally, another challenge
The control flow in the past. Now, the last block can simply use y3, and the correct value will be obtained either way. A Φ function for x is not needed: only one version of x, namely x2, is reaching this place, so there is no problem (in other words, Φ(x2, x2) = x2). Given an arbitrary control-flow graph, it can be difficult to tell where to insert Φ functions, and for which variables. This general question has an efficient solution that can be computed using
The existing IR (basic blocks, instructions, operands, etc.) and its SSA counterpart. When the SSA form is no longer needed, these mapping functions may be discarded, leaving only the now-optimized IR. Performing optimizations on SSA form usually leads to entangled SSA-webs, meaning there are Φ instructions whose operands do not all have the same root operand. In such cases color-out algorithms are used to come out of SSA. Naive algorithms introduce
The fewest Φ functions that are needed by a specific procedure. For some types of analysis, these Φ functions are superfluous and can cause the analysis to run less efficiently. Pruned SSA form is based on a simple observation: Φ functions are only needed for variables that are "live" after the Φ function. (Here, "live" means that the value is used along some path that begins at the Φ function in question.) If
The first heuristic building stage is performed offline, and the heuristic use is performed online. In the same fashion, B. Diouf et al. proposed an allocation technique relying both on offline and online behaviors, namely static and dynamic compilation. During the offline stage, an optimal spill set is first gathered using Integer Linear Programming. Then, live ranges are annotated using the compressAnnotation algorithm, which relies on
The identity assignments with Φ-functions, introduced the name "static single-assignment form", and demonstrated a now-common SSA optimization. The name Φ-function was chosen by Rosen to be a more publishable version of "phony function". Alpern, Wegman, and Zadeck presented another optimization, but using the name "static single assignment". Finally, in 1989, Rosen, Wegman, Zadeck, Cytron, and Ferrante found an efficient means of converting programs to SSA form. The primary usefulness of SSA comes from how it simultaneously simplifies and improves
The intervals are traversed chronologically. Although this traversal could help identify variables whose live ranges interfere, no interference graph is built and the variables are allocated in a greedy way. The motivation for this approach is speed; not in terms of execution time of the generated code, but in terms of time spent in code generation. Typically, the standard graph coloring approaches produce quality code, but have
The linear scan and the graph coloring algorithms. In this approach, the choice between one or the other solution is determined dynamically: first, a machine learning algorithm is used "offline", that is to say not at runtime, to build a heuristic function that determines which allocation algorithm needs to be used. The heuristic function is then used at runtime; in light of the code behavior,
The linear scan presents two major drawbacks. First, due to its greedy aspect, it does not take lifetime holes into account, i.e. "ranges where the value of the variable is not needed". Besides, a spilled variable will stay spilled for its entire lifetime. Many other research works followed up on Poletto's linear scan algorithm. Traub et al., for instance, proposed an algorithm called second-chance binpacking aiming at generating code of better quality. In this approach, spilled variables get
The list of statements, s(i), and what can now be observed as the computation at statement j: This example is based on a Java algorithm for finding the gcd. (It is not important to understand what this function does.) To find out all def-use chains for variable d, do the following steps: Repeat these steps in the following style: combine each write access with each read access (but NOT
The middle-right block. These move operations might not end up in the final code based on the compiler's register allocation procedure. However, this approach may not work when simultaneous operations are speculatively producing inputs to a Φ function, as can happen on wide-issue machines. Typically, a wide-issue machine has a selection instruction used in such situations by the compiler to implement
The minimal number of Φ functions required to ensure that each name is assigned a value exactly once and that each reference (use) of a name in the original program can still refer to a unique name. (The latter requirement is needed to ensure that the compiler can write down a name for each operand in each operation.) However, some of these Φ functions could be dead. For this reason, minimal SSA does not necessarily produce
The most common problems are identified as follows: Register allocation can happen over a basic block of code: it is said to be "local", and was first mentioned by Horwitz et al. As basic blocks do not contain branches, the allocation process is thought to be fast, because the management of control-flow graph merge points in register allocation reveals itself to be a time-consuming operation. However, this approach
The name on the left-hand side of "x ← x - 3" and changing the following uses of x to that new name would leave the program unaltered. This can be exploited in SSA by creating two new variables: x1 and x2, each of which is assigned only once. Likewise, giving distinguishing subscripts to all the other variables yields: It
The opportunity to be stored later in a register by using a different heuristic from the one used in the standard linear scan algorithm. Instead of using live intervals, the algorithm relies on live ranges, meaning that if a range needs to be spilled, it is not necessary to spill all the other ranges corresponding to this variable. Linear scan allocation was also adapted to take advantage of
The other way round). The result should be: You have to take care if the variable is changed in the meantime. For example: From line 7 down to line 13 in the source code, d is not redefined or changed. At line 14, d could be redefined. This is why you have to recombine this write access on d with all possible read accesses which could be reached. In this case, only the code beyond line 10
The other. Register allocation typically has to deal with a trade-off between code quality, i.e. code that executes quickly, and analysis overhead, i.e. the time spent analyzing the source code to generate code with optimized register allocation. From this perspective, execution time of the generated code and time spent in liveness analysis are relevant metrics to compare the different techniques. Once relevant metrics have been chosen,
The point marked "2", the use-def chain for x indicates that its current value must have come from line C. Since the value of x in block 2 does not depend on any definitions in block 1 or earlier, x might as well be a different variable there; practically speaking, it is a different variable, call it x2. The process of splitting x into two separate variables is called live range splitting. See also static single assignment form. The list of statements determines
The previously identified optimal spill set. Register allocation is performed afterwards during the online stage, based on the data collected in the offline phase. In 2007, Bouchez et al. also suggested splitting the register allocation into different stages, having one stage dedicated to spilling, and one dedicated to coloring and coalescing. Several metrics have been used to assess the performance of one register allocation technique against
The programmer may use any number of variables. The computer can quickly read and write registers in the CPU, so the computer program runs faster when more variables can be in the CPU's registers. Also, sometimes code accessing registers is more compact, so the code is smaller, and can be fetched faster if it uses registers rather than memory. However, the number of registers is limited. Therefore, when
The results of a variety of compiler optimizations, by simplifying the properties of variables. For example, consider this piece of code: Humans can see that the first assignment is not necessary, and that the value of y being used in the third line comes from the second assignment of y. A program would have to perform reaching definition analysis to determine this. But if the program
The same color in the graph coloring to live ranges that are copy-related. Linear scan is another global register allocation approach. It was first proposed by Poletto et al. in 1999. In this approach, the code is not turned into a graph. Instead, all the variables are linearly scanned to determine their live range, represented as an interval. Once the live ranges of all variables have been figured out,
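A compact sketch of the idea (hypothetical interval format; spilling is simplified here to always spill the newly arriving interval, whereas the algorithm as described by Poletto et al. spills the active interval that ends last):

    def linear_scan(intervals, num_regs):
        # intervals: list of (name, start, end); registers are numbered 0..num_regs-1
        intervals = sorted(intervals, key=lambda iv: iv[1])
        active, free, assignment = [], list(range(num_regs)), {}
        for name, start, end in intervals:
            for old in [iv for iv in active if iv[2] <= start]:
                active.remove(old)                 # expired interval: free its register
                free.append(assignment[old[0]])
            if free:
                assignment[name] = free.pop()
                active.append((name, start, end))
            else:
                assignment[name] = "spill"         # no register left (simplified choice)
        return assignment

    print(linear_scan([("a", 0, 4), ("b", 1, 3), ("c", 2, 5)], 2))
    # {'a': 1, 'b': 0, 'c': 'spill'}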
The same register, once the copy operation becomes unnecessary. Doing coalescing might have both positive and negative impacts on the colorability of the interference graph. For example, one negative impact that coalescing could have on interference graph colorability is when two nodes are coalesced, as the resulting node will have a union of the edges of those being coalesced. A positive impact of coalescing on interference graph colorability is, for example, when
The use-define or define-use chains is a step in liveness analysis, so that logical representations of all the variables can be identified and tracked through the code. Consider the following snippet of code (a reconstruction is sketched below): Notice that x is assigned a value at three points (marked A, B, and C). However, at the point marked "1", the use-def chain for x should indicate that its current value must have come from line B (and its value at line B must have come from line A). Contrariwise, at
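The snippet itself is not reproduced in this text; the following is a plausible reconstruction consistent with the description (the original code may have differed in its details):

    def snippet(y):
        x = 0          # A
        x = x + y      # B: uses the value of x defined at A
        print(x)       # point 1: this use of x can only have come from B
        x = 35         # C: does not depend on A or B
        print(x)       # point 2: this use of x can only have come from C

    snippet(7)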
The variables. If there are not enough registers to hold all the variables, some variables may be moved to and from RAM. This process is called "spilling" the registers. Over the lifetime of a program, a variable can be both spilled and stored in registers: this variable is then considered as "split". Accessing RAM is significantly slower than accessing registers and so a compiled program runs slower. Therefore, an optimizing compiler aims to assign as many variables to registers as possible. A high "register pressure"
The Φ function. In a control-flow graph, a node A is said to strictly dominate a different node B if it is impossible to reach B without passing through A first. In other words, if node B is reached, then it can be assumed that A has run. A is said to dominate B (or B to be dominated by A) if either A strictly dominates B or A = B. A node which transfers control to a node A is called an immediate predecessor of A. The dominance frontier of node A
Was found by Briggs et al.: it is called conservative coalescing. This improvement adds a criterion to decide when two live ranges can be merged. Mainly, in addition to the non-interfering requirements, two variables can only be coalesced if their merging will not cause further spilling. Briggs et al. introduce a second improvement to Chaitin's work, which is biased coloring. Biased coloring tries to assign