- Register allocation
compiler optimization, register allocation is the process of multiplexinga large number of target program variables onto a small number of CPU registers. The goal is to keep as many operands as possible in registers to maximise the execution speed of software programs. Register allocation can happen over a basic block("local register allocation"), over a whole function/procedure ("global register allocation"), or in-between functions as a calling convention("interprocedural register allocation").
computer programs need to process large numbers of different data items. However, most CPUs can only perform operations on a small fixed number of "slots" called registers. Even on machines that support memory operands, register access is considerably faster than memory access. Variables not allocated to registers must be loaded in and out of RAM whenever they are used.
"Register spilling" occurs where there are more live variables than the machine has registers. When a
compileris generating machine codeand there are more live variables than the machine has registers, it has to transfer or "spill" some variablesfrom registers to memory. This incurs a certain cost, as access from memory is typically slower than access from a register.
compilers, "register pressure" occurs when there are more variables to allocate than there are registers available. This typically results in register spilling.
Register allocation is an
NP-completeproblem. [Gregory J. Chaitin, Mark A. Auslander, Ashok K. Chandra, John Cocke, Martin E. Hopkins, and Peter W. Markstein. "Register allocation via coloring." Computer Languages, 6:47-57, 1981.] The number of variables in a typical program is much larger than the number of available registers in a processor, so the contents of some variables have to be spilled (saved) into memory locations. The cost of such spilling is minimised by spilling the least frequently used variables first, but it is not easy to know which variables will be used the least. In addition to this the hardware and operating systemmay impose restrictions on the usage of some registers.
Global register allocation
Like most other
compiler optimizations, register allocation is based on the result of some compiler analysis, typically one of live variable analysis, a form of data flow analysis, or the constructionof static single assignment form, which encodes similar information into the name space of the code.
"Global register allocators", which consider an entire procedure, have been built using several paradigms, including bin packing (e.g., the DEC VAX compilers), priority-based
graph coloring(e.g.,Chow-style allocators), and bottom-up graph coloring (e.g., Chaitin-style allocators). Of these approaches, the Chaitin-style allocators appear to be most common. Chaitin's algorithmcan be divided into two phases:
# Machine instructions are generated as if there are an infinite number of symbolic registers. So all variables suitable to being in registers will be assigned to numbered logical registers. The phase is sometimes called "register variable recognition".
# symbolic registers are replaced by physical registers in a target machine, with the minimum cost of spills.
In phase two, an "interference graph" is constructed where nodes are symbolic registers (created in the previous phase) and an arc connects two nodes if they are alive at the same time. More precisely, if one variable is alive at the time the other is defined then they are said to interfere. If the graph can be colored with "R" colors then the variables can be stored in "R" registers. This insight was pointed out by
John Cocke, "father of the RISC architecture". The problem is that coloring a graph is an NP-hard problem.
The key insight to Chaitin’s algorithm is called the "degree < R" rule which is as follows. Given a graph "G" which contains a node "N" with degree less than "R", "G" is "R"-colorable iff the graph "G’", where "G’" is "G" with node "N" removed, is "R"-colorable. The proof is obvious in one direction: if a graph "G" can be colored with "R" colors then the graph "G’" can be created without changing the coloring. In the other direction, suppose we have an "R"-coloring of "G’". Since "N" has a degree of less than "R" there must be at least one color that is not in use for a node adjacent to "N". We can color "N" with this color.
While G cannot be R-colored While graph G has a node N with degree less than R Remove N and its associated edges from G and push N on a stack S End While If the entire graph has been removed then the graph is R-colorable While stack S contains a node N Add N to graph G and assign it a color from the R colors End While Else graph G cannot be colored with R colors Simplify the graph G by choosing an object to spill and remove its node N from G (spill nodes are chosen based on object’s number of definitions and references) End While
This algorithm is O(n^2). This algorithm can be improved through subsumption which is the act of coalescing nodes which are the source and target of copy operations into a single node before running the algorithm. This reduces the number of nodes to color but can increase the degree of any coalesced node. This can only be done when the nodes do not interfere with each other, however, and aggressive coalescing can lead to uncolorable graphs. (Preston Briggs’ thesis work introduces safer methods to determine which nodes to coalesce and spill. Based on his improvements this algorithm is often called the Chaitin-Briggs algorithm.) The subsumption step is slow and is not done in fast register allocators.
Graph coloring allocators produce efficient code, but their allocation time is high. In cases of static compilation, allocation time is not a significant concern. In cases of dynamic compilation, such as just-in-time (JIT) compilers, fast register allocation is important. An efficient technique proposed by Poletto and Sarkar is [http://www.research.ibm.com/jalapeno/papers/toplas99.pdf linear scan allocation] . This technique requires only a single pass over the list of variable live ranges. Ranges with short lifetimes are assigned to registers, whereas those with long lifetimes tend to be spilled, or reside in memory. The results are on average only 12% less efficient than graph coloring allocators.
The linear scan algorithm follows:
# Perform dataflow analysis to gather liveness information. Keep track of all variables’ live intervals, the interval when a variable is live, in a list sorted in order of increasing start point (note that this ordering is free if the list is built when computing liveness.) We consider variables and their intervals to be interchangeable in this algorithm.
# Iterate through liveness start points and allocate a register from the available register pool to each live variable.
# At each step maintain a list of active intervals sorted by the end point of the live intervals. (Note that insertion sort into a balanced binary tree can be used to maintain this list at linear cost.) Remove any expired intervals from the active list and free the expired interval’s register to the available register pool.
# In the case where the active list is size R we cannot allocate a register. In this case add the current interval to the active pool without allocating a register. Spill the interval from the active list with the furthest end point. Assign the register from the spilled interval to the current interval or, if the current interval is the one spilled, do not change register assignments.
Cooper and Dasgupta recently developed a "lossy" Chaitin-Briggs graph coloring algorithm suitable for use in a JIT [Cooper, Dasgupta, "Tailoring Graph-coloring Register Allocation For Runtime Compilation", http://llvm.org/pubs/2006-04-04-CGO-GraphColoring.html] . The "lossy" moniker refers to the imprecision the algorithm introduces into the interference graph. This optimization reduces the costly graph building step of Chaitin-Briggs making it suitable for runtime compilation. Experiments indicate that this lossy register allocator outperforms linear scan on the majority of tests used.
"Optimal" register allocation algorithms based on Integer Programming have been developed by Goodwin and Wilken for regular architectures. These algorithms have been extended to irregular architectures by Kong and Wilken.
While the worst case execution time is exponential, the experimental results show that the actual time is typically of order of the number of constraints [Kong, Wilken, "Precise Register Allocation for Irregular Architectures", http://www.ece.ucdavis.edu/cerl/cerl_arch/irreg.pdf] .
Wikimedia Foundation. 2010.