## CS6013 - Modern Compilers: Theory and Practise Register Allocation

#### V. Krishna Nandivada

IIT Madras

## Register allocation

Copyright © 2016 by Antony L. Hosking. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from hosking@cs.purdue.edu.


## **Opening remarks**

What have we done so far?

- Compiler overview.
- Scanning and parsing.
- JavaCC, visitors and JTB
- Semantic Analysis specification, execution, attribute grammars.
- Type checking, Intermediate Representation, Intermediate code generation.
- Control flow analysis, interval analysis, structural analysis
- Data flow analysis, intra-procedural and inter-procedural constant propagation.
- Points-to analysis

Announcement:

• Assignment 5 is out. Due in three weeks.

Today: Liveness analysis and register allocation.

V.Krishna Nandivada (IIT Madras)

CS6013 - Jan 2016

## **Register allocation**



Register allocation:

- have value in a register when used
- limited resources
- can affect the instruction choices
- can move loads and stores
- optimal allocation is difficult
- $\Rightarrow$  NP-complete for  $k \ge 1$  registers

Problem:

- IR contains an unbounded number of temporaries
- machine has bounded number of registers

#### Approach:

- temporaries with disjoint live ranges can map to same register
- if not enough registers then spill some temporaries (i.e., keep them in memory)
- The compiler must perform liveness analysis for each temporary:

It is live if it holds a value that may be needed in future



## Liveness analysis

Gathering liveness information is a form of data flow analysis operating over the CFG:

- We will treat each statement as a different basic block.
- liveness of variables "flows" around the edges of the graph
- assignments define a variable, v:
  - def(v) = set of graph nodes that define v
  - def[n] = set of variables defined by n
- occurrences of *v* in expressions use it:
  - use(v) = set of nodes that use v
  - use[n] = set of variables used in n

## Example






## Definitions

- v is live on edge e if there is a directed path from SRC(e) to a use of v that does not pass through any def(v)
- v is live-in at node n if live on all of n's in-edges
- v is live-out at n if live on any of n's out-edges
- $v \in use[n] \Rightarrow v$  live-in at n
- *v* live-in at  $n \Rightarrow v$  live-out at all  $m \in pred[n]$
- *v* live-out at  $n, v \notin def[n] \Rightarrow v$  live-in at n



### Liveness analysis

Define:

$$in[n] = \text{variables live-in at } n$$

$$out[n] = \text{variables live-out at } n$$

Then:

$$\begin{array}{rcl} out[n] &=& \bigcup_{s \in succ[n]} in[s] \\ succ[n] = \phi &\Rightarrow& out[n] = \phi \end{array}$$

Note:

$$in[n] \supseteq use[n]$$

$$in[n] \supseteq out[n] - def[n]$$


use[n] and def[n] are constant (independent of control flow).

Now, $v \in in[n]$ iff $v \in use[n]$ or $v \in out[n] - def[n]$.

Thus, $in[n] = use[n] \cup (out[n] - def[n])$


Notes

- should order computation of inner loop to follow the "flow"
- liveness flows backward along control-flow arcs, from out to in
- nodes can just as easily be basic blocks to reduce CFG size
- could do one variable at a time, from <u>uses</u> back to <u>defs</u>, noting liveness along the way

$$\begin{split} & N: \text{set of nodes of CFG}; \\ & \textbf{foreach } n \in N \textbf{ do} \\ & \quad in[n] \leftarrow \phi; \\ & \quad out[n] \leftarrow \phi; \\ & \textbf{end} \\ & \textbf{repeat} \\ & \quad \textbf{foreach } n \in N \textbf{ do} \\ & \qquad in'[n] \leftarrow in[n]; \\ & \qquad out'[n] \leftarrow out[n]; \\ & \qquad in[n] \leftarrow use[n] \cup (out[n] - def[n]); \\ & \qquad out[n] \leftarrow \bigcup_{s \in succ[n]} in[s]; \\ & \quad \textbf{end} \\ & \textbf{until } \forall n,\ in'[n] = in[n] \land out'[n] = out[n]; \end{split}$$
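The iteration can be sketched in Python as follows. This is a minimal sketch, not the course's reference implementation: the function name `liveness`, the dictionary-based CFG encoding, and the six-statement example program are all assumptions made for illustration. It also recomputes out[n] before in[n] within a pass and visits nodes in reverse order, a small variation on the pseudocode that follows the backward flow and reaches the same least fixpoint.

```python
def liveness(succ, use, defs):
    """Return (live_in, live_out) as the least fixpoint of the dataflow equations."""
    nodes = list(succ)
    live_in = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        # Liveness flows backward, so visit nodes in reverse order.
        for n in reversed(nodes):
            new_out = set().union(*(live_in[s] for s in succ[n]))
            new_in = use[n] | (new_out - defs[n])
            if new_in != live_in[n] or new_out != live_out[n]:
                live_in[n], live_out[n] = new_in, new_out
                changed = True
    return live_in, live_out


# Hypothetical program, one statement per node:
#   1: a := 0         2: b := a + 1     3: c := c + b
#   4: a := b * 2     5: if a < 10 goto 2
#   6: return c
succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
use = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}
defs = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}

live_in, live_out = liveness(succ, use, defs)
```

In this example a and b are never live at the same time, so they could share a register, while c is live throughout the loop.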


## Iterative solution for liveness

Complexity: for input program of size N

- $\leq N$  nodes in CFG
  - $\Rightarrow \leq N$  variables
  - $\Rightarrow$  N elements per *in/out*
  - $\Rightarrow$  O(N) time per set-union
- the **foreach** loop performs a constant number of set operations per node  $\Rightarrow O(N^2)$  time per pass of the **foreach** loop


- each iteration of the **repeat** loop can only add to each set
  - sets can contain at most every variable
  - $\Rightarrow$ sizes of all in and out sets sum to at most $2N^2$, bounding the number of iterations of the **repeat** loop
- $\Rightarrow$  worst-case complexity of $O(N^4)$
- ordering can cut **repeat** loop down to 2-3 iterations  $\Rightarrow O(N)$  or  $O(N^2)$  in practice

## Least fixed points

Any solution to dataflow equations is a conservative approximation:

- v has some later use downstream from n $\Rightarrow v \in out[n]$
- but not the converse

Conservatively assuming a variable is live does not break the program; just means more registers may be needed.

Assuming a variable is dead when really live will break things.

Many possible solutions but we want the "smallest": the least fixpoint. The iterative algorithm computes this least fixpoint.



## Graph coloring - a simplistic approach

**Input**: *G* - the interference graph, *K* - number of colors

- **repeat**
  - // Simplify
  - **repeat**
    - Remove a node *n* and all its edges from *G*, such that the degree of *n* is less than *K*;
    - Push *n* onto a stack;
  - **until** *G* has no node with degree less than *K*;
  - // *G* is either empty or all of its nodes have degree $\geq$ *K*
  - // Spill
  - **if** *G* is not empty **then**
    - Take one node *m* out of *G*, and mark it for spilling;
    - Remove all the edges of *m* from *G*;
  - **end**
- **until** *G* is empty;
- Take one node at a time from the stack and assign a non-conflicting color.
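A minimal Python sketch of this simplify/spill loop is given below. The function name `simple_color`, the adjacency-set encoding of the graph, and the choice of the maximum-degree node as the spill victim are assumptions for illustration; the pseudocode only says "take one node".

```python
def simple_color(graph, k):
    """graph: dict node -> set of neighbours (symmetric).

    Returns (colour, spilled): colours are integers 0..k-1.
    """
    work = {n: set(adj) for n, adj in graph.items()}  # mutable working copy
    stack, spilled = [], set()

    def remove(n):
        for m in work.pop(n):
            work[m].discard(n)

    while work:
        low = next((n for n in work if len(work[n]) < k), None)
        if low is not None:
            stack.append(low)          # simplify
            remove(low)
        else:
            # Assumption: spill the node of maximum degree.
            victim = max(work, key=lambda n: len(work[n]))
            spilled.add(victim)
            remove(victim)

    colour = {}
    while stack:
        n = stack.pop()
        used = {colour[m] for m in graph[n] if m in colour}
        # A free colour always exists: n had fewer than k neighbours left
        # when it was removed, and spilled nodes never receive a colour.
        colour[n] = next(c for c in range(k) if c not in used)
    return colour, spilled


# A triangle cannot be 2-coloured, so one node is marked for spilling.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
colour, spilled = simple_color(triangle, 2)
```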



- Step 1:
  - Select target machine instructions assuming infinite registers (temps).
  - If an instruction requires a special register, replace that temp with that register.
- Step 2:
  - Construct an interference graph.
  - Solve the register allocation problem by coloring the graph.
  - A graph is said to be <u>colored</u> if each pair of neighboring nodes has different colors.
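As an illustration of Step 2, the sketch below builds an interference graph from liveness results. It assumes the common construction (not spelled out on the slide) in which a variable defined at a node interferes with every other variable that is live-out of that node; the refinement that treats move instructions specially is omitted. The function name `build_interference` and the example sets are hypothetical.

```python
def build_interference(defs, live_out):
    """defs, live_out: dict node -> set of variables. Returns dict var -> set of vars."""
    graph = {}
    for n in defs:
        for v in defs[n] | live_out[n]:
            graph.setdefault(v, set())
        for d in defs[n]:
            for v in live_out[n]:
                if v != d:
                    graph[d].add(v)   # d is written while v still holds
                    graph[v].add(d)   # a value that is needed later
    return graph


# Hypothetical def and live-out sets for a six-statement program.
defs = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}
live_out = {1: {"a", "c"}, 2: {"b", "c"}, 3: {"b", "c"},
            4: {"a", "c"}, 5: {"a", "c"}, 6: set()}

graph = build_interference(defs, live_out)
```

Here a and b do not interfere, so two colors suffice: one shared by a and b, one for c.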

Parts of slides: sources - Andrew Myers




## Example 1, available colors = 2






#### We have to spill.


## Example 2 (revisited)



## Graph coloring - Kempe's heuristic

Algorithm dating back to 1879.

**Input**: *G* - the interference graph, *K* - number of colors

- **repeat**
  - **repeat**
    - Remove a node *n* and all its edges from *G*, such that the degree of *n* is less than *K*;
    - Push *n* onto a stack;
  - **until** *G* has no node with degree less than *K*;
  - // *G* is either empty or all of its nodes have degree ≥ *K*
  - **if** *G* is not empty **then**
    - Take one node *m* out of *G*; push *m* onto the stack;
  - **end**
- **until** *G* is empty;
- Take one node at a time from the stack and assign a <u>non-conflicting</u> color (if possible, else spill).
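The difference from the simplistic approach is that a high-degree node is pushed rather than spilled at once, and the spill decision is deferred to the coloring phase. A minimal Python sketch follows; the name `kempe_color`, the graph encoding, and the maximum-degree choice of the pushed node are assumptions for illustration.

```python
def kempe_color(graph, k):
    """graph: dict node -> set of neighbours (symmetric).

    Returns (colour, spilled): colours are integers 0..k-1.
    """
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while work:
        low = next((n for n in work if len(work[n]) < k), None)
        # No low-degree node left: push a high-degree one anyway
        # (assumption: the one of maximum degree) and hope for a colour later.
        n = low if low is not None else max(work, key=lambda x: len(work[x]))
        stack.append(n)
        for m in work.pop(n):
            work[m].discard(n)

    colour, spilled = {}, set()
    while stack:
        n = stack.pop()
        used = {colour[m] for m in graph[n] if m in colour}
        free = [c for c in range(k) if c not in used]
        if free:
            colour[n] = free[0]
        else:
            spilled.add(n)            # no colour available: actual spill
    return colour, spilled


# A 4-cycle: every node has degree 2, yet two colours suffice.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
colour, spilled = kempe_color(cycle, 2)
```

On this 4-cycle with K = 2 the simplistic approach would mark a node for spilling, whereas the deferred decision finds a coloring with no spills.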


# Example 3



- We need to generate extra instructions to load variables from the stack and store them back.
- The load and store may require registers again:
  - Naive approach: Keep a separate register (wasteful).
  - Rewrite the code by introducing a temporary; rerun liveness analysis and register allocation.

(Note: the new temp has much smaller live range).

**Consider:** add t1 t2

- Suppose t2 has to be spilled, say to [sp-4].
- Invent a new temp t35, and rewrite:
  - mov t35 [sp-4]
  - add t1 t35
- t35 has a very short live range and is less likely to interfere.
- Now rerun the algorithm.
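The rewrite can be sketched as a small Python pass. This is only a sketch under simplifying assumptions: instructions are hypothetical `(op, dst, src)` tuples, only a use in the source operand is handled (a spilled definition would need a store after the instruction), and the function name `rewrite_spilled_uses` is invented for illustration.

```python
import itertools


def rewrite_spilled_uses(instrs, spilled, slot, fresh):
    """Load `spilled` from `slot` into a fresh temp before each use as a source."""
    out = []
    for op, dst, src in instrs:
        if src == spilled:
            t = next(fresh)
            out.append(("mov", t, slot))   # load from the stack slot
            src = t
        out.append((op, dst, src))
    return out


# Fresh temp names t35, t36, ... (numbering chosen to match the example).
fresh = (f"t{i}" for i in itertools.count(35))
code = rewrite_spilled_uses([("add", "t1", "t2")], "t2", "[sp-4]", fresh)
```

Each use gets its own fresh temp, which is what keeps the new live ranges short.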


## Register allocation - Linear scan

Register allocation is **expensive**.

- Many algorithms use heuristics for graph coloring.
- Allocation may take time quadratic in the number of live intervals.

#### Not suitable

- Online compilers (e.g., JIT compilers) need to generate code quickly.
- Sacrifice efficient register allocation for compilation speed.

Linear scan register allocation - Massimiliano Poletto and Vivek Sarkar, ACM TOPLAS 1999

• Complexity linear in the number of variables (assuming the number of registers is not too large).



## **Register allocation - Chaitin's**

- Simplify
- Spill
- Select: assign colors to nodes
  - start with an empty graph and keep adding nodes:
  - adding a non-spill node: it will have a color (that was the basis for its removal)
  - adding a spill node: if no color is available (neighbors already K-colored) then mark it as an <u>actual spill</u>; break;
  - continue to select nodes.
- Start over: if select has no actual spills then finished, otherwise
  - rewrite code: fetch spills at use, store at definition
  - recalculate liveness and repeat



## Simplification with aggressive coalescing (by Chaitin)

[Figure: flowchart of the allocator phases build, aggressive coalesce, simplify, spill, select; edges labelled "any" and "done".]

- Can delete a <u>move</u> instruction when source *s* and destination *d* do not interfere:
  - <u>coalesce</u> them into a new node whose edges are the union of those of *s* and *d*
- In principle, any pair of non-interfering nodes can be coalesced
  - unfortunately, the union is more constrained and new graph may no longer be *K*-colorable
  - overly aggressive


## Conservative coalescing

Apply tests for coalescing that preserve colorability. Suppose *a* and *b* are candidates for coalescing into node *ab*.

<u>Briggs</u>: coalesce only if *ab* has < *K* neighbors of significant degree ($\ge K$)

- simplify first removes all insignificant-degree neighbors
- *ab* will then be adjacent to < *K* neighbors
- simplify can then remove *ab*

<u>George</u>: coalesce only if all significant-degree neighbors of a already interfere with b

- simplify removes all insignificant-degree neighbors of a
- remaining significant-degree neighbors of *a* already interfere with *b*; coalescing does not increase degree of any node
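Both tests are easy to state over an adjacency-set graph. The sketch below is illustrative only: the function names `briggs_ok` and `george_ok` and the small example graph are assumptions, and *a* and *b* are assumed not to interfere. In the Briggs test a common neighbor of *a* and *b* is counted with its degree after the merge, since its two edges collapse into one.

```python
def briggs_ok(graph, a, b, k):
    """Briggs: the merged node ab has fewer than k significant-degree neighbours."""
    neighbours = (graph[a] | graph[b]) - {a, b}

    def degree_after(n):
        d = len(graph[n])
        # A common neighbour of a and b loses one edge once they are merged.
        return d - 1 if a in graph[n] and b in graph[n] else d

    significant = [n for n in neighbours if degree_after(n) >= k]
    return len(significant) < k


def george_ok(graph, a, b, k):
    """George: every significant-degree neighbour of a already interferes with b."""
    return all(len(graph[t]) < k or t in graph[b] for t in graph[a] if t != b)


# Hypothetical graph with edges a-x, a-y, b-y, b-z, x-y.
g = {"a": {"x", "y"}, "b": {"y", "z"}, "x": {"a", "y"},
     "y": {"a", "b", "x"}, "z": {"b"}}

safe_with_3 = briggs_ok(g, "a", "b", 3) and george_ok(g, "a", "b", 3)
safe_with_2 = briggs_ok(g, "a", "b", 2) or george_ok(g, "a", "b", 2)
```

With K = 3 both tests allow the merge; with K = 2 both reject it, because x is a significant-degree neighbor of a that does not interfere with b.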




Interleave simplification with coalescing to eliminate most moves while guaranteeing not to introduce spills:


- Build interference graph G and distinguish move-related from non-move-related nodes. A move-related node is one that is either the source or destination of a move instruction.
- Simplify: remove non-move-related nodes of low degree one at a time
- Coalesce: conservatively coalesce move-related nodes
  - remove associated move instruction
  - if resulting node is non-move-related it can now be simplified
  - repeat simplify and coalesce until only significant-degree nodes or uncoalesced moves remain







## Precolored nodes


<u>Precolored nodes</u> correspond to machine registers (e.g., stack pointer, arguments, return address, return value)


- <u>select</u> and <u>coalesce</u> can give an ordinary temporary the same color as a precolored register, if they don't interfere
- e.g., argument registers can be reused inside procedures for a temporary
- simplify, freeze and spill cannot be performed on them
- also, precolored nodes interfere with other precolored nodes

So, treat precolored nodes as having infinite degree

This also avoids needing to store large adjacency lists for precolored nodes; coalescing can use the George criterion

## Temporary copies of machine registers

Since precolored nodes don't spill, their live ranges must be kept short:

- use <u>move</u> instructions
- copy machine registers into fresh temporaries on procedure entry, and back on exit, spilling between as necessary
- register pressure will spill the fresh temporaries as necessary; otherwise they can be coalesced with their precolored counterpart and the moves deleted



## Criteria for spilling

During register allocation, we find that one live range from a given set has to be spilled. Criteria?

- Random! Adv? Disadv?
- One with maximum degree
- One that has the longest life
- One with the shortest life (take advantage of the cache).
- One with least cost.
  - Cost = Dynamic (load cost + store cost)
  - How to handle loops, conditionals?
  - Cost of load, store



# Example (cont.)

#### Interference graph:



## Example




# Example (cont.)

- No opportunity for simplify or freeze (all non-precolored nodes have significant degree  $\geq K$ )
- Any coalesce will produce a new node adjacent to  $\geq K$ significant-degree nodes
- Must spill based on priorities:

| Node | uses + defs outside loop | uses + defs inside loop | degree | priority |
|------|--------------------------|-------------------------|--------|----------|
| a | 2 | 0 | 4 | (2 + 10 × 0) / 4 = 0.50 |
| b | 1 | 1 | 4 | (1 + 10 × 1) / 4 = 2.75 |
| c | 2 | 0 | 6 | (2 + 10 × 0) / 6 = 0.33 |
| d | 2 | 2 | 4 | (2 + 10 × 2) / 4 = 5.50 |
| e | 1 | 3 | 3 | (1 + 10 × 3) / 3 = 10.33 |

- Node c has the lowest priority, so spill it
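The priority used in the table is (uses + defs outside the loop + 10 × uses + defs inside the loop) / degree. A short Python sketch of the calculation, with the counts taken from the table, is given below; the function name `spill_priority` and the `loop_weight` parameter are illustrative names only.

```python
def spill_priority(outside, inside, degree, loop_weight=10):
    """Weighted use/def count per interference edge; lowest value is spilled first."""
    return (outside + loop_weight * inside) / degree


# (uses+defs outside loop, uses+defs inside loop, degree) for each node.
counts = {"a": (2, 0, 4), "b": (1, 1, 4), "c": (2, 0, 6),
          "d": (2, 2, 4), "e": (1, 3, 3)}

priority = {v: spill_priority(*t) for v, t in counts.items()}
victim = min(priority, key=priority.get)   # node with the lowest priority
```

The weight of 10 stands in for the expected number of loop iterations, so a variable touched inside the loop is much more expensive to keep in memory.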




# Example (cont.)

Only possibility is to <u>coalesce</u> a and e: ae will have < K significant-degree neighbors (after coalescing d will be low-degree, though high-degree before)




# Example (cont.)



Cannot coalesce r1ae with d because the move is constrained: the nodes interfere. Must simplify d:



## Example (cont.)

    enter:
        c1 := r3
        M[c_loc] := c1
        a := r1
        b := r2
        d := 0
        e := a
    loop:
        d := d + b
        e := e - 1
        if e > 0 goto loop
        r1 := d
        c2 := M[c_loc]
        r3 := c2
        return [ r1, r3 live out ]

# Example (cont.)

- Graph now has only precolored nodes, so pop nodes from stack coloring along the way
  - $d \equiv r3$
  - a, b, e have colors by coalescing
  - c must spill since no color can be found for it
- Introduce new temporaries c1 and c2 for each use/def, add loads before each use and stores after each def




# Example (cont.)

Pop d from stack: select r3. All other nodes were coalesced or precolored. So, the coloring is:

- a ≡ r1
- b ≡ r2
- c ≡ r3
- d ≡ r3
- e ≡ r1

Rewrite the program with this assignment:

    enter:
        r3 := r3
        M[c_loc] := r3
        r1 := r1
        r2 := r2
        r3 := 0
        r1 := r1
    loop:
        r3 := r3 + r2
        r1 := r1 - 1
        if r1 > 0 goto loop
        r1 := r3
        r3 := M[c_loc]
        r3 := r3
        return [ r1, r3 live out ]

# Example (cont.)

• Delete moves with source and destination the same (coalesced):


```
enter:
    M[c_loc] := r3
    r3 := 0
loop:
    r3 := r3 + r2
    r1 := r1 - 1
    if r1 > 0 goto loop
    r1 := r3
    r3 := M[c_loc]
    return [ r1, r3 live out ]
```

• One uncoalesced move remains
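The last two steps (substituting the chosen registers and deleting moves whose source and destination coincide) can be sketched in Python. The textual instruction format, the function name `apply_assignment`, and the regular expression used to find temporaries are assumptions for illustration; a real allocator would rewrite a structured IR rather than strings.

```python
import re


def apply_assignment(lines, colour):
    """Rename temporaries to registers and drop moves of the form r := r."""
    out = []
    for line in lines:
        new = re.sub(r"\b[a-z]\w*\b",
                     lambda m: colour.get(m.group(0), m.group(0)), line)
        lhs, sep, rhs = new.partition(" := ")
        if sep and lhs.strip() == rhs.strip():
            continue                  # coalesced move: source == destination
        out.append(new)
    return out


colour = {"a": "r1", "b": "r2", "c1": "r3", "c2": "r3", "d": "r3", "e": "r1"}
program = [
    "enter:", "c1 := r3", "M[c_loc] := c1", "a := r1", "b := r2",
    "d := 0", "e := a",
    "loop:", "d := d + b", "e := e - 1", "if e > 0 goto loop",
    "r1 := d", "c2 := M[c_loc]", "r3 := c2", "return [ r1, r3 live out ]",
]
final = apply_assignment(program, colour)
```

The only move that survives is `r1 := r3`, the uncoalesced move between d and r1.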


