11
Oct

# Compilers Lecture 27: Local Register Allocation (1)

>>Not real registers. Now, it’s time to learn how
compilers can map virtual registers into physical
registers, okay? So now, we will be looking into an actual algorithm
for doing this. Remember, in the
earlier lectures, we studied register
allocation conceptually. So we know the — you know, the
concept of register allocation, and we did register allocation
of some — for some sample, you know, test cases
— small test cases. And we did that by
inspection, by just, you know, looking at that — you know,
these few lines of code, and inspecting them, and
figuring out the best way of mapping physical
— virtual registers into physical registers. So if we — you know, we can — we can do the same thing on
this, you know, small example. You know, we can easily come up with a register
allocation scheme for this, and that should be easy. But our purpose now is
not to do, you know, one example by inspection. It’s — our objective is to
study an actual algorithm for doing this systematically — an algorithm that will work for an arbitrarily-long
list of instructions. Now, the register allocation
algorithm that we are about to study today is a local
register allocation algorithm. Local here means allocating
registers within a basic block. So it means that this
register allocator will look at one basic block at a time,
and will allocate the registers within that basic block. So it will not be
trying to keep, you know, the value of the register
alive across basic blocks. If it tries to keep the — a register alive
across basic blocks, that would be global
register allocation, which we will be studying
in the next lectures. But for now, we focus on the
local version of the problem, which is an easier problem. So local register
allocation is easier. It just — we look at one
basic block at a time, and we try to map
the virtual registers into physical registers. Now, the algorithm that we
will be studying today is quite intuitive, in fact. So the idea on a high level is
scan this list of instructions, and try to map a virtual
register to a physical register. So you will need a list of
free physical registers. You need a free list, list
of free physical registers. Of course, you need to know
how many physical registers you have. So in this example,
we will assume that we have two
physical registers. In this example, we have two
physical registers, P1 and P2. So the idea is simple. If there is a physical
register available on the free list, just use it. If there is no physical register
available on the free list, then you have to find one
of the physical registers that are being used, and to
victimize a virtual register that is currently allocated
to a physical register. So if there is no free
physical register, you have to find a
physical register. So one way of doing
this is to look at — to find a physical register
that is currently used by a virtual register,
and preempting that virtual register. So taking the — that
physical register from the virtual register
in order to use it for the current virtual
register, and in order to do that, you know,
we’ll have to — to store the value of that
physical register into memory. So we’ll have to spill it. So spilling means
storing the value of a physical register
into memory. Why? Because this register
can no longer be stored in a physical register. Now, this will become obvious
when we go through this example. So for example, here,
you know, it — mapping the virtual registers, our free list will
initially have both P1 and P2. So we will easily say, “Okay, load X into P1, then
load Y into P2.” Of course, here, you know, I’m
not writing the full syntax of the load instruction,
you know — but we did it in
previous lectures. So just for simplicity,
I’m only saying load X, but we know that there
is a whole syntax, like load address immediate, and R activation record
porter, and all of that. So now, X gets P1 and Y gets
P2, or, in fact, you know, what we are mapping here is that we are mapping
virtual register one into physical register one, and
mapping virtual register two into physical register
number two. Now, at this point, I need
a physical register for V3. Now I look at the free list, and
since I only have two registers, my free list will
not have anything. P1 and P2 have been taken. So, in order to do this, I will
have to victimize one of these. So these cannot keep
their register, because now I have
a need for this. So I will go and pick one of
them, P1 or P2, to get spilled into memory, because now I need
it for virtual register three. Now, if you were doing this, which one would you
victimize, P1 or P2?>>P2.>>Why P2?>>Because P1 is used next.>>Because — I’m
sorry, say that again?>>Using P1 for multiplication
–>>Yeah.>>– in line four.>>Okay. Yeah, exactly. So your guess is right. So that’s why this
algorithm is very intuitive. So generally speaking, if you
need to victimize a register, victimize the register
that you will not use for the longest time,
or the register that — whose next use is the
farthest in the future. So you look at the
next use of P1, and the next use of P1 — of P2. So the next use of P1,
which is V1, is this. This is the next use, and
the next use of P2 is this. So I will need P1 sooner. So the ones that you
need sooner, keep it. The ones that you need later,
spill it, or victimize it. You know, spill it — we
will see how we will spill it to memory. So we will victimize —
of course, in general, we will have more than,
you know, two registers. So we — in general, we will
have N physical registers, and out of these N, we
will victimize the one that we will use the
latest in the future, the one whose next use is the
farthest in the future, okay? So now, in order
to implement this, we will have to track things. Now we are realizing that,
in order to do this — now, we’re developing
an algorithm. We’re not just trying
to solve this, you know, small example by inspection. We need some data
structures to track this. Yes?>>Will our — do — can compilers look through
multiple instructions, or do they have a look-ahead,
sort of like the parsing?>>Yeah, okay. Okay, so this is
a good question. Now, once the compiler generates
a stream of instruction — a stream of instructions
like this, yeah, it has all the flexibility
to look ahead, to analyze the whole
stream of instructions. So when a compiler is
analyzing this block of code — so now we’re assuming that
we are allocating registers for only one basic block. So the compiler’s now processing
this basic block, which, in this example, has
seven instructions in it, and a compiler can scan this
list of seven instructions. It can scan it, and it can
compute all the information that it needs. So it can look at the next
use, and this is something that a compiler actually does. Yeah. So it’s a good question,
because it’s always a good idea to question whether
something is doable or not, whether a real compiler can
actually do this or not. And in this case, yes,
it can do it easily. So it will track for all
the virtual registers. So we will need a — you
know, a data structure for the virtual registers,
and for each virtual register, we will need to track
— we have — — a virtual register, and the
corresponding physical register, and we can say the uses. So for virtual register V1 — so this is what tracks the
current physical register. So V2, V3, V4, V5, and V6, V7 — so the uses for V1 are
instruction four — so these are instruction
numbers. So these numbers are
instruction numbers. Instruction four and instruction
six — four and six, okay? So these uses are — these
numbers are instruction numbers. So this says “virtual
register number one is used in instruction four and in
instruction number six.” Virtual register number
two is used in five. Virtual register number
three is used in four. Virtual register number
four is used in five. Now, okay. So let’s distinguish
between the use and the definition
of a register. So when the value
of a register is — when we are writing a
certain value into a register, this is what we call a “define.” And when we are using
the value of a register, this is what we call a “use.” So in this case, you know,
for example, the multiply — multiply — multiply V1,
V3, and write into V4 — so this is an instruction
that it defines V4, and it uses V1 and V3. Okay, so it defines V4. The define is the
result, you know, where it writes its result. So it defines it. It writes a value into
it, and it uses V1 and V3. Okay? So V4 is used in five. V5 is used in six. V6 is used in seven, and — oh, we don’t have a V7,
so why did I put V7? We only have V6. Okay. So now, we are assuming that the compiler did a first
scan, one scan through this, and it analyzed the code, and it found the uses
for every register. Now, you know, thinking
algorithmically, this is not going
to be costly, right, because this is only one scan. So this is obviously
an O of N scan. So if you — you know, if
you scan the code twice — if the compiler scans
the code twice, this is still O of N, right? It’s — the compiler scans the
code a constant number of times, like two or three times. This is still the same
asymptotic complexity. This is still O of N. We’re not
making — we’re not making it — we are not making it
asymptotically more complex. Okay. And also, now,
we need to track for each physical
register for P1. Let’s track the virtual
registers in it that currently have it — the virtual register
that currently has it, and the next use — so
virtual register and next use. So let’s say V register and next
use, V register and next use, for — for P1 and for P2. So when we start off, for P1 — P1 will be assigned to
virtual register V1, and the next use is going to be the next use
of P1, which is four. And P2 is assigned to
virtual register V2, and the next use is the next
use of V2, which is five. Now, when we get to this point,
we are looking for a register for G. We will look at the
two physical registers, and we will victimize
the one that will be — whose next use is larger,
or the maximum next use, which is in this case, P2. Okay. So now, in order to do
this, we’ll have to spill this. Spilling this means
adding a store instruction. So we are storing —
well, let’s start it in — in a different color. Store P2 into the address of Y,
or let’s — let’s use — okay. So we are storing this
into the address of Y. Okay, so this is address
of X, address of Y, and this is address
of X, address of Y. And this is address of Y. Okay? Now, we can use virtual
register — physical register
number two for V3. So we can do load
address of Z into P2. Okay. Now we have a
different kind of instruction, which is multiply V1 and V3. So in this algorithm,
first it will look at the source operands — the
source operands of this — the source operands
of this instruction. If these source operands
are in physical registers, then it will not
have to do anything. They’re already in physical
registers, and in fact, you know, at this point,
you know, V1 is in P1. V1 is in P1. V2 is in memory. It’s not in a register, and V3
is — oh, V1 is in P1, sorry. I said P1 and wrote P2. Okay. So V1 is in
P1, and V3 is in P2. So now, for this instruction,
you know, we check the values of the source operands to make
sure that they are in memory. If they’re in registers, then
we don’t have to do anything. We don’t have to add
any instructions. If not, we will have to
load them from memory. So, for example, if we had
an instruction that uses — which we will get here — we have an instruction that
uses V2, and V2’s in memory. We’ll have to load it. If it’s in a register, we
don’t have to do anything. If it’s in memory, we’ll have
to add a load instruction, okay? So now, we don’t
have to do anything, so we will just use
these physical registers for number four, multiply. V1 is P1, and V3 is P2. And now, we will need to
find a register for V4. Now, which register should
we — should we get for V4.>>Two.>>P1.>>P1. Doesn’t –>>P1 or P2?>>P2.>>Well, can we use P1?>>Yes.>>Yes. We don’t
need — oh, we –>>No, we still need it, right? So this is something
that we’ll still need, but V3 will not be needed. So how do we know this? Now, this is V3, and this is V2. Now, we know the
next use of each one, so we are now at
instruction four. One, two, three —
sorry, this is not three. This is added instruction,
and this is four. Now we are at instruction four. We are at instruction four, and we know that the
next use of V3 is four. And in the use list of V3,
there is nothing after four. So we know that four
is the last use, or in fact, it’s the only use. So we know that V3
will not be used later. What does this mean?>>Put it in P2.>>So if you were writing
an algorithm for doing this, what would you do,
based on this fact?>>P2.>>Yeah, you will
free P2, exactly, because P2 will not be needed. So you will free P2, but
you will not free P1. So we free P2. So now, we modify the
information for P2 such that now P2 is
temporarily free. Now P2 is temporarily free. So P2 gets freed, you
know, after this — after we process these operands, but before we allocate
the register for V4. Now, when we look for
a register for V4, we will find P2 on
the free list. P2 will be on the free list, because we have freed
it, freed P2. So now, our free list —
the free list at this point, you know, which was originally
— originally, it had P1 and P2. Then it had P2 alone. Then it had none. Now it has P2. So we can use P2. So basically, for every operand,
we first look at the free list. If we find something in
the free list, we use it. If we don’t, we have — we
victimize some other register. Okay. Now, again,
we do the same thing for instruction number five. For instruction number five,
we have V4 — now, is V4 — we forgot to update this. So now V4 is, in
fact, in P2, right? And V3 is not. Okay, so now, when we
look at V4, V4 is assigned to a physical register,
so V4 is in P2. So this is easy. Multiply, so we do the add. This is in a physical
register, so V4 is in P2. Now, V2 — where is V2? V2 — is it in a register? No, it’s in memory. So it’s in memory, so
we’ll have to load it. So we’ll have to add
a load instruction. So we’ll have to load V2, and
now, V2 is a virtual register. And for this virtual
register, you know, we have the information
about it, including the — you know, the memory
location that corresponds to this virtual register, where
for each virtual register, we should associate a
memory location, okay? And this memory location
is where we — you know, we spill it
if we need to spill it. So now, this is — you know, we load it from whatever the
address of Y is, and we load it into register — so
which register now?>>Don’t we need to store –>>Yeah, so it’s still —
you know, on the free list — we don’t have anything
on the free list now. So we don’t have
anything on the free list, because P1 is assigned to V1,
and P2 is assigned now to V4. So both P1 and P2 —
P1 and P2 are taken. So before we do the load, in
fact, we’ll have to do what? Yeah, we’ll have
to do the store. Store of what?>>P1.>>P1, yeah. So we’ll store P1 — store
P1 into address of X, and then we load address
of Y into — into P1. And then, we multiply — we add
— we add — sorry, we add V4, which is P2, with P1, and
then we put the result — now, yet to figure out
where to put the result. Okay. So, you know,
basically for every variable, for every operand, we’ll
have to ask the question — is it already in a register? If it’s already in a
register, there is no problem. We don’t have to do anything. If it’s not already in a
register, then we will have to find a register for it. In order to find a register
for it, what do we first do?>>To store one of them.>>No, before you store.>>We have to look –>>Before you store — well, first you look in
the free list, right? So before you try to
victimize a register, first you look in the free list. If you don’t find
anything in the free list, then you’ll find a
register to victimize. So basically, there are — okay, first, if the operand
— for every operand — — for every operand — let’s call it X. If X is in
a register, nothing to do. Do nothing. Do nothing. Else, we’ll have to
find a register for it. So first, search
in the free list. Search the free list. If there is nothing
in the free list — if the free list is empty — is empty, then spill
the virtual register or the register with what? With the maximum next use, the
register with max next use. So this is basically a
summary of the algorithm. We do this for every operand. If it’s already in a register,
there is nothing to do. If it’s not already
in a register, we first search in
the free list. If it’s — if there is something
in the free list, we grab it. If there is nothing in the free
list, we will find a register to victimize, and we
victimize the register with the maximum next use, the
one that we will not be using for the longest period of time. Okay? So here, in fact, you
know, when we look at P1 and P2, because now P2 is in V4,
and the next use of — of it is going to be — the next
use of it is going to be five — for — yeah, for V4, the
next use is going to be five. So for instruction
number five here, V1 — the next use of V1
is going to be six. So V1 got used in
four, but the next use of V1 will be in this case six. So as we go through
the list, we will have to update the next use. So at this point here, when
we are in instruction five, or we are at this point,
the next use of V1 is going to be six — or P1
is going to be six. For P2, the next use
is going to be five, and that’s why we victimize
— we victimize P1, not P2. Now, we will have to
find a register for — now, do we free anything
after this? So we have P2 and P1
for instruction five. So this is V4 and V2. Do we free any of them?>>Free both of them.>>Yeah. In fact, we free both
of them here, because none of them has a next use
— you know, V4 and V2 — V4 and V2 are — the
last use is five. And this is instruction five,
so this means that, you know, after we do this, but before
we allocate a register for V5, the free list will have both. So another key idea here is
that we update the free list after we find — after we
check on the source operands, but before we check on
the destination operand. Remember this. So we update — so
let me write it. We must update the free list after processing
the source operands, but before processing
the destination operand. Why? Because this may free up
some registers that we can use for the destination operand. And in this case, you
know, we can use — for V5, we can use
either P1 and P — or P2, and we can
— let’s use P1. So now, we — if we update
these, now V5 is in P1. P1 is no longer there. This is no longer there,
and P1 now has V5 in it. And its next use is six, and
P2 currently is not allocated. It’s on the free list. Okay? Now, number
six — now we need — again, we check on the operands. For V5 — so what
happens with V5?>>V5 is already there.>>It’s already in a
register, so we do nothing. If it’s already in a
register, we do nothing. V1 is not already in a register,
so V1 is currently in memory. So if it’s in memory –>>Load it.>>– yeah, so we
will have to load it. Now, in order to load
it, we’ll have to load it into a register, right? Now, to load it into a register, we’ll have to find
a register for it. Is there a register?>>Yes.>>Yes, there is a register on
the free list, and that’s P2. So we load it into P2. So now, we load — we load V1,
which is X, into — into P2. And now, for instruction
number six, we do add V5, which is in P1, and
V1, which is in P2. And we put the result in — we’ll have to find a
register now for this. Now, in order to find a
register for V6, again, we’ll have to update
after processing these. V5 and V1 will not
be used, so now, the free list will
be again P1, P2. And the free list will be P1,
P2, so we can pick any register for this, and we can put it
— we can put V6 into P1. So now, V6 is in P1. P5 is not in a — is
not in a register. Now P1 has V6, and the next
use of it is going to be seven. And P2 is free. Okay? So now, this
is instruction six. Now, finally, we need
to store V6 into — we need to store V6
into our — into memory. So we need the register for
V6, and P2 is available. So we store — we store
V6, which is now P1 — we store it into the
address of Z. Okay? So this is — now
this is the algorithm. Now here, obviously, it did not
give the most efficient code, right? So, you know, again, we need
to understand the algorithm and the limitation
of the algorithm. Because allocating
registers to these — this small piece
of code is easy. Anyone can do it by inspection. You can do this by inspection. You know, for example, if
you look at this manually — okay, I need this, V1 and V3,
so I will first load, you know, one and three, and I will not
load Y. There is no reason to load Y, so you can do
this by inspection easily. And you first load these. Then, in order to add V4
and V2, V4 is in a register, and now you can — you know, you can put the load
here, right, and load V2. So you can easily
do this by hand, and you will produce
more efficient code. But what we are trying to understand here is
the systematic algorithm. Now, in order to do this
algorithm, you know, systematically — we can do
it systematically like this, but then we can add some —
another path to clean up any, you know, redundant code,
or any dead code, right? So what’s the code that we can
— you know, we can probably — if you — if you were to apply
a clean-up path to this code after the allocation — so
what would you eliminate? So there are instructions
that we can eliminate.>>The load and store for number
two looked pretty redundant, because you’re loading
a value into a register, and then removing it
and storing it again.>>Yeah, exactly. So here, you know, load
followed by a store — obviously, load and store for the same register
doesn’t do anything. You are loading it from memory, and then you are storing
without modifying it. So in order to implement
this, you can track — you know, you can scan the code,
and you can track the value, each value, whether
it’s clean or dirty. So when you load it — initially, when you
load it, it’s clean. You haven’t modified it,
and now, when you store it, you store when it is still
clean, no modification. So it’s just a bit, you know,
only one bit, or one Boolean that indicates whether
a given — whether a given virtual
register is dirty or clean, or has it been modified
since it got loaded or not. And here, this Boolean
will tell us that this has not been
modified since it got loaded, so we can eliminate this. Okay? In fact, since
we will be, you know, loading Y again here
— so this load of Y — loading Y will not
get used before we — before we load it again, right? So this load — with
this load here — this load here makes
this load dead code. So this is an — this
is a useless store. When you analyze this after
the fact, after you add all of these — we’re trying to
analyze the code after the fact. So after the fact, this
store is a useless store, because it’s storing something
that we have just loaded, and it hasn’t been modified. So useless store
— let’s write — — and this is dead code,
because there is no use of this P2 until we — you know,
we load it — we load Y again. So this will not —
will not get used. So we can, in fact,
eliminate both of these. Also — also, what —
what can we eliminate? Okay. So assuming that now we
cannot reorder the instructions — so remember, you know,
you may think that — of course, if we
reorder them, you know, we can minimize the
number of loads and stores that are needed. But in a real compiler, a real compiler doesn’t
do everything at once. You know, it doesn’t do
a register allocation and instruction scheduling
at the same time. You know, most production
compilers that I have seen do
— do them separately. So there is an instruction
scheduling stage, and there is a register
allocation stage. And in the register
allocation stage, it — a register allocation algorithm
normally doesn’t consider reordering instructions. If it considers reordering
instructions, it would become extremely
complex. Yes?>>Which one would go first between the register
allocation and the reordering? Would reordering come first? Because if you did
allocation, then reordering, you’d have to do
allocation again.>>Okay, so this
is a good question. Normally, it’s done —
scheduling is done before and after register allocation. So there are two
passes of scheduling, one before and one after. And the one that makes a
higher impact is the one that is done before. So the one that is done before
register allocation is the one that usually makes a higher
impact on the code, because, as you have noted, the ordering
will determine how easy register allocation is, and,
you know, how — how many loads and
stores we will have to insert into the code. Well, in fact, it’s —
you know, it’s not easy. So in fact, you know, we are just applying this
algorithm systematically, and we are pointing
out that, you know, at least these can
be eliminated here. Now, if you think about P1,
you know, we are loading P1. We are using it here,
but P1 is used here. So we can’t — if we
keep P1 in a register, then we will not
be able to use P1. And in fact, you know, since
we have here an instruction that has two operands — if our
instructions have two operands, then we will need at least
two registers, you know, two physical registers. So, for example, if we
had less than two here, we can’t allocate register. We need at least
two, because each — because our instructions
have two operands. So if we had an instruction
that uses three operands, then we will need at least
three physical registers. It will not be doable with less
than three physical registers. So, yeah — so we are using
P1, and we are storing it here because we need the register. And we are loading Y again. You know, we didn’t have to load
— we could eliminate this Y, and we load Y as soon —
you know, before it’s needed for instruction number five. And P1 or X — variable X, we
will need to load it again. So basically, the variable
X — we loaded it here. We used it here. Then we stored it. Then we loaded it again
here in order to use it in instruction number six. So these loads and stores are
what we call the spill code. Spill code is, you know, the
loads and the stores that we add because we don’t have
enough physical registers to keep everything in
a physical register. So here, if we had three
physical registers, we would have needed
no loads or stores. If we had three physical
registers in this case, we would have needed
no loads or stores. Okay? Any questions on
— on this algorithm? So we have to track the — we have to track all
the physical registers, and all the virtual registers, and we have to track
the free list. Okay? Any questions?