Compilers Lecture 24: Code Generation for Expressions

>>Okay, so — yeah, so
today we’ll be talking about simple code generation. Okay, so, so far, we have
covered parsing, so we know how to do parsing, and we know how to construct the abstract syntax
tree, and we talked about how to go from an abstract
syntax tree into actual code. So if we have an
example like this, A — we will focus on
expressions today, A times 3 plus B. The abstract
syntax tree for this is going to be like plus is equal to the,
you know, multiplication of A and 3, and then you
add this to B, so this is the abstract
syntax tree. Now, in terms of
code generation, we talked about this. We talked about how we can
generate code for this. Today we’re going to talk
about it in greater detail. You know, we will — we will
think about all the issues that are involved in generating
code for an expression like this, and in today’s
lecture, we will concentrate on generating code
for an expression. So in the next lectures, we
will look into code generation for conditionals like
if/else and loops, but today we will just
focus on expressions, how we can generate
code for expressions. So this case, you know,
the code that we want, this is a variable, so we would
like to load the variable, load A, and we want to
load 3, so we will load A into some register, like
R1, load immediate of 3 into some register, R2. Of course, loading A here is
loading a variable, which means, you know, a memory location,
and that consists of a base and an offset in general. So remember the instruction that
we used in the earlier lectures, load AI, load for address and
immediate, where the base is in a register, A or R. ARP,
what does ARP stand for?>>Activation record pointer.>>Yeah, activation
record pointer, exactly. So RA, activation record
pointer, and when the address of A, I’m going to put this
into some register, R1, loading immediate of 3, I’m
putting this in R2, then we want to multiply, what, R1 and R2,
and I’m going to put the result in some register, R3, then
here we want to load AIR, activation record pointer,
address of B into some register, like R4, and here we want to add
— we want to add what and what?>>R2 — R3 and R4.>>R3 and R4, and put
that in some register, R5. So now today’s lecture is about
the implementation details of this, not the implementation
details of the coding level, but algorithmically, because
in the programming assignment, you will have to write code that traverses the
abstract syntax tree and generates these
instructions. So we will look into this. We will write pseudo-code
for this. Basically, our objective here is to write pseudo-code
for doing this. So how do you think
that pseudo-code will — you know, what do you think that
pseudo-code would look like? What . . .>>It is for a while.>>Yeah, so it’s a tree
traversal, and we said what kind of traversal will serve
us well here, depth? Depth — and by the
way, why is it depth? Because in order to
do this multiply, we’ll have to calculate
everything beneath it, everything below the
multiply had to be calculated, so we have to go in depth. We can’t calculate the multiply until we have calculated
everything below it, and same applies to
every operation in this. So it has to be recursive. So what if we start, let’s define a function
expression generate or — yeah, for generating
an expression, and that will take a node,
so it’s going to take — so this is pseudo-code. It’s not actual code. So the parameter for this
is going to be a node. And then it’s going to switch
on the type of the node. So there will be a switch
on the type of the node. So here we are implying that
the node is a data structure. You can think of it as a
class or a struct node, and this node has
some fields in it. One field is the type, and
the type in this case is, for expressions, is going to be
— there are three types here. What are the three main types?>>Operator, variables,
and immediate value.>>Yeah, so it’s either
an arithmetic operation for what we are limiting
ourselves to at this point. It’s either an arithmetic
operation or a leaf, and the leaf is an operant, and this operant could
be either a variable or a number, an immediate
operant. So it’s either an
operation or an identifier, or let’s call it a variable
here in this context because identifier makes more
sense in the context of parsing, but here it’s a variable, so
“variable” is a better word in this context, or a number. Then we need the
kind of operation — what kind of operation? If it’s an operation, then
we need to specify the kind of operation, which is,
you know, add or subtract or divide or multiply. We need to know the operation. Then our other fields, now,
as we write the code, we will, you know, discover what
the fields that we need are and then we will add them as
we discover the need for them. Okay, so node type, so let’s
first do the easy ones. If it’s, you know, a case
variable, if it’s a variable, or in fact, the easiest
is the — let’s start with the
easiest, which is the number. So if it’s the number, then
all what we need to do is to issue a load immediate
instruction or to generate — sorry, generate here
in this context. We’re not issuing, we
are generating code. To generate a load immediate
instruction, so let’s assume that we have a function emit,
we have an emit function that takes an operation, source
1, source 2, and a destination, and it knows how to emit
the right instructions. So if we — if this operation
is load immediate, it will know that source 1 has to be
a number, an integer, and the destination
is a register. Source 2 is not used
because for a load immediate, we only have one source,
which is the immediate value, and we have a destination,
which is the register. For the load address immediate,
the load that loads, you know, based on the register,
a base and an offset. So source 1 is the base,
source 2 is the offset, and the destination
is the register that will receive the
result of the load. Okay, so here, you know, this is
source 1, this is destination, this is source 1, this is
source 2, this is destination. And for the add, you know,
this is our source 1, this is our source 2,
this is our destination. Okay, so we have this — we assume that we have
this clever emit function that will emit the right code. So in this case, we will
call it, if it’s a number, we will emit a load
immediate, load immediate, and the first source, so we
need to give it the number. So where will the
number come from? So, now, you are working
on your assignment, on the programming assignment, and you have constructed
the abstract syntax tree, but you need the number. So in this node, we
should also have a field that stores the value. You know, we are using a genetic
data structure, by the way. There are other ways of
designing this so that, you know, you can — for example, you can have a base
node class, and then you derive from this base node class
different derived classes for the different
types of nodes. You know, an operation
node is different from an identifier node is
different from a number node because each type of node is
different, and if you want to do this — if you want
to design this properly, you will have a base class for
the node that has everything that is common to all nodes,
and then in the derived classes, you will have whatever is
specific to that particular type of node, such as an operation
or an identifier or a number. But here we’re not going
to worry about this, so we’ll put every — all
the fields that we need, we’ll just put them in one
data structure for now. Okay –>>We need a value.>>We need a value, exactly. So when we create — now,
you should start thinking, when you are creating
this abstract syntax tree, then in this node, when
it’s the number node, you need to store
the value somewhere. So now we need this
load immediate. Our source 1 is going to be
node.value, source 2 is going to be null for the immediate,
for the load immediate, and our destination is going
to be a register number that will receive
the destination. Now, the question is: What
should that register number be? Look at this abstract syntax
tree and the node that — the code that we would
like to generate. So we have these
destination register numbers, so where do you think
these will come from? If you were writing the
code to implement this, where will the — you
know, here we put R1, here we put R2, R3, R4. So where do you think
these should come from, these register numbers?>>Also, when you have
infinite registers, you want to count track of
what you already allocated, and so you would
take that counter.>>Yeah, exactly, so exactly. All what you need is a
counter, and remember that at this point we are
doing only virtual registers, so when we are doing
virtual registers, we can have as many
registers as we want, so all what we need
is a counter. So just we need the function that gets the next
register number. So if we have a function
that get next register, this function is going to
return the next register number, so the first time you call
it it’s going to return one, the second time you call it,
it’s going to return two, then it’s going to
return three, and so on. So all what you need
is a counter because these are just
virtual registers. So we will just call this thing, so our T equals get
next register, and it will return a T, and we will pass it
here as a destination. And then we break for this case. So this is what we need to do in
order to generate an immediate. Now, to generate codes for
a variable, so what did — we called it “variable,” okay. For a variable, well,
again, since now we know that we need — we need
this emit, emit operation, and we need a destination,
now the base and the offset for this are going
to come from where? So to generate code for loading
a variable, we need the base and the offset, and where
do you think these will be? Because store them in the —
in this data structure as well, so we can store the
base and the offset. So, again, you know, we can
do T equals get next register, and we can do emit of
load address immediate. Source 1 is going to be
node.base, source 2 is going to be node.offset, and
the destination is going to be the T, and we break. So these are the leaf nodes. Now, what about the
in-tandem nodes which are operations,
case operation? Now, when the node
is in operation, now things are more
— we need to do more than just issuing an instruction
or generating an instruction. So now think about this. To generate code for this,
what do we need to do?>>[indiscernible] previous
registers that were allocated and then put it into
a new register.>>Okay, so we know how to
get the next register number, but we need to — before we
can calculate the multiply, we have to calculate
everything that is below it. So this has to be done and
this can be a huge tree. So in this example, we only
have two nodes, but in general, if you have an op like this,
the left child, you know, this can have a whole
tree below it, and this can have a whole
tree, generally speaking, for [indiscernible] expression. So what do you think the
right thing to do is here, because this can be any
complicated expression, and this can be a
complicated expression as well? So what should we do
here when, at this point, when we are generating
code for this? So what’s the right thing to do? Yes.>>You have to make
sure that each of the child trees
are evaluated first.>>And how do you evaluate them?>>You have to go
down [inaudible].>>Okay. So what should
I write in terms of code?>>Expression and put the left
node and then the right node.>>Yeah, exactly. So I should call
expression recursively. So this is recursive. So this is an expression
at this level. Before I can compute this, I
have to compute the expression for the left-hand side and
for the right-hand side and put each one in a
register, well, the register that receives the
result of this, okay? So in — okay, so I
have to call expression. Now, how do I get
to those nodes?>>Node.left, node.right.>>Yeah, exactly, so I have to store the left
and right as well. So we are discovering, you know,
the fields that we need to store in this node data structure,
and this is something that you will be doing
in the assignment. Okay, so expression of node.left
and we put this into — let’s call it T1 and T2. We’ll do expression node.right. Okay, now this means that whenever I compute
an expression, I have to return the, you
know, the destination, the destination register, right? So it’s — in every case, I have to return the destination
register so that the node, the parent node knows this. So expression for
A should return R1, so this will become
a return value from the left child
for this parent. So the parent looks at the
left and the right and each one of them should return,
so this means that this expression must return
the register that has the value, the destination of
this, so in this case, we will just return T.
So now our T has to be — has to have the destination
T equals get next register, so this is the T that will — or maybe we should
call it “result.” That’s going to be
more explicit. You can call it “T,” that’s
fine, but if we call it “result,” it will
be more explicit. Okay, I’m going to
call it “result.” It will be a more
descriptive name. So this is a result, a result, a
result, a result, and a result. And we will return the result
at the end, so return a result. Now, to make sure that
everybody understands this, we’re going to trace it. We’re going to go into
tracing after we are done. Yeah?>>You’re going to have to
initiate another switch state to handle all four
of your operators, or in this case,
the two for the –>>Okay, so why are
we doing this? We are assuming that this emit
function is intelligent enough, is smart enough to generate the
right code for the operation. In fact, it’s — the
only difference is going to be what that operation is. It’s add or multiply
or subtract. Everything else is
going to be the same. So we’re sending this
to the emit function, and the emit function, we give
it an operation, source 1, source 2, destination, and it
knows how to generate the code that we — that we want. Now, so we evaluate the
left, we evaluate the right, and then we get the
register for each one, and then we get the
next register. Oh, I missed the
most important thing. Well, I missed an important
thing here, which is what?>>When you do emit.>>Yeah, emitting,
so we have to emit. In this case, emit —
what’s the operation? The operation is whatever
this operation here, the operation.op. Yeah, so it’s — this
is the op field in this, the actual operation
name, or, you know, you can call it “opting.” So this is the operation. I send it to the emit, and my
source 1 is going to be T1, my source 2 is going to be T2,
and my destination is going to be the result, and
then I’m going to break. And at the end, I
return the result, okay? And here, you know, what are
these T1, T2, and result? In fact, these can be integers. You know, when you work
on this in the assignment, you will realize that these
register numbers can be just integers. Okay, so now let’s test
it on this example. So if we test it on this
example, we call the — we start traversing the
tree from the root, right? So we start from the
root in the beginning, so let’s put the stack. So first we call
expression on root, so expression on
plus, this node. Now, expression on plus
is going to switch. It’s going to go,
okay, this is an op, so it will recursively
call itself on node left, and node left is going to be
the multiply, so it’s going to call expression “multiply.” Now, an expression
multiply, it’s an op again. Now it’s going to call
the left, its own left, the left of op is going to
be this, expression of A. So we are going now
on the stack. What we have on the stack is
the leftmost staff [phonetic] in the tree, expression of A.
Now when we get to expression of A, it will identify
A as a variable and it will generate
this, you know, with register number 1
being — holding the result. This will be the first time
we call get next register. Get next register, and
this will give us R1 and it will generate this
instruction that we want. Now, when this returns,
when load A returns, we will be at this
point in which function? In which –>>The multiply.>>Yeah, the multiply, exactly. So we are tracing the recursion
here because it’s, you know, important to understand that
this whole thing is recursive, so when expression A returns,
it’s going to be at this point, and we are in expression
of multiply. Now, the right-hand side of
multiply is going to be the 3, so it’s going to
call expression of 3, and when we call expression
of 3 to load immediate, the 3 and this — it will
put it in register number 2. Now, and this will complete. Now, we are in expression of
multiply, so at this point, we are in expression of
multiply, and we are here, you know, we finished this. This returned R1, this
returned R2, and now it’s going to get the next register. The get next register
is going to give it R3. So this will generate this
multiply that we want. So it’s going to multiply R1
that came from the left child, that holds the result
of the left child, R2 that holds the result
of the right child, and R3, the new register that
we have just generated, so it will generate the,
you know, multiply R1, R2, and put the result in R3. Okay? Now, all of this
is in the multiply. Now, the multiply will
eventually return after this. Now it will return to
the add, and now we are, at this point, in the add. So now we are here, but
now we have returned to the add, the top
level, right? Now, the left child
returned, and the left child for the add returns what?>>R3.>>R3, yeah. So it will return R3. Now it will call its right
child, and the right child for this is the expression of
B, and expression of B is going to be the load immediate
and it’s going to return R4. Now, at the add level,
at this node’s level, get next register is
going to return what?>>R5.>>It’s going to return R5. It’s just a counter. So then we will emit that
instruction, add R3 and R4 and put the result in R5. Okay, so this is how it works. So let me ask some
questions now to make sure that everyone understands
how this works.>>Seems simple enough.>>Yeah, so it’s simple if you
— if you match the recursion. So if you are comfortable
with recursion, then this is, you know, this should be very
intuitive, and DFS is an example of a recursive algorithm. Now, okay, question: If I —
here, if I move this result, get next register,
if I move this here, would this work or not? Will it work correctly or not?>>No.>>No.>>Okay, I think it
will work correctly.>>Yeah, it will
still work correctly.>>But what will happen? What — what difference will . . .>>Yeah, it won’t be — it
won’t be in the order — the register allocations
won’t be in the order that we showed here
in the example. They’ll be different.>>Okay, so it will be different
register numbers will be used, but it will be correct. So how different will
the register numbers be? So what if — what will happen
to the register numbers? So let’s renumber the registers. If we just do this . . .>>It’ll put — well, the
first one, in the add, it will put it — instead of R5,
it’ll set the destination as R1.>>Exactly, so the
destination is going to be R1. So basically, it’s going
to do the allocation for the destination
in pre-order, not in post-order,
but that’s fine. These are just register numbers. So it’s going to set
the destination to R1 and then it will call the,
you know, the left child. The left child, the destination for the left child
is going to be R2.>>Right.>>Okay, and here –>>It’ll be R3.>>This will be R3. So now the left child for
the add is going to be R3, the right side is going
to be R4, all right, and then we will
return to the add, then this is going to be R5. So this will be R2 and R5
and put the result in R1. So it will be assigning register
numbers in pre-order instead of post-order, but
these are just numbers because these are
virtual registers anyway and it doesn’t matter. So this will still be correct. Now, what if we move
the emit here? So what if we do the emit? Well, not here, but,
yeah, the emit — or let’s move the emit
and the result, okay? So . . . So if we move the emit and the
result here before T1 and T2, before calling expression, so
if we do them in pre-order . . .>>Won’t work.>>Yeah, this won’t work. So basically, getting a register
number is not a big deal. We can do it before or after we
compute the expressions below the current node. That’s fine, it’s just a number. But calculating the actual
expressions below that and, you know, computing them must be
done before we can generate code for the current node
because this is dependent on these, okay? So the actual computation
has to happen in post-order, but the numbering of the
registers doesn’t matter. You can do it pre-order,
post-order, any order. Okay? Any questions on this? Yes?>>In fact, if you had put the
emit before the evaluations of T1 and T2, you’d be
using the previous registers that were used, T1 and T2, from whatever operation was
done prior to that, right?>>Yes, yeah, but it will
be — it will be wrong. Yeah, yeah, the T1 and T2, it will not even have
correct numbers, yeah. So it’s, yeah, so
it will not work.>>I was assuming T1
and T2 were global because it’s a recursive call
and those are local variables, so you would actually
be putting garbage into that function
at that point.>>Yes.>>Because it wouldn’t
be initialized. It would be whatever is in
those locations at that time.>>Yeah, but even if you
find a way of, you know, you can number them
any way you want. You can say, okay, I want
the result of this node to be in this register. So you can find the solution
for the numbering problem, but you will not
find the solution for the actual dependence
problem. Okay? Any other questions?

Tags: , , , , ,

There are no comments yet

Why not be the first

Leave a Reply

Your email address will not be published. Required fields are marked *