# Compilers Lecture 24: Code Generation for Expressions

>>Okay, so — yeah, so

today we’ll be talking about simple code generation. Okay, so, so far, we have

covered parsing, so we know how to do parsing, and we know how to construct the abstract syntax

tree, and we talked about how to go from an abstract

syntax tree into actual code. So if we have an

example like this, A — we will focus on

expressions today, A times 3 plus B. The abstract

syntax tree for this is going to be like plus is equal to the,

you know, multiplication of A and 3, and then you

add this to B, so this is the abstract

syntax tree. Now, in terms of

code generation, we talked about this. We talked about how we can

generate code for this. Today we’re going to talk

about it in greater detail. You know, we will — we will

think about all the issues that are involved in generating

code for an expression like this, and in today’s

lecture, we will concentrate on generating code

for an expression. So in the next lectures, we

will look into code generation for conditionals like

if/else and loops, but today we will just

focus on expressions, how we can generate

code for expressions. So this case, you know,

the code that we want, this is a variable, so we would

like to load the variable, load A, and we want to

load 3, so we will load A into some register, like

R1, load immediate of 3 into some register, R2. Of course, loading A here is

loading a variable, which means, you know, a memory location,

and that consists of a base and an offset in general. So remember the instruction that

we used in the earlier lectures, load AI, load for address and

immediate, where the base is in a register, A or R. ARP,

what does ARP stand for?>>Activation record pointer.>>Yeah, activation

record pointer, exactly. So RA, activation record

pointer, and when the address of A, I’m going to put this

into some register, R1, loading immediate of 3, I’m

putting this in R2, then we want to multiply, what, R1 and R2,

and I’m going to put the result in some register, R3, then

here we want to load AIR, activation record pointer,

address of B into some register, like R4, and here we want to add

— we want to add what and what?>>R2 — R3 and R4.>>R3 and R4, and put

that in some register, R5. So now today’s lecture is about

the implementation details of this, not the implementation

details of the coding level, but algorithmically, because

in the programming assignment, you will have to write code that traverses the

abstract syntax tree and generates these

instructions. So we will look into this. We will write pseudo-code

for this. Basically, our objective here is to write pseudo-code

for doing this. So how do you think

that pseudo-code will — you know, what do you think that

pseudo-code would look like? What . . .>>It is for a while.>>Yeah, so it’s a tree

traversal, and we said what kind of traversal will serve

us well here, depth? Depth — and by the

way, why is it depth? Because in order to

do this multiply, we’ll have to calculate

everything beneath it, everything below the

multiply had to be calculated, so we have to go in depth. We can’t calculate the multiply until we have calculated

everything below it, and same applies to

every operation in this. So it has to be recursive. So what if we start, let’s define a function

expression generate or — yeah, for generating

an expression, and that will take a node,

so it’s going to take — so this is pseudo-code. It’s not actual code. So the parameter for this

is going to be a node. And then it’s going to switch

on the type of the node. So there will be a switch

on the type of the node. So here we are implying that

the node is a data structure. You can think of it as a

class or a struct node, and this node has

some fields in it. One field is the type, and

the type in this case is, for expressions, is going to be

— there are three types here. What are the three main types?>>Operator, variables,

and immediate value.>>Yeah, so it’s either

an arithmetic operation for what we are limiting

ourselves to at this point. It’s either an arithmetic

operation or a leaf, and the leaf is an operant, and this operant could

be either a variable or a number, an immediate

operant. So it’s either an

operation or an identifier, or let’s call it a variable

here in this context because identifier makes more

sense in the context of parsing, but here it’s a variable, so

“variable” is a better word in this context, or a number. Then we need the

kind of operation — what kind of operation? If it’s an operation, then

we need to specify the kind of operation, which is,

you know, add or subtract or divide or multiply. We need to know the operation. Then our other fields, now,

as we write the code, we will, you know, discover what

the fields that we need are and then we will add them as

we discover the need for them. Okay, so node type, so let’s

first do the easy ones. If it’s, you know, a case

variable, if it’s a variable, or in fact, the easiest

is the — let’s start with the

easiest, which is the number. So if it’s the number, then

all what we need to do is to issue a load immediate

instruction or to generate — sorry, generate here

in this context. We’re not issuing, we

are generating code. To generate a load immediate

instruction, so let’s assume that we have a function emit,

we have an emit function that takes an operation, source

1, source 2, and a destination, and it knows how to emit

the right instructions. So if we — if this operation

is load immediate, it will know that source 1 has to be

a number, an integer, and the destination

is a register. Source 2 is not used

because for a load immediate, we only have one source,

which is the immediate value, and we have a destination,

which is the register. For the load address immediate,

the load that loads, you know, based on the register,

a base and an offset. So source 1 is the base,

source 2 is the offset, and the destination

is the register that will receive the

result of the load. Okay, so here, you know, this is

source 1, this is destination, this is source 1, this is

source 2, this is destination. And for the add, you know,

this is our source 1, this is our source 2,

this is our destination. Okay, so we have this — we assume that we have

this clever emit function that will emit the right code. So in this case, we will

call it, if it’s a number, we will emit a load

immediate, load immediate, and the first source, so we

need to give it the number. So where will the

number come from? So, now, you are working

on your assignment, on the programming assignment, and you have constructed

the abstract syntax tree, but you need the number. So in this node, we

should also have a field that stores the value. You know, we are using a genetic

data structure, by the way. There are other ways of

designing this so that, you know, you can — for example, you can have a base

node class, and then you derive from this base node class

different derived classes for the different

types of nodes. You know, an operation

node is different from an identifier node is

different from a number node because each type of node is

different, and if you want to do this — if you want

to design this properly, you will have a base class for

the node that has everything that is common to all nodes,

and then in the derived classes, you will have whatever is

specific to that particular type of node, such as an operation

or an identifier or a number. But here we’re not going

to worry about this, so we’ll put every — all

the fields that we need, we’ll just put them in one

data structure for now. Okay –>>We need a value.>>We need a value, exactly. So when we create — now,

you should start thinking, when you are creating

this abstract syntax tree, then in this node, when

it’s the number node, you need to store

the value somewhere. So now we need this

load immediate. Our source 1 is going to be

node.value, source 2 is going to be null for the immediate,

for the load immediate, and our destination is going

to be a register number that will receive

the destination. Now, the question is: What

should that register number be? Look at this abstract syntax

tree and the node that — the code that we would

like to generate. So we have these

destination register numbers, so where do you think

these will come from? If you were writing the

code to implement this, where will the — you

know, here we put R1, here we put R2, R3, R4. So where do you think

these should come from, these register numbers?>>Also, when you have

infinite registers, you want to count track of

what you already allocated, and so you would

take that counter.>>Yeah, exactly, so exactly. All what you need is a

counter, and remember that at this point we are

doing only virtual registers, so when we are doing

virtual registers, we can have as many

registers as we want, so all what we need

is a counter. So just we need the function that gets the next

register number. So if we have a function

that get next register, this function is going to

return the next register number, so the first time you call

it it’s going to return one, the second time you call it,

it’s going to return two, then it’s going to

return three, and so on. So all what you need

is a counter because these are just

virtual registers. So we will just call this thing, so our T equals get

next register, and it will return a T, and we will pass it

here as a destination. And then we break for this case. So this is what we need to do in

order to generate an immediate. Now, to generate codes for

a variable, so what did — we called it “variable,” okay. For a variable, well,

again, since now we know that we need — we need

this emit, emit operation, and we need a destination,

now the base and the offset for this are going

to come from where? So to generate code for loading

a variable, we need the base and the offset, and where

do you think these will be? Because store them in the —

in this data structure as well, so we can store the

base and the offset. So, again, you know, we can

do T equals get next register, and we can do emit of

load address immediate. Source 1 is going to be

node.base, source 2 is going to be node.offset, and

the destination is going to be the T, and we break. So these are the leaf nodes. Now, what about the

in-tandem nodes which are operations,

case operation? Now, when the node

is in operation, now things are more

— we need to do more than just issuing an instruction

or generating an instruction. So now think about this. To generate code for this,

what do we need to do?>>[indiscernible] previous

registers that were allocated and then put it into

a new register.>>Okay, so we know how to

get the next register number, but we need to — before we

can calculate the multiply, we have to calculate

everything that is below it. So this has to be done and

this can be a huge tree. So in this example, we only

have two nodes, but in general, if you have an op like this,

the left child, you know, this can have a whole

tree below it, and this can have a whole

tree, generally speaking, for [indiscernible] expression. So what do you think the

right thing to do is here, because this can be any

complicated expression, and this can be a

complicated expression as well? So what should we do

here when, at this point, when we are generating

code for this? So what’s the right thing to do? Yes.>>You have to make

sure that each of the child trees

are evaluated first.>>And how do you evaluate them?>>You have to go

down [inaudible].>>Okay. So what should

I write in terms of code?>>Expression and put the left

node and then the right node.>>Yeah, exactly. So I should call

expression recursively. So this is recursive. So this is an expression

at this level. Before I can compute this, I

have to compute the expression for the left-hand side and

for the right-hand side and put each one in a

register, well, the register that receives the

result of this, okay? So in — okay, so I

have to call expression. Now, how do I get

to those nodes?>>Node.left, node.right.>>Yeah, exactly, so I have to store the left

and right as well. So we are discovering, you know,

the fields that we need to store in this node data structure,

and this is something that you will be doing

in the assignment. Okay, so expression of node.left

and we put this into — let’s call it T1 and T2. We’ll do expression node.right. Okay, now this means that whenever I compute

an expression, I have to return the, you

know, the destination, the destination register, right? So it’s — in every case, I have to return the destination

register so that the node, the parent node knows this. So expression for

A should return R1, so this will become

a return value from the left child

for this parent. So the parent looks at the

left and the right and each one of them should return,

so this means that this expression must return

the register that has the value, the destination of

this, so in this case, we will just return T.

So now our T has to be — has to have the destination

T equals get next register, so this is the T that will — or maybe we should

call it “result.” That’s going to be

more explicit. You can call it “T,” that’s

fine, but if we call it “result,” it will

be more explicit. Okay, I’m going to

call it “result.” It will be a more

descriptive name. So this is a result, a result, a

result, a result, and a result. And we will return the result

at the end, so return a result. Now, to make sure that

everybody understands this, we’re going to trace it. We’re going to go into

tracing after we are done. Yeah?>>You’re going to have to

initiate another switch state to handle all four

of your operators, or in this case,

the two for the –>>Okay, so why are

we doing this? We are assuming that this emit

function is intelligent enough, is smart enough to generate the

right code for the operation. In fact, it’s — the

only difference is going to be what that operation is. It’s add or multiply

or subtract. Everything else is

going to be the same. So we’re sending this

to the emit function, and the emit function, we give

it an operation, source 1, source 2, destination, and it

knows how to generate the code that we — that we want. Now, so we evaluate the

left, we evaluate the right, and then we get the

register for each one, and then we get the

next register. Oh, I missed the

most important thing. Well, I missed an important

thing here, which is what?>>When you do emit.>>Yeah, emitting,

so we have to emit. In this case, emit —

what’s the operation? The operation is whatever

this operation here, the operation.op. Yeah, so it’s — this

is the op field in this, the actual operation

name, or, you know, you can call it “opting.” So this is the operation. I send it to the emit, and my

source 1 is going to be T1, my source 2 is going to be T2,

and my destination is going to be the result, and

then I’m going to break. And at the end, I

return the result, okay? And here, you know, what are

these T1, T2, and result? In fact, these can be integers. You know, when you work

on this in the assignment, you will realize that these

register numbers can be just integers. Okay, so now let’s test

it on this example. So if we test it on this

example, we call the — we start traversing the

tree from the root, right? So we start from the

root in the beginning, so let’s put the stack. So first we call

expression on root, so expression on

plus, this node. Now, expression on plus

is going to switch. It’s going to go,

okay, this is an op, so it will recursively

call itself on node left, and node left is going to be

the multiply, so it’s going to call expression “multiply.” Now, an expression

multiply, it’s an op again. Now it’s going to call

the left, its own left, the left of op is going to

be this, expression of A. So we are going now

on the stack. What we have on the stack is

the leftmost staff [phonetic] in the tree, expression of A.

Now when we get to expression of A, it will identify

A as a variable and it will generate

this, you know, with register number 1

being — holding the result. This will be the first time

we call get next register. Get next register, and

this will give us R1 and it will generate this

instruction that we want. Now, when this returns,

when load A returns, we will be at this

point in which function? In which –>>The multiply.>>Yeah, the multiply, exactly. So we are tracing the recursion

here because it’s, you know, important to understand that

this whole thing is recursive, so when expression A returns,

it’s going to be at this point, and we are in expression

of multiply. Now, the right-hand side of

multiply is going to be the 3, so it’s going to

call expression of 3, and when we call expression

of 3 to load immediate, the 3 and this — it will

put it in register number 2. Now, and this will complete. Now, we are in expression of

multiply, so at this point, we are in expression of

multiply, and we are here, you know, we finished this. This returned R1, this

returned R2, and now it’s going to get the next register. The get next register

is going to give it R3. So this will generate this

multiply that we want. So it’s going to multiply R1

that came from the left child, that holds the result

of the left child, R2 that holds the result

of the right child, and R3, the new register that

we have just generated, so it will generate the,

you know, multiply R1, R2, and put the result in R3. Okay? Now, all of this

is in the multiply. Now, the multiply will

eventually return after this. Now it will return to

the add, and now we are, at this point, in the add. So now we are here, but

now we have returned to the add, the top

level, right? Now, the left child

returned, and the left child for the add returns what?>>R3.>>R3, yeah. So it will return R3. Now it will call its right

child, and the right child for this is the expression of

B, and expression of B is going to be the load immediate

and it’s going to return R4. Now, at the add level,

at this node’s level, get next register is

going to return what?>>R5.>>It’s going to return R5. It’s just a counter. So then we will emit that

instruction, add R3 and R4 and put the result in R5. Okay, so this is how it works. So let me ask some

questions now to make sure that everyone understands

how this works.>>Seems simple enough.>>Yeah, so it’s simple if you

— if you match the recursion. So if you are comfortable

with recursion, then this is, you know, this should be very

intuitive, and DFS is an example of a recursive algorithm. Now, okay, question: If I —

here, if I move this result, get next register,

if I move this here, would this work or not? Will it work correctly or not?>>No.>>No.>>Okay, I think it

will work correctly.>>Yeah, it will

still work correctly.>>But what will happen? What — what difference will . . .>>Yeah, it won’t be — it

won’t be in the order — the register allocations

won’t be in the order that we showed here

in the example. They’ll be different.>>Okay, so it will be different

register numbers will be used, but it will be correct. So how different will

the register numbers be? So what if — what will happen

to the register numbers? So let’s renumber the registers. If we just do this . . .>>It’ll put — well, the

first one, in the add, it will put it — instead of R5,

it’ll set the destination as R1.>>Exactly, so the

destination is going to be R1. So basically, it’s going

to do the allocation for the destination

in pre-order, not in post-order,

but that’s fine. These are just register numbers. So it’s going to set

the destination to R1 and then it will call the,

you know, the left child. The left child, the destination for the left child

is going to be R2.>>Right.>>Okay, and here –>>It’ll be R3.>>This will be R3. So now the left child for

the add is going to be R3, the right side is going

to be R4, all right, and then we will

return to the add, then this is going to be R5. So this will be R2 and R5

and put the result in R1. So it will be assigning register

numbers in pre-order instead of post-order, but

these are just numbers because these are

virtual registers anyway and it doesn’t matter. So this will still be correct. Now, what if we move

the emit here? So what if we do the emit? Well, not here, but,

yeah, the emit — or let’s move the emit

and the result, okay? So . . . So if we move the emit and the

result here before T1 and T2, before calling expression, so

if we do them in pre-order . . .>>Won’t work.>>Yeah, this won’t work. So basically, getting a register

number is not a big deal. We can do it before or after we

compute the expressions below the current node. That’s fine, it’s just a number. But calculating the actual

expressions below that and, you know, computing them must be

done before we can generate code for the current node

because this is dependent on these, okay? So the actual computation

has to happen in post-order, but the numbering of the

registers doesn’t matter. You can do it pre-order,

post-order, any order. Okay? Any questions on this? Yes?>>In fact, if you had put the

emit before the evaluations of T1 and T2, you’d be

using the previous registers that were used, T1 and T2, from whatever operation was

done prior to that, right?>>Yes, yeah, but it will

be — it will be wrong. Yeah, yeah, the T1 and T2, it will not even have

correct numbers, yeah. So it’s, yeah, so

it will not work.>>I was assuming T1

and T2 were global because it’s a recursive call

and those are local variables, so you would actually

be putting garbage into that function

at that point.>>Yes.>>Because it wouldn’t

be initialized. It would be whatever is in

those locations at that time.>>Yeah, but even if you

find a way of, you know, you can number them

any way you want. You can say, okay, I want

the result of this node to be in this register. So you can find the solution

for the numbering problem, but you will not

find the solution for the actual dependence

problem. Okay? Any other questions?