# Compilers Lecture 21: Parsing (10): LR(1) Parsing, Part IV

>>Okay so, so today we finish up some issues related

to LR1 parsing. And in fact it happened that two of these issues are related

to midterm questions. Okay. So the first question,

you know, there was a question in the midterm about

the if statement. So you know, a statement could

be an assignment or an if. Okay. And, of course, we

know what the assignment is. The if is going to

have an optional else. So we have an if

with a condition. And then after the if, we

have some, let’s call it code. And there’s an else from code. And or you can have

an if without an else. So you can have a condition

with code and no else. So the else is optional. Of course, this code, so this

is one way of doing this. The code could be just

a single statement or a statement list

between braces. It’s just a single, of course

this is one way of doing it.>>Yeah but you also

have, according to the, if I recall correctly

from the midterm question, you could also have a single

statement inside braces too.>>Yes we haven’t

defined statement list. Because we can define statement

list to have a single statement. So statement list, in fact

you can define statement list to have possibly empty. So it can be, you can

recurs on statement, so it’s statement list, statement list statement

or epsilon. So this allows you to put as

many statements as you want and you can also

put no statement. So this means that between the

braces you can have an arbitrary number of statements. And of course, you know, I

graded only four questions in the midterm the

first four questions. I graded this questions and in fact only a few students

got this completely correct. The part that most

students missed is this. Most student grammars did not

allow a single statement to be, you know the, you know, the code after the X. Most

student missed this. An optional else, most

students did this right. So most students allowed the

if statement to have an else or not to have an else. The else was optional. But many students, you

know, missed this part. Anyway, so this is the

solution to the examples. And in fact this

is the new part. The thing that we did not cover

in class, the condition did it in the review session, you know. So the condition is exactly what

we did in the review session with that boolean and with the

precedence between and and or. So that was done in

the review session. And I was expecting 90% of

the students to get it right because it was done

in the review session. But the percentage of students who got it right was

not 90% it was less than 90% although it was

done in the review session. So it was exactly what we did in the review session

for that condition. Okay. So now what’s the point? Well this is for the test. But if else statements are

interesting when it comes to parsing because

a grammar with if L, this grammar is ambiguous. Why is it ambiguous? If we have something

like this, you know if, you know let’s say condition if some condition we have

statement 1, if condition do or. No it’s the example

that will show this. The condition if condition 1 if condition 2 then do statement

1, else do statement 2. Okay? So now let me just, to

simplify this, let’s assume that we have a single statement

just to get to the point. So let’s replace

this with statement. Let’s replace this code

with statement just to simplify the example

that we are about to study, just to focus on the point. So if you have something

like this if condition 1, if condition 2 do statement

one else statement 2. Now where’s the ambiguity here?>>Where’s the else attach is

is the first if or the second?>>Yes exactly. So is the else attached to the

first if or to the second if. So there are 2 derivations

of this to parts of derivations, you know. Ambiguity means that

you can come up with 2 possible derivations. So in the first derivation. By the way, the language that you are familiar

with what do they do? Which one is, will

the else get attached to the first if or

the second if?>>Yeah it’s the second if.>>Yeah and are most ifs. So what we are familiar

with is this. Okay. So this is going

to be a statement. So that, so 1 derivation

is this, you know, statement is an if,

of course, condition. And I’m going to drop the

parenthesis just for simplicity. Okay? If condition

then statement. And this statement, sorry, I also drop the, so

let me put the if. So it’s statement

is an if and if is if condition, I can’t do this. I don’t want it, I don’t

want it to get messy. If condition statement then

this statement is another if. And this if is if condition then

statement and else statement. Okay? Else. And so this is one possible

parse where this is condition 1. So this condition is

going to be condition 1. This is condition 2. And this is statement 1

and this is statement 2. The other possible parse, another possible parse

is statement is an if and the if is. What should we substitute

for if now?>>If conditional

statement else.>>Yes exactly. So at the top level

we have an if else. So if, if condition then

statement else statement. Okay? And. How do we complete

this derivation?>>A statement in the

if gets another if?>>Yeah so this statement

here is this. This is a statement now. So this if condition statement,

this statement is an if. And this if is if

condition statement. Okay? So now this is condition

1, this is condition 2, and this is statement 1

and this is statement 2. So now here, the else is

associated or is an else for the inner most if. Here the else is

for the outer if. Okay? So two possible

derivations. So this is the if else

the if else ambiguity. Now how does this, how does this

reflection to parse or parsing? So in parsing, when you are

doing, let’s do, you know, with LR1 parsing, when

we are doing LR1 parsing, at some point, this

is our, you know, with this grammar,

with the placeholder. You know the placeholder

is going to start here. And then our placeholder’s

going to move. And then it will

get to this point. So the placeholder’s

going to move, right. The LR1 parsing will have a

placeholder that keeps moving. The placeholder will, at some

point, will get at this point. So you will have

something, you will get in some state you will get if is

an if condition code else code. And you will get if, if

condition code without else. And at some point, the placeholder will

get to this point. If we what have on the stack

is this, if this appears on the stack, this is

what we have on the stack. Then the placeholder

will get to this point. Now by the way, what’s

the lookahead? You know, the lookahead

should be.>>Well in the first one it’ll

be, the lookahead will be else, the second one should

be end of line.>>Not necessarily. In fact, what can come

after an if in this grammar? In this grammar, what

can come after an if?>>Parenthesis.>>Hum?>>Parenthesis is the condition.>>Well first of all, if here,

this is the start variable. So after if you can

always have an end of file because if statement can always

be the last state according to this grammar. You know, this is a

grammar that says, okay what you are passing

is a single statement. Your start variable

is a statement. So if you are passing

the single statement and this statement is either

an assignment or an if. So an if can have

end of file, okay. But since the if, because

this can be an if too. So when we do the expansion,

when we did the expansion in state, in state 0, the if

here, because this can be an if, the if can be followed by else. So generally speaking, an if

statement can either be followed by an else or it can be

followed by an end of file. So in fact, your

lookahead here is going to be else or end of file. And here is going to

be else or end of file. So basically, you know, if an

if statement is followed by end of file if it’s the last if

it’s a complete statement. And it’s followed by

an if, if it’s this if. You know, if we can, if this

is an if then this whole thing, this is an if then it’s going

to be followed by an else. Now, when we construct

the table, when we construct the

LR1 table for this. So this is one of the states. This will appear at some state. Of course it’s not going

to be the start state. We’ll not get to this, we’re

not showing all states here. We will get to this when the

placeholder moves, you know, past all of these and

gets to this point. But this is the point where

we will hit a problem. Because at this point,

do we have a reduction?>>Yes.>>Yeah we have a reduction. So this is our reduction. So we can reduce by rule number,

we didn’t give it a number but let’s say 0, 1, 2. So we can reduce by rule

number 2 which is this. But do we have a shift?>>Yes.>>Yeah we also have a shift. So the problem here is that,

when you get to this point and, you now, if what you are

seeing in the file is an else, if your lookahead is an else,

you can reduce or you can shift. So this is what we call.>>Shift reduce error.>>Shift reduce error. So there are two possibilities

and this is just not good. So this is a shift reduce error. So it’s a shift reduce error

because we can shift or reduce. And in fact if we shift, we will

get one of these derivations and if we reduce we

will get another. If you think about it, you know, shifting will give

us this or this. So will shifting give us the

first one or the second one.>>Shift will give

you the first one.>>Yeah if we shift we’ll

get the first one because we, if we shift, we are waiting. We’re not reducing

until we get here. So if we shift it means that

we’re not reducing at his point. We will wait until a later state

until the placeholder gets here. And then what we’re

going to reduce and our reduction here is going

to reduce this to a statement. So if we shift now and

reduce later, shift now and reduce later we’ll

get this reduction. This will become an if or a

statement which is a statement. And then this will give

us this derivation. So shifting will give us this

derivation while reducing can, will give us the

other derivation. Reducing it means

reduce immediately. And when you reduce immediately

you will get this derivation. So this is a shift reduce error. Now how is this handled

in practice. Now when we have shift reduce

error, strictly speaking, this means that this grammar

is not an LR1 grammar. It means that it cannot,

strictly speaking, it cannot be parsed using

the LR1 parsing algorithm. But in fact, if we can

just choose between shift and reduce, we can proceed. So it’s a matter of

choosing whether we want to shift or reduce. And, in this case,

if we choose reduce, we will get the expected

behavior or we will get the

expected derivation. So you will be using, you

know, parser generator program, you’d be using Bison

for parser generate, to generate the parser. And when you have an if else it

will write a grammar for if else to tell you that there

is a shift reduce error. And the default is to

resolve the conflict for, in favor of a shift, to do a

shift rather than the reduce. So this is the default behavior. In fact, I think it even, if you

disable warnings it’s not going to give you a warning. It will just, you know,

do the default resolution of the conflict which is

resolve the conflict in favor of a shift rather than a reduce. Okay? So this is, you

know, just to be aware of the shift reduce error

and it’s, a typical example for this is the if

else statement. Any questions on this?>>I have one.>>Okay.>>On the second

one why have else? Because that’s not,

that second one’s not, that if isn’t followed

by an else.>>Well in fact we don’t know, we can’t tell the lookaheads

by looking at these. The lookaheads are

determined in, we’re not sure with the whole LR1

construction here. So the lookaheads come from,

first from the initial state. So when you, it comes

from the top. So in order to answer

this completely, I have to show the whole

LR1, you know, state, all the LR1 canonical

collections. And there are like 14 or

15 states in this case. Okay? Yes.>>Say that ambiguity in

general does not make the grammar unparsible?>>Yeah. So in general, yeah,

you can sometimes find a way to get around ambiguity

or to handle ambiguity.>>Okay>>So ambiguous grammar

is not necessarily bad. And it’s, you know, ambiguous

grammar is not necessarily unparsible because they, you

now, all languages have if else. And, you know, people write

grammars and build parsers for them and everything

works fine. Okay. So this is one question. The other question

that I hope I will get to finish is the LR1 parsing. So the LR1 parsing question. So the go is A and

A is AA or A or A. So is this a right recursive

or a left recursive grammar.>>Right.>>This is right recursive. Because many people

were still, in the exam, many people were confused. Some people in the LR1 parsing

question, some people tried to remove left recursion

from a grammar that doesn’t have

left recursion. It does not have left recursion. So and some people are trying

to remove left recursion. So left recursion is

when you have, you know, this is the left

recursive version of this grammar that

we did in class. Go A and A is AA and A is A. So

this is what we did in class. We did the left recursive

version in class and in the test I asked you to

do the right recursive version. The right recursive version

is totally different. The solution is going

to be totally different but it is still small. You know, it is still a

manageable number of states because it’s a small grammar. So the number we have

only these productions. This is 0, 1, 2. And again, you know the

initial state as 0 is going to have go A. What’s

the lookahead? It’s always end of file. Then we expand A, we

expand for A. So a is AA end of file or AA end of file. And we are at the

beginning of an A. So, this is our first state. Now is there any

further expansion? No further expansion. Why? Because this

is the non-terminal. So this is the non-terminal

so we just stop. This is our start state. So we have, obviously 2

transitions one on big A and one on little a. So let’s do it

systematically the way we did it in class. The transition on big A

is going to give us this. The transition on big A is going to give us go big A

placeholder end of file. Is there any expansion

further expansion of this? No further expansion. But this is a reduction. So this is a reduction

which is an accept syntax. Now the transition on little

a, trying to save some space, transition on small

a will give us, again the placeholder moves a

a.a end of file aa end of file. We just moved the place forward. Now do we have any expansion? Yes. A huge number of people

missed this expansion. So we have an expansion on

a. This is a non-terminal. So we expand on A. And expanding

on A is going to give us A, AA and A goes to A. So

we have to expand this. Now what’s the lookahead?>>End of file.>>Again, the lookahead

is what follows the A. So what comes after, here’s

an A that we are expanding. We are expanding this. So this is an expansion

of this like this is going to get substituted for this. What comes after

this is end of file. What comes after this.>>Is an A.>>Is end of, no it’s not

an A. It’s an expansion for A. So what’s.>>Okay.>>So it’s an expansion. That’s why we call it expansion. So A is replaced with this. So what comes after the

A is an end of file. And this is an end of file. Okay? So this is the expansion. Now any further expansion? No because this is a terminal. So this is what we get. So this is state

1, this is state 2. Of course, this will not

lead to any further state. Will this lead to

any further states?>>Yeah.>>Yeah. In fact we have a,

we have a transition on big A and on little a. On big A if you

have a big A then this is going to be A, AA placeholder

end of file. Anything else? No because we only move

it past the A. So this is where the placeholder

proceeds an A. Any expansion, no expansion because the

placeholder is at the end. So this is our state. And we call this state number 3. And do we have a reduction?>>Yeah.>>Yes we do have a reduction. So this is a reduction by.>>R1.>>Rule number 1. Now what if we have

a small a here. So if we have a small a,

we will get a goes to. Well here, so these are

the ones that are going to move past the small a.

So it’s going to give us a, a.a end of file and

a is a. end of file. Is there an expansion? Yeah. So it’s going to

expand the same thing to a, a end of file a.a end of file. But this state is

the same as state 2. So it’s not a new state. So basically we have a

loop into the same state. It’s not a new state.>>How did you count the AA

again, sorry I missed that.>>Which one?>>On the third line from

the last thing we just did.>>The third line?>>Yeah.>>By expanding this. So always, you know, we said

whenever you have a state, the first question is,

is there an expansion? Second question is,

is there a reduction? So we did this. Is there an expansion? The answer is yes because

this is a non-terminal. So, we expand the A.>>Yeah I think what he means

is if you wrote it wrong.>>Yeah.>>It’s transposed big A and

little a on the serving line.>>You switch them

for part of the rule.>>Oh, that’s.>>Okay. Okay. So it’s going to

be the same as this yeah. Sorry. Yeah. So this is an expansion of A.

Oh sorry, so that was my fault. Okay. So you’ll get

the same state. So the always ask yourself

whether there’s an expansion or not. And here you will

get the same state. So this is a look back

into state number 2. Okay? Now, so we

just don’t need this. So now the goto is going to be, the goto table and

the action table. So let’s put the goto here.>>We’re missing

the reduction now.>>Where’s the reduction?>>Yeah. We’ll find it.>>It’s the kind of

reduction for rule 2.>>S2.>>It’s in state 2.>>Yeah we’ll find it,

we’ll catch it yeah. So we have a, we

have a reduction here for on end of file. And this R2. Okay. So let’s do the goto. The goto from state 0,

state 1, state 2, state 3. So there are only 4 states. So the only reductions

that we have on non-terminals,

we have a reduction. Sorry transitions, a transition

from S0 to S1 on an A, S0 to S1. And we have a transition

from S2 to S3 on an A. And that’s it. This is the goto. Now our action table. The action table’s

going to look like this. This is S0, S1, S2, S3. And this is an end of file

and this is an A. And again, S0 there are no reductions

but there is a shift on an A. So if you are on an A, you

shift and you go to S2. If you are in S1, there is a

reduction if the end of file, if the lookahead is end of file. So you accept on end of file,

no transition out of S1. S2 there is a reduction

R2 on end of file, end of file you do R2. And there is a transition or an

A. The A leads to the same state so shift and stay

in the same state with an A. Now S3 there is a

reduction if the lookahead is in the file you reduce

by rule number 1 and there is no transition. So this is our action table. Oh let’s, okay. Yeah let’s do the stack

because the whole point here is the stack. So the stack is going

to be first S0. And S0 and the lookahead

and here’s the lookahead. So we have AAA. So S0 on an A asks

you to shift by 2. So if you do a shift 2 and

shift 2 gets you S0 AS2. And now we have shifted. Again, I am in S2, I’m seeing

an A. So it keeps telling you that if you are in S2 and

you’re seeing an A shift. S0 A, S2 shift again, S2. Okay. And you tell

it again I’m in, so we shifted twice

so we are here now. Now again S2 and A tells

you to shift again. And now you have S0

A, I think, you have, let me give myself

more room on this. Because now we have

S0 A, S2 A, S2 A, S2. We shifted 3 times. Now after shifting

3 times we are here and we are seeing end of file. So after shifting 3

times, we shift 3 A’s. Okay. Now you tell, you

look up the action table. I am in S2 and I am seeing an, when I was seeing

an A I was shifting. Now when I’m seeing end of file, it tells you to reduce

by rule 2. So now you reduce

by rule 2 and R2. And an R2 is going to do this. How many symbols do you pop off?>>2.>>Rule number 2, no. Rule number 2 has 1 symbol.>>Oh with the 2 times the.>>Yeah so, yeah, yeah

okay, so yeah exactly. So you put, yeah, so

there’s only 1 symbol so you pop off 2, so

you pop off these. The A with the state number. So you get S0, A, S2,

A. Well this is an S2. So now what’s our base

state for the goto? Okay. Base state for the

goto is whatever we have after we pop off. So after we pop off

our base state is S2. So we need goto of S2

which is our base state after popping off, and

the symbol that we get, which is an A. And this is

S2 and A, this is an S3. So this means that we go to S3. This is an A and this is S3. I think this is the most

important step in the, the most important step in

tracing the parsing algorithm on this, on the stack. So again you pop off, your base

state is S2 goto of S2 and A. And what’s the rational here? The rational is that you

go to your base state because you have replaced the

A. You replaced the A with an A. So now we are looking

for the position on big A because you replaced

the A with a big A. So that gets you

to state number 3. Now you are in state number 3

and you are seeing end of file, it tells you to reduce

by rule number 1. So reduce by rule number 1. Now how many do we pop off?>>4.>>We pop off 4 yeah. S0, A, S2. So we pop off this 1, 2, 3, 4. Again this is our base

state which is S2. So goto of S2 and the symbol

that we get which is again and A. And this is going

to take us to S3 again. So we put the A and

we go to S3 again. Now it’s going to tell us the

same thing because we are in S3 and we are seeing end of file. So it’s gong to say

reduce by 1 again. So reduce by 1 again

is going to do this. This is S0. Reduce by 1 so we pop off 4. So again our base state now is>>S0.>>S0, yeah. After you pop off these

4, your base state is 0. So now we need goto of S0 and

A because we will get an A. And goto of S0 and A is S1. So we put an A and we go to S1. Now the action table we are

in S1 and we are seeing end of file it says now accept. Okay. So now we accept S0 and

this is the go and accept. Okay. So in fact, you know,

this is the exam question. I don’t think it’s harder than

the one that we did in class but it’s just different. So it’s different. And, if you really

understand the procedure, you shouldn’t have

a problem with this. But, if you’re having problem

then you don’t understand the procedure very well.>>I think what kept screwing

everyone else up is the part that it kept looping back on state 2 is I think is

what screwed everybody up.>>But, but there was a homework

question in which there is a, you know, you encountered

the same state again and you don’t create

a new state. So there was a question

on purpose. There was a question

in the homework on the classic expression

grammar where you encounter a state

that is not a new state, okay, and you don’t, you know, when you encounter the same

state you don’t create a new state. So yeah. So there was a

homework question the classic expression grammar. How many people went through

the homework, you know, after I posted the solutions to

the homework, homework 2 and 3? How many people went through the

solutions before the midterm? Are you sure? Well I was expecting everyone

to go through the solutions. And in fact even to, you know,

watch the lectures again. Because it’s, you know,

we’ll talk about LL1 because on LL1 the performance

was even worse than LR1 which was against

the expectations. But let me finish

this point first. Because here there is a

point we are trying to make about left recursion

versus right recursion. So the one that we did in

class was go A then A is AA and A is little a. So, we

did the left recursive one. So now what do you think, do

you see anything weird here or inefficient with this? Yeah.>>So with the one

we just did it built, it kept shifting all

the way to the end.>>Yeah.>>And then it did reductions.>>Yes exactly. So if we had, instead of 3 A’s, if we had a million A’s

then it would have shifted a million times. And it put a million

symbols, in fact, 2 million symbols on the stack. So the point here that we are

trying to make is that with when you apply LR1 parsing to our right recursive

grammar like this.>>You have the same

problem with the.>>Yeah.>>With the left

recursive grammar.>>Yeah. Yeah. Left recursive grammar,

LL1 has its problems with left recursive grammar. In fact you cannot do

left recursive at all. So, LL1 parsing cannot do left

recursive grammars at all. LL1 parsing can do both. Can do left recursive

and right recursive. But, with right recursive ones,

with the right recursive ones, it has this, you

know, stack problem. It may do a lot of shifting and it may make the

stack grow really big. You can have a really

big stack when you do, when you apply LR1 parsing to a right recursive

grammar, this the point. Yeah.>>So if you have a known

right recursive grammar with a solution then need to

read it from right to left?>>What do you mean known?>>Like if you’re building

an LR1 parser and you know that the grammar is a

right recursive grammar, could you solve this problem

with pushing 2 million things on the stack by having

the parser read the string from right to left?>>Well this is, you know,

this is an algorithm. So you have to, this

is an algorithm, the LR1 parsing algorithm. So this is, you know,

we just followed the algorithm systematically. And this is what it gave us. So, you know, remember

always that the algorithm is as intelligent as you

can make it, you know. You make the algorithm. So in, you know, this

algorithm is just, algorithm, don’t expect an algorithm

to be like human beings. Oh, okay, let me see,

oh this is not good so let me change myself. The algorithm doesn’t

change itself. So you design an algorithm. And then if an algorithm

does not perform efficiently on some input, you

have to figure out how to modify the algorithm. Well, in this case, you just, the conclusion here is

LR1 passing works better with left recursive grammars. So let me just stick to the left

recursive version of the grammar because LR1 parsing

works better on these. Okay. So in this case of

course there’s no difference between the left recursive

and the right recursive. They are equivalent. They are two equivalent

grammars. But the LR1 parsing algorithm

generates totally different, a totally different table and it passes the

string totally different. Okay. Any other questions? Okay. So you know, I

wanted to talk about. So this is the LR1 parsing. I’m still grading

the LR1 parsing. I’m not done grading the, you

know the LR1 parsing question. I’m still in the middle.