7
Oct

Compilers Lecture 21: Parsing (10): LR(1) Parsing, Part IV

>>Okay so, so today we finish up some issues related
to LR1 parsing. And in fact it happened that two of these issues are related
to midterm questions. Okay. So the first question,
you know, there was a question in the midterm about
the if statement. So you know, a statement could
be an assignment or an if. Okay. And, of course, we
know what the assignment is. The if is going to
have an optional else. So we have an if
with a condition. And then after the if, we
have some, let’s call it code. And there’s an else from code. And or you can have
an if without an else. So you can have a condition
with code and no else. So the else is optional. Of course, this code, so this
is one way of doing this. The code could be just
a single statement or a statement list
between braces. It’s just a single, of course
this is one way of doing it.>>Yeah but you also
have, according to the, if I recall correctly
from the midterm question, you could also have a single
statement inside braces too.>>Yes we haven’t
defined statement list. Because we can define statement
list to have a single statement. So statement list, in fact
you can define statement list to have possibly empty. So it can be, you can
recurs on statement, so it’s statement list, statement list statement
or epsilon. So this allows you to put as
many statements as you want and you can also
put no statement. So this means that between the
braces you can have an arbitrary number of statements. And of course, you know, I
graded only four questions in the midterm the
first four questions. I graded this questions and in fact only a few students
got this completely correct. The part that most
students missed is this. Most student grammars did not
allow a single statement to be, you know the, you know, the code after the X. Most
student missed this. An optional else, most
students did this right. So most students allowed the
if statement to have an else or not to have an else. The else was optional. But many students, you
know, missed this part. Anyway, so this is the
solution to the examples. And in fact this
is the new part. The thing that we did not cover
in class, the condition did it in the review session, you know. So the condition is exactly what
we did in the review session with that boolean and with the
precedence between and and or. So that was done in
the review session. And I was expecting 90% of
the students to get it right because it was done
in the review session. But the percentage of students who got it right was
not 90% it was less than 90% although it was
done in the review session. So it was exactly what we did in the review session
for that condition. Okay. So now what’s the point? Well this is for the test. But if else statements are
interesting when it comes to parsing because
a grammar with if L, this grammar is ambiguous. Why is it ambiguous? If we have something
like this, you know if, you know let’s say condition if some condition we have
statement 1, if condition do or. No it’s the example
that will show this. The condition if condition 1 if condition 2 then do statement
1, else do statement 2. Okay? So now let me just, to
simplify this, let’s assume that we have a single statement
just to get to the point. So let’s replace
this with statement. Let’s replace this code
with statement just to simplify the example
that we are about to study, just to focus on the point. So if you have something
like this if condition 1, if condition 2 do statement
one else statement 2. Now where’s the ambiguity here?>>Where’s the else attach is
is the first if or the second?>>Yes exactly. So is the else attached to the
first if or to the second if. So there are 2 derivations
of this to parts of derivations, you know. Ambiguity means that
you can come up with 2 possible derivations. So in the first derivation. By the way, the language that you are familiar
with what do they do? Which one is, will
the else get attached to the first if or
the second if?>>Yeah it’s the second if.>>Yeah and are most ifs. So what we are familiar
with is this. Okay. So this is going
to be a statement. So that, so 1 derivation
is this, you know, statement is an if,
of course, condition. And I’m going to drop the
parenthesis just for simplicity. Okay? If condition
then statement. And this statement, sorry, I also drop the, so
let me put the if. So it’s statement
is an if and if is if condition, I can’t do this. I don’t want it, I don’t
want it to get messy. If condition statement then
this statement is another if. And this if is if condition then
statement and else statement. Okay? Else. And so this is one possible
parse where this is condition 1. So this condition is
going to be condition 1. This is condition 2. And this is statement 1
and this is statement 2. The other possible parse, another possible parse
is statement is an if and the if is. What should we substitute
for if now?>>If conditional
statement else.>>Yes exactly. So at the top level
we have an if else. So if, if condition then
statement else statement. Okay? And. How do we complete
this derivation?>>A statement in the
if gets another if?>>Yeah so this statement
here is this. This is a statement now. So this if condition statement,
this statement is an if. And this if is if
condition statement. Okay? So now this is condition
1, this is condition 2, and this is statement 1
and this is statement 2. So now here, the else is
associated or is an else for the inner most if. Here the else is
for the outer if. Okay? So two possible
derivations. So this is the if else
the if else ambiguity. Now how does this, how does this
reflection to parse or parsing? So in parsing, when you are
doing, let’s do, you know, with LR1 parsing, when
we are doing LR1 parsing, at some point, this
is our, you know, with this grammar,
with the placeholder. You know the placeholder
is going to start here. And then our placeholder’s
going to move. And then it will
get to this point. So the placeholder’s
going to move, right. The LR1 parsing will have a
placeholder that keeps moving. The placeholder will, at some
point, will get at this point. So you will have
something, you will get in some state you will get if is
an if condition code else code. And you will get if, if
condition code without else. And at some point, the placeholder will
get to this point. If we what have on the stack
is this, if this appears on the stack, this is
what we have on the stack. Then the placeholder
will get to this point. Now by the way, what’s
should be.>>Well in the first one it’ll
be, the lookahead will be else, the second one should
be end of line.>>Not necessarily. In fact, what can come
after an if in this grammar? In this grammar, what
can come after an if?>>Parenthesis.>>Hum?>>Parenthesis is the condition.>>Well first of all, if here,
this is the start variable. So after if you can
always have an end of file because if statement can always
be the last state according to this grammar. You know, this is a
grammar that says, okay what you are passing
is a single statement. Your start variable
is a statement. So if you are passing
the single statement and this statement is either
an assignment or an if. So an if can have
end of file, okay. But since the if, because
this can be an if too. So when we do the expansion,
when we did the expansion in state, in state 0, the if
here, because this can be an if, the if can be followed by else. So generally speaking, an if
statement can either be followed by an else or it can be
followed by an end of file. So in fact, your
lookahead here is going to be else or end of file. And here is going to
be else or end of file. So basically, you know, if an
if statement is followed by end of file if it’s the last if
it’s a complete statement. And it’s followed by
an if, if it’s this if. You know, if we can, if this
is an if then this whole thing, this is an if then it’s going
to be followed by an else. Now, when we construct
the table, when we construct the
LR1 table for this. So this is one of the states. This will appear at some state. Of course it’s not going
to be the start state. We’ll not get to this, we’re
not showing all states here. We will get to this when the
placeholder moves, you know, past all of these and
gets to this point. But this is the point where
we will hit a problem. Because at this point,
do we have a reduction?>>Yes.>>Yeah we have a reduction. So this is our reduction. So we can reduce by rule number,
we didn’t give it a number but let’s say 0, 1, 2. So we can reduce by rule
number 2 which is this. But do we have a shift?>>Yes.>>Yeah we also have a shift. So the problem here is that,
when you get to this point and, you now, if what you are
seeing in the file is an else, if your lookahead is an else,
you can reduce or you can shift. So this is what we call.>>Shift reduce error.>>Shift reduce error. So there are two possibilities
and this is just not good. So this is a shift reduce error. So it’s a shift reduce error
because we can shift or reduce. And in fact if we shift, we will
get one of these derivations and if we reduce we
will get another. If you think about it, you know, shifting will give
us this or this. So will shifting give us the
first one or the second one.>>Shift will give
you the first one.>>Yeah if we shift we’ll
get the first one because we, if we shift, we are waiting. We’re not reducing
until we get here. So if we shift it means that
we’re not reducing at his point. We will wait until a later state
until the placeholder gets here. And then what we’re
going to reduce and our reduction here is going
to reduce this to a statement. So if we shift now and
reduce later, shift now and reduce later we’ll
get this reduction. This will become an if or a
statement which is a statement. And then this will give
us this derivation. So shifting will give us this
derivation while reducing can, will give us the
other derivation. Reducing it means
reduce immediately. And when you reduce immediately
you will get this derivation. So this is a shift reduce error. Now how is this handled
in practice. Now when we have shift reduce
error, strictly speaking, this means that this grammar
is not an LR1 grammar. It means that it cannot,
strictly speaking, it cannot be parsed using
the LR1 parsing algorithm. But in fact, if we can
just choose between shift and reduce, we can proceed. So it’s a matter of
choosing whether we want to shift or reduce. And, in this case,
if we choose reduce, we will get the expected
behavior or we will get the
expected derivation. So you will be using, you
know, parser generator program, you’d be using Bison
for parser generate, to generate the parser. And when you have an if else it
will write a grammar for if else to tell you that there
is a shift reduce error. And the default is to
resolve the conflict for, in favor of a shift, to do a
shift rather than the reduce. So this is the default behavior. In fact, I think it even, if you
disable warnings it’s not going to give you a warning. It will just, you know,
do the default resolution of the conflict which is
resolve the conflict in favor of a shift rather than a reduce. Okay? So this is, you
know, just to be aware of the shift reduce error
and it’s, a typical example for this is the if
else statement. Any questions on this?>>I have one.>>Okay.>>On the second
one why have else? Because that’s not,
that second one’s not, that if isn’t followed
by an else.>>Well in fact we don’t know, we can’t tell the lookaheads
by looking at these. The lookaheads are
determined in, we’re not sure with the whole LR1
construction here. So the lookaheads come from,
first from the initial state. So when you, it comes
from the top. So in order to answer
this completely, I have to show the whole
LR1, you know, state, all the LR1 canonical
collections. And there are like 14 or
15 states in this case. Okay? Yes.>>Say that ambiguity in
general does not make the grammar unparsible?>>Yeah. So in general, yeah,
you can sometimes find a way to get around ambiguity
or to handle ambiguity.>>Okay>>So ambiguous grammar
is not necessarily bad. And it’s, you know, ambiguous
grammar is not necessarily unparsible because they, you
now, all languages have if else. And, you know, people write
grammars and build parsers for them and everything
works fine. Okay. So this is one question. The other question
that I hope I will get to finish is the LR1 parsing. So the LR1 parsing question. So the go is A and
A is AA or A or A. So is this a right recursive
or a left recursive grammar.>>Right.>>This is right recursive. Because many people
were still, in the exam, many people were confused. Some people in the LR1 parsing
question, some people tried to remove left recursion
from a grammar that doesn’t have
left recursion. It does not have left recursion. So and some people are trying
to remove left recursion. So left recursion is
when you have, you know, this is the left
recursive version of this grammar that
we did in class. Go A and A is AA and A is A. So
this is what we did in class. We did the left recursive
version in class and in the test I asked you to
do the right recursive version. The right recursive version
is totally different. The solution is going
to be totally different but it is still small. You know, it is still a
manageable number of states because it’s a small grammar. So the number we have
only these productions. This is 0, 1, 2. And again, you know the
initial state as 0 is going to have go A. What’s
the lookahead? It’s always end of file. Then we expand A, we
expand for A. So a is AA end of file or AA end of file. And we are at the
beginning of an A. So, this is our first state. Now is there any
further expansion? No further expansion. Why? Because this
is the non-terminal. So this is the non-terminal
so we just stop. This is our start state. So we have, obviously 2
transitions one on big A and one on little a. So let’s do it
systematically the way we did it in class. The transition on big A
is going to give us this. The transition on big A is going to give us go big A
placeholder end of file. Is there any expansion
further expansion of this? No further expansion. But this is a reduction. So this is a reduction
which is an accept syntax. Now the transition on little
a, trying to save some space, transition on small
a will give us, again the placeholder moves a
a.a end of file aa end of file. We just moved the place forward. Now do we have any expansion? Yes. A huge number of people
missed this expansion. So we have an expansion on
a. This is a non-terminal. So we expand on A. And expanding
on A is going to give us A, AA and A goes to A. So
we have to expand this. Now what’s the lookahead?>>End of file.>>Again, the lookahead
is what follows the A. So what comes after, here’s
an A that we are expanding. We are expanding this. So this is an expansion
of this like this is going to get substituted for this. What comes after
this is end of file. What comes after this.>>Is an A.>>Is end of, no it’s not
an A. It’s an expansion for A. So what’s.>>Okay.>>So it’s an expansion. That’s why we call it expansion. So A is replaced with this. So what comes after the
A is an end of file. And this is an end of file. Okay? So this is the expansion. Now any further expansion? No because this is a terminal. So this is what we get. So this is state
1, this is state 2. Of course, this will not
lead to any further state. Will this lead to
any further states?>>Yeah.>>Yeah. In fact we have a,
we have a transition on big A and on little a. On big A if you
have a big A then this is going to be A, AA placeholder
end of file. Anything else? No because we only move
it past the A. So this is where the placeholder
proceeds an A. Any expansion, no expansion because the
placeholder is at the end. So this is our state. And we call this state number 3. And do we have a reduction?>>Yeah.>>Yes we do have a reduction. So this is a reduction by.>>R1.>>Rule number 1. Now what if we have
a small a here. So if we have a small a,
we will get a goes to. Well here, so these are
the ones that are going to move past the small a.
So it’s going to give us a, a.a end of file and
a is a. end of file. Is there an expansion? Yeah. So it’s going to
expand the same thing to a, a end of file a.a end of file. But this state is
the same as state 2. So it’s not a new state. So basically we have a
loop into the same state. It’s not a new state.>>How did you count the AA
again, sorry I missed that.>>Which one?>>On the third line from
the last thing we just did.>>The third line?>>Yeah.>>By expanding this. So always, you know, we said
whenever you have a state, the first question is,
is there an expansion? Second question is,
is there a reduction? So we did this. Is there an expansion? The answer is yes because
this is a non-terminal. So, we expand the A.>>Yeah I think what he means
is if you wrote it wrong.>>Yeah.>>It’s transposed big A and
little a on the serving line.>>You switch them
for part of the rule.>>Oh, that’s.>>Okay. Okay. So it’s going to
be the same as this yeah. Sorry. Yeah. So this is an expansion of A.
Oh sorry, so that was my fault. Okay. So you’ll get
the same state. So the always ask yourself
whether there’s an expansion or not. And here you will
get the same state. So this is a look back
into state number 2. Okay? Now, so we
just don’t need this. So now the goto is going to be, the goto table and
the action table. So let’s put the goto here.>>We’re missing
the reduction now.>>Where’s the reduction?>>Yeah. We’ll find it.>>It’s the kind of
reduction for rule 2.>>S2.>>It’s in state 2.>>Yeah we’ll find it,
we’ll catch it yeah. So we have a, we
have a reduction here for on end of file. And this R2. Okay. So let’s do the goto. The goto from state 0,
state 1, state 2, state 3. So there are only 4 states. So the only reductions
that we have on non-terminals,
we have a reduction. Sorry transitions, a transition
from S0 to S1 on an A, S0 to S1. And we have a transition
from S2 to S3 on an A. And that’s it. This is the goto. Now our action table. The action table’s
going to look like this. This is S0, S1, S2, S3. And this is an end of file
and this is an A. And again, S0 there are no reductions
but there is a shift on an A. So if you are on an A, you
shift and you go to S2. If you are in S1, there is a
reduction if the end of file, if the lookahead is end of file. So you accept on end of file,
no transition out of S1. S2 there is a reduction
R2 on end of file, end of file you do R2. And there is a transition or an
A. The A leads to the same state so shift and stay
in the same state with an A. Now S3 there is a
reduction if the lookahead is in the file you reduce
by rule number 1 and there is no transition. So this is our action table. Oh let’s, okay. Yeah let’s do the stack
because the whole point here is the stack. So the stack is going
to be first S0. And S0 and the lookahead
and here’s the lookahead. So we have AAA. So S0 on an A asks
you to shift by 2. So if you do a shift 2 and
shift 2 gets you S0 AS2. And now we have shifted. Again, I am in S2, I’m seeing
an A. So it keeps telling you that if you are in S2 and
you’re seeing an A shift. S0 A, S2 shift again, S2. Okay. And you tell
it again I’m in, so we shifted twice
so we are here now. Now again S2 and A tells
you to shift again. And now you have S0
A, I think, you have, let me give myself
more room on this. Because now we have
S0 A, S2 A, S2 A, S2. We shifted 3 times. Now after shifting
3 times we are here and we are seeing end of file. So after shifting 3
times, we shift 3 A’s. Okay. Now you tell, you
look up the action table. I am in S2 and I am seeing an, when I was seeing
an A I was shifting. Now when I’m seeing end of file, it tells you to reduce
by rule 2. So now you reduce
by rule 2 and R2. And an R2 is going to do this. How many symbols do you pop off?>>2.>>Rule number 2, no. Rule number 2 has 1 symbol.>>Oh with the 2 times the.>>Yeah so, yeah, yeah
okay, so yeah exactly. So you put, yeah, so
there’s only 1 symbol so you pop off 2, so
you pop off these. The A with the state number. So you get S0, A, S2,
A. Well this is an S2. So now what’s our base
state for the goto? Okay. Base state for the
goto is whatever we have after we pop off. So after we pop off
our base state is S2. So we need goto of S2
which is our base state after popping off, and
the symbol that we get, which is an A. And this is
S2 and A, this is an S3. So this means that we go to S3. This is an A and this is S3. I think this is the most
important step in the, the most important step in
tracing the parsing algorithm on this, on the stack. So again you pop off, your base
state is S2 goto of S2 and A. And what’s the rational here? The rational is that you
go to your base state because you have replaced the
A. You replaced the A with an A. So now we are looking
for the position on big A because you replaced
the A with a big A. So that gets you
to state number 3. Now you are in state number 3
and you are seeing end of file, it tells you to reduce
by rule number 1. So reduce by rule number 1. Now how many do we pop off?>>4.>>We pop off 4 yeah. S0, A, S2. So we pop off this 1, 2, 3, 4. Again this is our base
state which is S2. So goto of S2 and the symbol
that we get which is again and A. And this is going
to take us to S3 again. So we put the A and
we go to S3 again. Now it’s going to tell us the
same thing because we are in S3 and we are seeing end of file. So it’s gong to say
reduce by 1 again. So reduce by 1 again
is going to do this. This is S0. Reduce by 1 so we pop off 4. So again our base state now is>>S0.>>S0, yeah. After you pop off these
4, your base state is 0. So now we need goto of S0 and
A because we will get an A. And goto of S0 and A is S1. So we put an A and we go to S1. Now the action table we are
in S1 and we are seeing end of file it says now accept. Okay. So now we accept S0 and
this is the go and accept. Okay. So in fact, you know,
this is the exam question. I don’t think it’s harder than
the one that we did in class but it’s just different. So it’s different. And, if you really
understand the procedure, you shouldn’t have
a problem with this. But, if you’re having problem
then you don’t understand the procedure very well.>>I think what kept screwing
everyone else up is the part that it kept looping back on state 2 is I think is
what screwed everybody up.>>But, but there was a homework
question in which there is a, you know, you encountered
the same state again and you don’t create
a new state. So there was a question
on purpose. There was a question
in the homework on the classic expression
grammar where you encounter a state
that is not a new state, okay, and you don’t, you know, when you encounter the same
state you don’t create a new state. So yeah. So there was a
homework question the classic expression grammar. How many people went through
the homework, you know, after I posted the solutions to
the homework, homework 2 and 3? How many people went through the
solutions before the midterm? Are you sure? Well I was expecting everyone
to go through the solutions. And in fact even to, you know,
watch the lectures again. Because it’s, you know,
we’ll talk about LL1 because on LL1 the performance
was even worse than LR1 which was against
the expectations. But let me finish
this point first. Because here there is a
point we are trying to make about left recursion
versus right recursion. So the one that we did in
class was go A then A is AA and A is little a. So, we
did the left recursive one. So now what do you think, do
you see anything weird here or inefficient with this? Yeah.>>So with the one
we just did it built, it kept shifting all
the way to the end.>>Yeah.>>And then it did reductions.>>Yes exactly. So if we had, instead of 3 A’s, if we had a million A’s
then it would have shifted a million times. And it put a million
symbols, in fact, 2 million symbols on the stack. So the point here that we are
trying to make is that with when you apply LR1 parsing to our right recursive
grammar like this.>>You have the same
problem with the.>>Yeah.>>With the left
recursive grammar.>>Yeah. Yeah. Left recursive grammar,
LL1 has its problems with left recursive grammar. In fact you cannot do
left recursive at all. So, LL1 parsing cannot do left
recursive grammars at all. LL1 parsing can do both. Can do left recursive
and right recursive. But, with right recursive ones,
with the right recursive ones, it has this, you
know, stack problem. It may do a lot of shifting and it may make the
stack grow really big. You can have a really
big stack when you do, when you apply LR1 parsing to a right recursive
grammar, this the point. Yeah.>>So if you have a known
right recursive grammar with a solution then need to
read it from right to left?>>What do you mean known?>>Like if you’re building
an LR1 parser and you know that the grammar is a
right recursive grammar, could you solve this problem
with pushing 2 million things on the stack by having
the parser read the string from right to left?>>Well this is, you know,
this is an algorithm. So you have to, this
is an algorithm, the LR1 parsing algorithm. So this is, you know,
we just followed the algorithm systematically. And this is what it gave us. So, you know, remember
always that the algorithm is as intelligent as you
can make it, you know. You make the algorithm. So in, you know, this
algorithm is just, algorithm, don’t expect an algorithm
to be like human beings. Oh, okay, let me see,
oh this is not good so let me change myself. The algorithm doesn’t
change itself. So you design an algorithm. And then if an algorithm
does not perform efficiently on some input, you
have to figure out how to modify the algorithm. Well, in this case, you just, the conclusion here is
LR1 passing works better with left recursive grammars. So let me just stick to the left
recursive version of the grammar because LR1 parsing
works better on these. Okay. So in this case of
course there’s no difference between the left recursive
and the right recursive. They are equivalent. They are two equivalent
grammars. But the LR1 parsing algorithm
generates totally different, a totally different table and it passes the
string totally different. Okay. Any other questions? Okay. So you know, I
wanted to talk about. So this is the LR1 parsing. I’m still grading
the LR1 parsing. I’m not done grading the, you
know the LR1 parsing question. I’m still in the middle.