How to get lookahead symbol when constructing LR(1) NFA for parser?
Posted
by
greenoldman
on Programmers
See other posts from Programmers
or by greenoldman
Published on 2012-11-29T18:20:12Z
Indexed on
2012/11/29
23:18 UTC
Read the original article
Hit count: 247
parser
I am reading an explanation (awesome "Parsing Techniques" by D.Grune and C.J.H.Jacobs; p.292 in the 2nd edition) about how to construct an LR(1) parser, and I am at the stage of building the initial NFA. What I don't understand is how to get/compute a lookahead symbol.
Here is the example from the book, the grammar:
S -> E
E -> E - T
E -> T
T -> ( E )
T -> n
n
is terminal. The "weird" transitions for me are is the sequence:
1) S -> . E eof
2) E -> . E - T eof
3) E -> . E - T -
4) E -> E . - T -
5) E -> E - . T -
(Note: In the above table, the state numbers are in front and the lookahead symbol is at the end.)
What puzzles me is that transition from (4) to (5) means reading -
token, right? So how is it that -
is still a lookahead symbol and even more important why is it that eof
is no longer a lookahead symbol? After all in an input such as n - n eof
there is only one -
symbol.
My naive thinking tells me (5) should be written as:
5) E -> E - . T - eof
And another thing -- n
is terminal. Why it is not used at all as a lookahead symbol? I mean -- we expect to see -
or (
, it is ok, but lack of n
means we are sure it won't appear in input?
Update: after more reading I am only more confused ;-) I.e. what is really a lookahead? Because I see such state as (p.292, 2nd column, 2nd row):
E -> E . - T eof
Lookahead says eof
but the incoming input says -
. Isn't it a contradiction? And it is not only in this book.
© Programmers or respective owner