How to get lookahead symbol when constructing LR(1) NFA for parser?

Posted by greenoldman on Programmers See other posts from Programmers or by greenoldman
Published on 2012-11-29T18:20:12Z Indexed on 2012/11/29 23:18 UTC
Read the original article Hit count: 250

Filed under:

I am reading an explanation (awesome "Parsing Techniques" by D.Grune and C.J.H.Jacobs; p.292 in the 2nd edition) about how to construct an LR(1) parser, and I am at the stage of building the initial NFA. What I don't understand is how to get/compute a lookahead symbol.

Here is the example from the book, the grammar:

S -> E
E -> E - T
E -> T
T -> ( E )
T -> n

n is terminal. The "weird" transitions for me are is the sequence:

1)   S -> . E        eof
2)   E -> . E - T    eof
3)   E -> . E - T    -
4)   E -> E . - T    -
5)   E -> E - . T    -

(Note: In the above table, the state numbers are in front and the lookahead symbol is at the end.)

What puzzles me is that transition from (4) to (5) means reading - token, right? So how is it that - is still a lookahead symbol and even more important why is it that eof is no longer a lookahead symbol? After all in an input such as n - n eof there is only one - symbol.

My naive thinking tells me (5) should be written as:

5)   E -> E - . T    - eof

And another thing -- n is terminal. Why it is not used at all as a lookahead symbol? I mean -- we expect to see - or (, it is ok, but lack of n means we are sure it won't appear in input?

Update: after more reading I am only more confused ;-) I.e. what is really a lookahead? Because I see such state as (p.292, 2nd column, 2nd row):

E -> E . - T      eof

Lookahead says eof but the incoming input says -. Isn't it a contradiction? And it is not only in this book.

© Programmers or respective owner

Related posts about parser