I am reading an explanation (awesome "Parsing Techniques" by D.Grune and C.J.H.Jacobs; p.292 in the 2nd edition) about how to construct an LR(1) parser, and I am at the stage of building the initial NFA. What I don't understand is how to get/compute a lookahead symbol.
Here is the example from the book, the grammar:
S -> E
E -> E - T
E -> T
T -> ( E )
T -> n
n is terminal. The "weird" transitions for me are is the sequence:
1) S -> . E eof
2) E -> . E - T eof
3) E -> . E - T -
4) E -> E . - T -
5) E -> E - . T -
(Note: In the above table, the state numbers are in front and the lookahead symbol is at the end.)
What puzzles me is that transition from (4) to (5) means reading - token, right? So how is it that - is still a lookahead symbol and even more important why is it that eof is no longer a lookahead symbol? After all in an input such as n - n eof there is only one - symbol.
My naive thinking tells me (5) should be written as:
5) E -> E - . T - eof
And another thing -- n is terminal. Why it is not used at all as a lookahead symbol? I mean -- we expect to see - or (, it is ok, but lack of n means we are sure it won't appear in input?
Update: after more reading I am only more confused ;-) I.e. what is really a lookahead? Because I see such state as (p.292, 2nd column, 2nd row):
E -> E . - T eof
Lookahead says eof but the incoming input says -. Isn't it a contradiction? And it is not only in this book.