Recognizing terminals in a CFG production that were not previously defined as tokens

Posted by kmels on Stack Overflow
Published on 2010-05-30T08:39:01Z

I'm writing an LL(1) parser generator whose input is a Coco/R language specification. I already have a scanner generator for that input. Suppose I have the following specification:

COMPILER 1.

CHARACTERS

digit="0123456789".

TOKENS
number = digit{digit}. 
decnumber = digit{digit}"."digit{digit}.

PRODUCTIONS

Expression = Term{"+"Term|"-"Term}.      
Term = Factor{"*"Factor|"/"Factor}.       
Factor = ["-"](Number|"("Expression")").
Number = (number|decnumber).

END 1.

So if the parser generated from this grammar receives the input "1+1", it should accept it, i.e. a parse tree should be found.

My question is this: the character "+" was never defined as a token, but it appears as a terminal in the production for the non-terminal "Expression". How should my generated scanner recognize it? As things stand, it would not recognize "+" as a token.

Is this specification valid input, then? Or should I require that the terminal be added to TOKENS, and give the scanner an error routine that skips any character it does not recognize?

How do language specifications usually handle this?
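
For illustration, one possible approach is to treat every quoted literal in PRODUCTIONS as an implicitly declared token: the generator collects those literals and adds them to the scanner next to the tokens from the TOKENS section. The Python sketch below is only a rough model of that idea, not the actual generator. The function names (collect_literals, build_scanner, scan) are hypothetical, the two declared tokens are hard-coded as regular expressions, and a literal is named after its own text.

```python
import re

# Assumption: the PRODUCTIONS section of the specification above, as raw text.
PRODUCTIONS = '''
Expression = Term{"+"Term|"-"Term}.
Term = Factor{"*"Factor|"/"Factor}.
Factor = ["-"](Number|"("Expression")").
Number = (number|decnumber).
'''

def collect_literals(productions):
    """Return the distinct quoted literals, in order of first appearance."""
    seen = []
    for lit in re.findall(r'"([^"]*)"', productions):
        if lit not in seen:
            seen.append(lit)
    return seen

def build_scanner(literals):
    """Combine the declared tokens and the implicit literal tokens into one regex."""
    # Hard-coded stand-ins for the TOKENS section. decnumber is tried first so
    # that "1.5" is not scanned as the number "1" followed by a stray ".".
    declared = [("decnumber", r"[0-9]+\.[0-9]+"), ("number", r"[0-9]+")]
    # Longer literals first, so a literal is never shadowed by its own prefix.
    ordered = sorted(literals, key=len, reverse=True)
    implicit = [(lit, re.escape(lit)) for lit in ordered]
    names, parts = {}, []
    for i, (name, pattern) in enumerate(declared + implicit):
        group = f"t{i}"          # regex group names must be identifiers
        names[group] = name      # map the group back to the token name
        parts.append(f"(?P<{group}>{pattern})")
    return re.compile("|".join(parts)), names

def scan(text, regex, names):
    """Split text into (token_name, lexeme) pairs, failing on unknown characters."""
    pos, tokens = 0, []
    while pos < len(text):
        m = regex.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append((names[m.lastgroup], m.group()))
        pos = m.end()
    return tokens

literals = collect_literals(PRODUCTIONS)
regex, names = build_scanner(literals)
print(literals)
print(scan("1+1", regex, names))
```

With this scheme the specification is valid as written: "+" reaches the parser as its own token, and only a character that appears neither in TOKENS nor as a literal in PRODUCTIONS is reported as a scanner error.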

© Stack Overflow or respective owner
