Recognizing terminals in a CFG production previously not defined as tokens.
Posted by kmels on Stack Overflow
Published on 2010-05-30T08:39:01Z
compiler | code-generation | computer-science | context-free-grammar | parser-generator
I'm writing a generator of LL(1) parsers; its input is a Coco/R language specification. I already have a scanner generator for that input. Suppose I have the following specification:
COMPILER 1.
CHARACTERS
digit="0123456789".
TOKENS
number = digit{digit}.
decnumber = digit{digit}"."digit{digit}.
PRODUCTIONS
Expression = Term{"+"Term|"-"Term}.
Term = Factor{"*"Factor|"/"Factor}.
Factor = ["-"](Number|"("Expression")").
Number = (number|decnumber).
END 1.
So if the parser generated from this grammar receives the input "1+1", it should accept it, i.e. a parse tree would be found.
My question is this: the character "+" is never defined as a token, yet it appears in the production for the non-terminal "Expression". How should my generated scanner recognize it? As things stand, it would not recognize it as a token.
Is this specification valid input, then? Should I add this terminal to TOKENS, and then consider an error routine so the scanner can skip it?
How do language specifications usually handle this?
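To make the "add this terminal to TOKENS" idea concrete, here is a minimal sketch in Python. It assumes the generator collects every quoted literal from the PRODUCTIONS section and registers each one as an implicit token next to the named ones. The names literal_terminals and tokenize are hypothetical and not part of Coco/R, and the regular expressions are hand-written stand-ins for the number and decnumber definitions above. As far as I know, Coco/R itself lets literals be used directly in productions without declaring them, but check that against the Coco/R user manual.

```python
import re

# Productions copied from the specification above.
PRODUCTIONS = [
    'Expression = Term{"+"Term|"-"Term}.',
    'Term = Factor{"*"Factor|"/"Factor}.',
    'Factor = ["-"](Number|"("Expression")").',
    'Number = (number|decnumber).',
]

# Hand-written equivalents of the TOKENS section (an assumption of this sketch).
NAMED_TOKENS = [
    ("number", re.compile(r"[0-9]+")),
    ("decnumber", re.compile(r"[0-9]+\.[0-9]+")),
]

LITERAL = re.compile(r'"([^"]*)"')


def literal_terminals(productions):
    """Return the distinct quoted literals, in order of first appearance."""
    seen = []
    for rule in productions:
        for lit in LITERAL.findall(rule):
            if lit not in seen:
                seen.append(lit)
    return seen


def tokenize(text, token_defs):
    """Split text into (kind, lexeme) pairs, preferring the longest match."""
    pos, out = 0, []
    while pos < len(text):
        best_kind, best_len = None, 0
        for kind, pattern in token_defs:
            m = pattern.match(text, pos)
            if m and len(m.group()) > best_len:
                best_kind, best_len = kind, len(m.group())
        if best_kind is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        out.append((best_kind, text[pos:pos + best_len]))
        pos += best_len
    return out


# Each literal becomes an implicit token whose kind is the literal itself.
IMPLICIT_TOKENS = [(lit, re.compile(re.escape(lit)))
                   for lit in literal_terminals(PRODUCTIONS)]
ALL_TOKENS = NAMED_TOKENS + IMPLICIT_TOKENS
```

With this, tokenize("1+1", ALL_TOKENS) returns [('number', '1'), ('+', '+'), ('number', '1')], while tokenize("1+1", NAMED_TOKENS) raises ValueError at the "+", which is the situation described in the question.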
© Stack Overflow or respective owner