I am trying to parse F# type syntax. I started writing an [F]Parsec grammar and ran into problems, so I simplified the grammar down to this:
type ::= identifier | type -> type
identifier ::= [A-Za-z0-9.`]+
After running into problems with FParsec, I switched to Parsec, since I have a full chapter of a book dedicated to explaining it. My code for this grammar is
typeP = choice [identP, arrowP]
identP = do
id <- many1 (digit <|> letter <|> char '.' <|> char '`')
-- more complicated code here later
return id
arrowP = do
domain <- typeP
string "->"
range <- typeP
return $ "("++domain++" -> "++range++")"
run = parse (do t <- typeP
eof
return t) "F# type syntax"
The problem is that Parsec doesn't backtrack by default, so
> run "int"
Right "int"
-- works!
> run "int->int"
Left "F# type syntax"
unexpected "-"
expecting digit, letter, ".", "`" or end of input
-- doesn't work!
The first thing I tried was to reorder typeP:
typeP = choice [arrowP, identP]
But this just stack overflows because the grammar is left-recursive--typeP never gets to trying identP because it keeps trying arrowP over and over. Next I tried try in various places, for example:
typeP = choice [try identP, arrowP]
But nothing I do seems to change the basic behaviours of (1) stack overflow or (2) non-recognition of "-" following an identifier.
My mistake is probably obvious to anybody who has successfully written a Parsec grammar. Can somebody point it out?