Threading extra state through a parser in Scala
- by Travis Brown
I'll give you the tl;dr up front
I'm trying to use the state monad transformer in Scalaz 7 to thread extra state through a parser, and I'm having trouble doing anything useful without writing a lot of t m a -> t m b versions of m a -> m b methods.
An example parsing problem
Suppose I have a string containing nested parentheses with digits inside them:
val input = "((617)((0)(32)))"
I also have a stream of fresh variable names (characters, in this case):
val names = Stream('a' to 'z': _*)
I want to pull a name off the top of the stream and assign it to each parenthetical
expression as I parse it, and then map that name to a string representing the
contents of the parentheses, with the nested parenthetical expressions (if any) replaced by their
names.
To make this more concrete, here's what I'd want the output to look like for the example input above:
val target = Map(
'a' -> "617",
'b' -> "0",
'c' -> "32",
'd' -> "bc",
'e' -> "ad"
)
There may be either a string of digits or arbitrarily many sub-expressions at a given level, but these two kinds of content won't be mixed in a single parenthetical expression.
To keep things simple, we'll assume that the stream of names will never
contain either duplicates or digits, and that it will always contain enough
names for our input.
Using parser combinators with a bit of mutable state
The example above is a slightly simplified version of the parsing problem in
this Stack Overflow question.
I answered that question with
a solution that looked roughly like this:
import scala.util.parsing.combinator._
class ParenParser(names: Iterator[Char]) extends RegexParsers {
def paren: Parser[List[(Char, String)]] = "(" ~> contents <~ ")" ^^ {
case (s, m) => (names.next -> s) :: m
}
def contents: Parser[(String, List[(Char, String)])] =
"\\d+".r ^^ (_ -> Nil) | rep1(paren) ^^ (
ps => ps.map(_.head._1).mkString -> ps.flatten
)
def parse(s: String) = parseAll(paren, s).map(_.toMap)
}
It's not too bad, but I'd prefer to avoid the mutable state.
What I want
Haskell's Parsec library makes
adding user state to a parser trivially easy:
import Control.Applicative ((*>), (<$>), (<*))
import Data.Map (fromList)
import Text.Parsec
paren = do
(s, m) <- char '(' *> contents <* char ')'
h : t <- getState
putState t
return $ (h, s) : m
where
contents
= flip (,) []
<$> many1 digit
<|> (\ps -> (map (fst . head) ps, concat ps))
<$> many1 paren
main = print $
runParser (fromList <$> paren) ['a'..'z'] "example" "((617)((0)(32)))"
This is a fairly straightforward translation of my Scala parser above, but without mutable state.
What I've tried
I'm trying to get as close to the Parsec solution as I can using Scalaz's state monad transformer, so instead of Parser[A] I'm working with StateT[Parser, Stream[Char], A].
I have a "solution" that allows me to write the following:
import scala.util.parsing.combinator._
import scalaz._, Scalaz._
object ParenParser extends ExtraStateParsers[Stream[Char]] with RegexParsers {
protected implicit def monadInstance = parserMonad(this)
def paren: ESP[List[(Char, String)]] =
(lift("(" ) ~> contents <~ lift(")")).flatMap {
case (s, m) => get.flatMap(
names => put(names.tail).map(_ => (names.head -> s) :: m)
)
}
def contents: ESP[(String, List[(Char, String)])] =
lift("\\d+".r ^^ (_ -> Nil)) | rep1(paren).map(
ps => ps.map(_.head._1).mkString -> ps.flatten
)
def parse(s: String, names: Stream[Char]) =
parseAll(paren.eval(names), s).map(_.toMap)
}
This works, and it's not that much less concise than either the mutable state version or the Parsec version.
But my ExtraStateParsers is ugly as sin—I don't want to try your patience more than I already have, so I won't include it here (although here's a link, if you really want it). I've had to write new versions of every Parser and Parsers method I use above
for my ExtraStateParsers and ESP types (rep1, ~>, <~, and |, in case you're counting). If I had needed to use other combinators, I'd have had to write new state transformer-level versions of them as well.
Is there a cleaner way to do this? I'd love to see an example of a Scalaz 7's state monad transformer being used to thread state through a parser, but Scala 6 or Haskell examples would also be useful.