Threading extra state through a parser in Scala

Posted by Travis Brown on Stack Overflow See other posts from Stack Overflow or by Travis Brown
Published on 2012-09-19T02:37:48Z Indexed on 2012/09/19 3:38 UTC
Read the original article Hit count: 385

I'll give you the tl;dr up front

I'm trying to use the state monad transformer in Scalaz 7 to thread extra state through a parser, and I'm having trouble doing anything useful without writing a lot of t m a -> t m b versions of m a -> m b methods.

An example parsing problem

Suppose I have a string containing nested parentheses with digits inside them:

val input = "((617)((0)(32)))"

I also have a stream of fresh variable names (characters, in this case):

val names = Stream('a' to 'z': _*)

I want to pull a name off the top of the stream and assign it to each parenthetical expression as I parse it, and then map that name to a string representing the contents of the parentheses, with the nested parenthetical expressions (if any) replaced by their names.

To make this more concrete, here's what I'd want the output to look like for the example input above:

val target = Map(
  'a' -> "617",
  'b' -> "0",
  'c' -> "32",
  'd' -> "bc",
  'e' -> "ad"
)

There may be either a string of digits or arbitrarily many sub-expressions at a given level, but these two kinds of content won't be mixed in a single parenthetical expression.

To keep things simple, we'll assume that the stream of names will never contain either duplicates or digits, and that it will always contain enough names for our input.

Using parser combinators with a bit of mutable state

The example above is a slightly simplified version of the parsing problem in this Stack Overflow question. I answered that question with a solution that looked roughly like this:

import scala.util.parsing.combinator._

class ParenParser(names: Iterator[Char]) extends RegexParsers {
  def paren: Parser[List[(Char, String)]] = "(" ~> contents <~ ")" ^^ {
    case (s, m) => (names.next -> s) :: m
  }

  def contents: Parser[(String, List[(Char, String)])] = 
    "\\d+".r ^^ (_ -> Nil) | rep1(paren) ^^ (
      ps => ps.map(_.head._1).mkString -> ps.flatten
    )

  def parse(s: String) = parseAll(paren, s).map(_.toMap)
}

It's not too bad, but I'd prefer to avoid the mutable state.

What I want

Haskell's Parsec library makes adding user state to a parser trivially easy:

import Control.Applicative ((*>), (<$>), (<*))
import Data.Map (fromList)
import Text.Parsec

paren = do
  (s, m) <- char '(' *> contents <* char ')'
  h : t  <- getState
  putState t
  return $ (h, s) : m
  where
    contents
      =  flip (,) []
     <$> many1 digit
     <|> (\ps -> (map (fst . head) ps, concat ps))
     <$> many1 paren

main = print $
  runParser (fromList <$> paren) ['a'..'z'] "example" "((617)((0)(32)))"

This is a fairly straightforward translation of my Scala parser above, but without mutable state.

What I've tried

I'm trying to get as close to the Parsec solution as I can using Scalaz's state monad transformer, so instead of Parser[A] I'm working with StateT[Parser, Stream[Char], A]. I have a "solution" that allows me to write the following:

import scala.util.parsing.combinator._
import scalaz._, Scalaz._

object ParenParser extends ExtraStateParsers[Stream[Char]] with RegexParsers {
  protected implicit def monadInstance = parserMonad(this)

  def paren: ESP[List[(Char, String)]] = 
    (lift("(" ) ~> contents <~ lift(")")).flatMap {
      case (s, m) => get.flatMap(
        names => put(names.tail).map(_ => (names.head -> s) :: m)
      )
    }

  def contents: ESP[(String, List[(Char, String)])] =
    lift("\\d+".r ^^ (_ -> Nil)) | rep1(paren).map(
      ps => ps.map(_.head._1).mkString -> ps.flatten
    )

  def parse(s: String, names: Stream[Char]) =
    parseAll(paren.eval(names), s).map(_.toMap)
}

This works, and it's not that much less concise than either the mutable state version or the Parsec version.

But my ExtraStateParsers is ugly as sin—I don't want to try your patience more than I already have, so I won't include it here (although here's a link, if you really want it). I've had to write new versions of every Parser and Parsers method I use above for my ExtraStateParsers and ESP types (rep1, ~>, <~, and |, in case you're counting). If I had needed to use other combinators, I'd have had to write new state transformer-level versions of them as well.

Is there a cleaner way to do this? I'd love to see an example of a Scalaz 7's state monad transformer being used to thread state through a parser, but Scala 6 or Haskell examples would also be useful.

© Stack Overflow or respective owner

Related posts about scala

Related posts about haskell