Scala parser combinators converting list of characters to strings

Question

I have been trying to get my head around Scala's parser combinators. They seem pretty powerful, but the only tutorial examples I can find deal with mathematical expressions; there are very few real-world examples of parsing a DSL and mapping it to different entities.

For the sake of this example, let's say I have a BNF with an entity named Model, written as a string like this: [model [name <name> ]]. This is a simplified example of a much larger BNF; in reality there are more entities.

So I defined my own class Model, which takes the name as a constructor argument, and then defined my own ModelParser object, which extends JavaTokenParsers. I then defined the following parsers, following the BNF (I know a simpler regex matcher is possible, but I preferred to follow the BNF exactly for other reasons).

def model : Parser[Model] = "[model" ~> "[name" ~> name <~ "]]" ^^ ( Model(_) )
def name : Parser[String] = (letter ~ (anyChar*)) ^^ {case text => text.toString()}
def anyChar = letter | digit | "_".r | "-".r
def letter = """[a-zA-Z]""".r
def digit = """\d""".r

The toString of Model looks like this:

override def toString : String = "[model " + name + "]"

When I run it with a string like [model [name helloWorld]], I get [model [h~List(e, l, l, o, W, o, r, l, d)]] instead of the [model helloWorld] I am expecting.

How do I join those individual characters back into the string they originally formed?

I am also confused about the individual parsers and the use of .r. Sometimes I see examples that have just the following as a parser (to parse "hello"):

def hello = "hello"

Isn't that just a String? How on Earth did it suddenly become a parser that can be combined with other parsers? And what is the .r actually doing? I have read at least 3 tutorials but I am still totally lost about what is actually happening.

jbx · Accepted Answer · 2013-12-13 10:20:50Z

3

The problem is that anyChar* parses a List[String] (where in this case each string is a single character), and the result of calling toString on a list of strings is "List(...)", not the string you'd get by concatenating the contents. In addition, the case text => pattern is matching on the entire letter ~ (anyChar*), not just the anyChar* part.
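To see the first point in isolation, here is a minimal sketch using plain Scala lists, with no parser involved. The object and value names are made up for illustration.

```scala
object JoinDemo {
  // Each parsed "character" is really a one-character String.
  val first: String = "h"
  val rest: List[String] = List("e", "l", "l", "o")

  // toString describes the collection itself rather than joining its elements.
  val described: String = rest.toString

  // mkString with no arguments concatenates the elements with no separator.
  val joined: String = (first :: rest).mkString

  def main(args: Array[String]): Unit = {
    println(described) // List(e, l, l, o)
    println(joined)    // hello
  }
}
```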

It's possible to address both of these issues pretty straightforwardly:

case class Model(name: String) {
  override def toString : String = "[model " + name + "]"
}

import scala.util.parsing.combinator._

object ModelParser extends RegexParsers {
  def model: Parser[Model] = "[model" ~> "[name" ~> name <~ "]]" ^^ (Model(_))

  def name: Parser[String] = letter ~ (anyChar*) ^^ {
    case first ~ rest => (first :: rest).mkString
  }

  def anyChar = letter | digit | "_".r | "-".r
  def letter = """[a-zA-Z]""".r
  def digit = """\d""".r
}

We just append the first character string to the list of the rest, and then call mkString on the entire list, which will concatenate the contents. This works as expected:

scala> ModelParser.parseAll(ModelParser.model, "[model [name helloWorld]]")
res0: ModelParser.ParseResult[Model] = [1.26] parsed: [model helloWorld]
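Outside the REPL you would usually want the Model itself rather than the printed ParseResult. The following is only a sketch of one way to unwrap it: parseModel is a hypothetical helper (not part of the library), and rep(anyChar) is used as the method form of anyChar*.

```scala
import scala.util.parsing.combinator._

case class Model(name: String)

object ModelParser extends RegexParsers {
  def model: Parser[Model] = "[model" ~> "[name" ~> name <~ "]]" ^^ (Model(_))

  def name: Parser[String] = letter ~ rep(anyChar) ^^ {
    case first ~ rest => (first :: rest).mkString
  }

  def anyChar: Parser[String] = letter | digit | "_" | "-"
  def letter: Parser[String] = """[a-zA-Z]""".r
  def digit: Parser[String] = """\d""".r

  // Hypothetical helper: turn the ParseResult into an Either.
  def parseModel(input: String): Either[String, Model] =
    parseAll(model, input) match {
      case Success(m, _)      => Right(m)          // whole input was consumed
      case failure: NoSuccess => Left(failure.msg) // Failure or Error
    }
}
```

With this, ModelParser.parseModel("[model [name helloWorld]]") gives a Right containing the model, while malformed input such as "[model helloWorld]" gives a Left with the parser's message.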

As you note, it would be possible (and possibly clearer and more performant) to let the regular expressions do more of the work:

object ModelParser extends RegexParsers {
  def model: Parser[Model] = "[model" ~> "[name" ~> name <~ "]]" ^^ (Model(_))

  def name: Parser[String] = """[a-zA-Z\d_-]+""".r
}
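One caveat: the single regex above also accepts names that start with a digit, an underscore or a hyphen, whereas the grammar in the question requires a leading letter. If that constraint matters, a variant along these lines should keep it (StrictModelParser is just an illustrative name):

```scala
import scala.util.parsing.combinator._

case class Model(name: String)

object StrictModelParser extends RegexParsers {
  def model: Parser[Model] = "[model" ~> "[name" ~> name <~ "]]" ^^ (Model(_))

  // One letter, then any number of letters, digits, underscores or hyphens.
  def name: Parser[String] = """[a-zA-Z][a-zA-Z\d_-]*""".r
}
```

Here parseAll(model, "[model [name hello_World-2]]") succeeds, while "[model [name 2fast]]" is rejected because the name does not begin with a letter.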

This example also illustrates the way that the parsing combinator library uses implicit conversions to cut down on some of the verbosity of writing parsers. As you say, def hello = "hello" defines a string, and "[a-zA-Z]+".r defines a Regex (via the r method on StringOps), but either can be used as a parser because RegexParsers defines implicit conversions from String (this one's named literal) and Regex (regex) to Parser[String].
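To make the conversions visible, here is a small sketch that writes them out by hand next to the implicit form. The parser names are invented for the example; literal and regex are the methods mentioned above.

```scala
import scala.util.parsing.combinator._

object HelloParser extends RegexParsers {
  // The declared type Parser[String] triggers the implicit String => Parser[String].
  // Without the annotation, def hello = "hello" would simply be a String.
  def helloImplicit: Parser[String] = "hello"
  // The same parser with the conversion applied explicitly.
  def helloExplicit: Parser[String] = literal("hello")

  // .r turns the String into a scala.util.matching.Regex, which is converted in turn.
  def wordImplicit: Parser[String] = "[a-zA-Z]+".r
  def wordExplicit: Parser[String] = regex("[a-zA-Z]+".r)

  // Once both sides are parsers, they combine with ~ like any others.
  def greeting: Parser[String] = helloExplicit ~ wordExplicit ^^ {
    case h ~ w => s"$h $w"
  }
}
```

The implicit and explicit versions behave the same; the implicit form only saves typing.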

edited Dec 13, 2013 at 10:20

jbx


answered Dec 13, 2013 at 7:34

Travis Brown


2 Comments

jbx Over a year ago

Thanks a lot for your clarifications, especially on the .r confusion and the implicit conversion from a String literal to Parser[String]. The name parser is working fine now!

jbx Over a year ago

@Travis I noticed that for some reason, even [model [name hello World]] is being accepted and reproduced after parsing as helloWorld just the same. How do I force it to not accept the name part if it has a whitespace? The ~ seems to allow it just fine. I don't want to disable it completely for the parser because it is quite useful.
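A likely explanation for the behaviour in this comment: RegexParsers skips leading whitespace before every literal and regex parser by default, so when each character is its own regex parser, the space before "W" is silently dropped. The sketch below contrasts that with matching the whole name as a single regex, which is one possible way to reject the space without turning whitespace skipping off for the rest of the grammar. Both object names are made up for illustration.

```scala
import scala.util.parsing.combinator._

case class Model(name: String)

// Character-by-character version: whitespace is skipped before each character.
object PerCharParser extends RegexParsers {
  def model: Parser[Model] = "[model" ~> "[name" ~> name <~ "]]" ^^ (Model(_))
  def name: Parser[String] = letter ~ rep(anyChar) ^^ {
    case first ~ rest => (first :: rest).mkString
  }
  def anyChar: Parser[String] = letter | digit | "_" | "-"
  def letter: Parser[String] = """[a-zA-Z]""".r
  def digit: Parser[String] = """\d""".r
}

// Single-token version: whitespace is skipped once, before the name,
// and the regex itself cannot match across a space.
object SingleTokenParser extends RegexParsers {
  def model: Parser[Model] = "[model" ~> "[name" ~> name <~ "]]" ^^ (Model(_))
  def name: Parser[String] = """[a-zA-Z][a-zA-Z\d_-]*""".r
}
```

With the input "[model [name hello World]]", PerCharParser yields a Model named helloWorld, while SingleTokenParser fails because "]]" is expected after hello. Whitespace between the other tokens is still skipped in both.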
