Parsing a language using scala parser combinators

Question

I have the following template:

#foo(args)# // START CONTAINER1
  #foo(foo <- foos)(args)# // BLOCK STARTS HERE (`args` can be on either side of `block`)
     #bar(args)# // START CONTAINER2
     #.bar# // END CONTAINER2
  #.foo# // END BLOCK
#.foo# // END CONTAINER1

Note how #.foo# closes both the container and the block.

The trouble I see is that there is no unique id to identify each block, so I have to keep track of how many container openers and closers (#foo# / #.foo#) have been seen. Otherwise the END CONTAINER hash of a container nested inside a block could be mistaken for the end of the block.

How would I use Scala's parsers to parse blocks in a language like this?

I started off with this:

def maybeBlockMaybeJustContainer:Content = {
  (openingHash ~ identifier ~ opt(args) ~> opt(blockName) <~ opt(args) ~ closingHash) ~ 
      opt(content) ~
  openingHash ~ dot ~ identifier ~ closingHash ^^ ...
}

I'm also considering preprocessing the template, but I'm not sure where to start.

openingHash and closingHash should be the same. Why not just hash?
– ggovan, Commented Apr 30, 2014 at 13:43
"so I have to keep track of how many container openers/closers there are." Parsers for context-free languages do this for free. This is the main difference between context-free and regular parsers.
– ggovan, Commented Apr 30, 2014 at 13:52
@ggovan yes, correct. I tried to make the code as clear as possible
– goo, Commented Apr 30, 2014 at 14:01

ggovan · Accepted Answer · 2014-04-30 14:49:45Z

For your language, construct something similar to BNF, in the form:

//Each of these is of type Parser (or String, which will be implicitly converted to Parser when needed).
lazy val container = containerHeader ~ containerBody ~ containerEnd
lazy val containerHeader = hash ~ identifier ~ opt(args) ~ hash
lazy val containerBody = rep(block)
....
lazy val identifier = regex(new Regex("[a-zA-Z0-9-]+"))
lazy val hash = "#"

If your parser accepts a string, then the string is a member of the language defined by this parser.

This is a parser for a context-free language. Context-free languages include those of the form a[x]Sb[x], where [x] indicates that the preceding symbol is repeated x times, and where x is not fixed by the grammar but differs for each string. (If x were fixed by the grammar, then the language would be finite, and all finite languages are regular.)
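
As a small illustration of that kind of counting, here is a minimal sketch of a parser for strings made of some number of a's followed by the same number of b's. The names AnBn, s and depth are made up for this example, and it assumes the scala-parser-combinators module is on the classpath (it ships separately from the standard library since Scala 2.11).

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical example object: recognises n a's followed by n b's
// and returns the nesting depth n.
object AnBn extends RegexParsers {
  // The rule refers to itself between the "a" and the "b"; that recursion
  // is what keeps the openers and closers balanced.
  lazy val s: Parser[Int] =
    ("a" ~> s <~ "b") ^^ (_ + 1) | success(0)

  // Some(n) if the whole input is balanced, None otherwise.
  def depth(input: String): Option[Int] =
    parseAll(s, input) match {
      case Success(n, _) => Some(n)
      case _             => None
    }
}
```

With this sketch, AnBn.depth("aabb") gives Some(2), while AnBn.depth("aab") gives None because the second a never gets its matching b.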

This means that the language allows for nesting, or recursive components, such as your blocks and containers.

If you start parsing a container, then a block inside that container, you will not finish parsing the container until the block has been fully parsed. This is true for all strings in your language.
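
To make the nesting behaviour concrete, here is a rough sketch of a recogniser for the template in the question. It is not the full grammar from above: it treats containers and blocks as one recursive element rule, and it assumes that args is any parenthesised text without nested parentheses. The names TemplateRecognizer, element and accepts are invented for the example, the // comments from the question's template are not handled, and it does not check that the closing name matches the opening one.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical recogniser: only answers "is this string in the language?".
object TemplateRecognizer extends RegexParsers {
  lazy val hash: Parser[String] = "#"
  lazy val identifier: Parser[String] = "[a-zA-Z0-9-]+".r
  // Assumption: an argument list is "(...)" with no nested parentheses.
  lazy val args: Parser[String] = """\([^()]*\)""".r

  // #name(args)...#  (zero or more argument lists, so a block header fits too)
  lazy val header: Parser[Any] = hash ~ identifier ~ rep(args) ~ hash
  // #.name#
  lazy val end: Parser[Any] = hash ~ "." ~ identifier ~ hash

  // An element contains any number of nested elements before its own closer.
  lazy val element: Parser[Any] = header ~ rep(element) ~ end

  def accepts(input: String): Boolean =
    parseAll(rep(element), input).successful
}
```

When rep(element) reaches a #.bar# closer, the inner header rule fails on the dot, so the repetition stops and the enclosing element consumes the closer. That is how the inner END CONTAINER is paired with its own opener rather than with the surrounding block.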

Once you have your grammar defined and it is properly accepting and rejecting test cases, you can work on hooking it up to your AST.

lazy val identifier:Parser[Identifier] = regex(new Regex("[a-zA-Z0-9-]+")) ^^ {case s => Identifier(s)}

Note how this now has the type Parser[Identifier], i.e. it's a parser that, if it parses successfully, will return an Identifier. This is used in more complex cases as

lazy val container:Parser[Container] = containerHeader ~ containerBody ~ containerEnd ^^ {case head ~ body ~ end => Container(head.identifier,body)}
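
Putting the pieces together, a self-contained version might look roughly like the sketch below. The case classes Identifier, Header and Container, the args rule, and the parseTemplate helper are assumptions made for the example, since your real AST will differ. As before, it needs the scala-parser-combinators module on the classpath.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical AST types, invented for this sketch.
case class Identifier(name: String)
case class Header(identifier: Identifier, args: List[String])
case class Container(identifier: Identifier, body: List[Container])

object TemplateParser extends RegexParsers {
  lazy val hash: Parser[String] = "#"
  lazy val identifier: Parser[Identifier] =
    regex("[a-zA-Z0-9-]+".r) ^^ (s => Identifier(s))
  // Assumption: an argument list is "(...)" with no nested parentheses.
  lazy val args: Parser[String] = """\([^()]*\)""".r

  // ~> and <~ drop the hashes, so only the name and the argument lists remain.
  lazy val containerHeader: Parser[Header] =
    (hash ~> identifier ~ rep(args) <~ hash) ^^ { case id ~ as => Header(id, as) }
  lazy val containerEnd: Parser[Identifier] =
    hash ~> "." ~> identifier <~ hash
  lazy val containerBody: Parser[List[Container]] = rep(container)

  lazy val container: Parser[Container] =
    containerHeader ~ containerBody ~ containerEnd ^^ {
      case head ~ body ~ end => Container(head.identifier, body)
    }

  // Some(tree) if the whole input is a single well-nested container.
  def parseTemplate(input: String): Option[Container] =
    parseAll(container, input) match {
      case Success(tree, _) => Some(tree)
      case _                => None
    }
}
```

The nested structure of the input ends up as nested Container values, so nothing outside the grammar has to count openers and closers.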

Let me know if any of this needs to be expanded upon.
