I have the following template:

#foo(args)# // START CONTAINER1
  #foo(foo <- foos)(args)# // BLOCK STARTS HERE (`args` can be on either side of `block`)
     #bar(args)# // START CONTAINER2
     #.bar# // END CONTAINER2
  #.foo# // END BLOCK
#.foo# // END CONTAINER1

Note that #.foo# closes both the block and its enclosing container.

The trouble is that nothing uniquely identifies each block, so I have to count container openers and closers (#foo# / #.foo#) to make sure that the closing tag of a container nested inside a block isn't mistaken by the parser for the end of the block.

How would I use Scala's parsers to parse blocks in a language like this?


I started off with this:

def maybeBlockMaybeJustContainer:Content = {
  (openingHash ~ identifier ~ opt(args) ~> opt(blockName) <~ opt(args) ~ closingHash) ~ 
      opt(content) ~
  openingHash ~ dot ~ identifier ~ closingHash ^^ ...
}

I'm also thinking about preprocessing it, but I'm not sure where to start.

  • Well, what does your data structure look like? Commented Apr 30, 2014 at 9:21
  • @Kigyo just edited it in Commented Apr 30, 2014 at 13:11
  • openingHash and closingHash should be the same. Why not just hash? Commented Apr 30, 2014 at 13:43
  • "so I have to keep track of how many container openers/closers there are": parsers for context-free languages do this for free. This is the main difference between context-free and regular parsers. Commented Apr 30, 2014 at 13:52
  • @ggovan yes, correct. I tried to make the code as clear as possible Commented Apr 30, 2014 at 14:01

1 Answer

For your language, construct something similar to BNF, in the form

// Each of these is of type Parser (or String, which will be implicitly converted to Parser when needed).
lazy val container = containerHeader ~ containerBody ~ containerEnd
lazy val containerHeader = hash ~ identifier ~ opt(args) ~ hash
lazy val containerBody = rep(block)
....
lazy val identifier = regex(new Regex("[a-zA-Z0-9-]+"))
lazy val hash = "#"

If your parser accepts a string, then the string is a member of the language defined by this parser.

This is a parser for a context-free language. Context-free languages include those whose strings have the form a[x]Sb[x], where [x] indicates that the preceding symbol must occur x times. Here x is not fixed by the grammar but varies from string to string. (If x were fixed by the grammar, the language would be finite, and all finite languages are regular.)
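As a minimal sketch of that idea, here is a recognizer for the simplified case a[x]b[x] (dropping the middle S). The object name BalancedSketch and the helper accepts are made up for illustration, and the code assumes the scala-parser-combinators module is on the classpath.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical example object: recognizes x copies of "a" followed by x copies of "b" (x >= 1).
object BalancedSketch extends RegexParsers {
  // Grammar: S -> "a" S "b" | "a" "b". The rule refers to itself, which is what does the counting.
  lazy val s: Parser[Any] = "a" ~ opt(s) ~ "b"

  // parseAll only succeeds if the whole input is consumed.
  def accepts(input: String): Boolean = parseAll(s, input).successful
}
```

Calling BalancedSketch.accepts("aaabbb") succeeds, while "aab" and "abb" are rejected. No explicit counter is needed because the recursion keeps track of the nesting depth.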

This means that the language allows for nesting, or recursive components, such as your blocks and containers.

If you start parsing a container, and then a block inside that container, you will not finish parsing the container until the block has been fully parsed. This is true for all strings in your language.
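To make the nesting concrete, here is a rough accept/reject sketch for the template in the question. It is only one possible reading of the language: it treats blocks and containers as the same kind of element, assumes the // annotations are not part of the input, assumes argument groups contain no nested parentheses, and ignores free text between tags. The names TemplateRecognizer, element, argText and accepts are mine, and scala-parser-combinators is assumed to be available.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical recognizer: no AST yet, it only accepts or rejects input.
object TemplateRecognizer extends RegexParsers {
  lazy val hash: Parser[String] = "#"
  lazy val identifier: Parser[String] = "[a-zA-Z0-9-]+".r
  // Text inside one parenthesised group, e.g. "args" or "foo <- foos" (assumption: no nested parentheses).
  lazy val argText: Parser[String] = "[^()]*".r
  lazy val args: Parser[String] = "(" ~> argText <~ ")"

  // Opener such as #foo(args)# or #foo(foo <- foos)(args)#
  lazy val header: Parser[Any] = hash ~ identifier ~ rep(args) ~ hash
  // Closer such as #.foo#
  lazy val closer: Parser[Any] = hash ~ "." ~ identifier ~ hash
  // The recursion: the body of an element is zero or more elements.
  lazy val element: Parser[Any] = header ~ rep(element) ~ closer

  def accepts(input: String): Boolean = parseAll(element, input).successful
}
```

An inner #.bar# cannot end the outer element here, because rep(element) must finish (including the inner closer) before the outer closer is even attempted. Note that this sketch does not check that the name in the closer matches the name in the opener.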

Once you have your grammar defined and it is properly accepting and rejecting test cases, you can work on hooking it up to your AST.

lazy val identifier:Parser[Identifier] = regex(new Regex("[a-zA-Z0-9-]+")) ^^ {case s => Identifier(s)}

Note how this now has the type Parser[Identifier], i.e. it's a parser that, if it parses successfully, returns an Identifier. The same pattern is used in more complex cases, such as

lazy val container:Parser[Container] = containerHeader ~ containerBody ~ containerEnd ^^ {case head ~ body ~ end => Container(head.identifier,body)}
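Putting the pieces together, a self-contained sketch could look like the following. The case classes Identifier and Container are guesses at your AST (you haven't shown yours), parseTemplate is a helper I made up, and the grammar makes the same simplifying assumptions as before: no free text between tags and no nested parentheses in args. It also uses ^? to reject a closer whose name differs from its opener, which is an extra check of mine, not something the grammar above requires.

```scala
import scala.util.matching.Regex
import scala.util.parsing.combinator.RegexParsers

// Hypothetical AST types, assumed for illustration.
case class Identifier(name: String)
case class Container(id: Identifier, args: List[String], body: List[Container])

object TemplateAstParser extends RegexParsers {
  lazy val hash: Parser[String] = "#"
  lazy val identifier: Parser[Identifier] =
    regex(new Regex("[a-zA-Z0-9-]+")) ^^ { case s => Identifier(s) }
  lazy val argText: Parser[String] = "[^()]*".r
  lazy val args: Parser[String] = "(" ~> argText <~ ")"

  // #name(arg)(arg)# yields the name plus its argument groups.
  lazy val containerHeader: Parser[(Identifier, List[String])] =
    hash ~> identifier ~ rep(args) <~ hash ^^ { case id ~ as => (id, as) }
  // #.name# yields the closing name.
  lazy val containerEnd: Parser[Identifier] = hash ~> "." ~> identifier <~ hash

  // Recursive rule; the partial function only applies when opener and closer names agree.
  lazy val container: Parser[Container] =
    (containerHeader ~ rep(container) ~ containerEnd).^?(
      { case (id, as) ~ body ~ closer if closer == id => Container(id, as, body) },
      { case (id, _) ~ _ ~ closer => s"#.${closer.name}# does not close #${id.name}#" }
    )

  def parseTemplate(input: String): Either[String, Container] =
    parseAll(container, input) match {
      case Success(result, _) => Right(result)
      case ns: NoSuccess      => Left(ns.msg)
    }
}
```

For the template in the question (without the // annotations), parseTemplate should return a Right holding a foo container that wraps a foo block that wraps a bar container. A mismatched closer such as "#foo# #.bar#" gives a Left; the message you see may come from whichever failure the library decides to report, so don't rely on its exact text.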

Let me know if any of this needs expanding on.
