I'm writing a Scala parser for the following grammar:
expr := "<" anyString ">" "<" anyString ">"
anyString := // any string
For example, "<foo> <bar>" is a valid string, as are "<http://www.example.com/example> <123>" and "<1> <_hello>".
So far, I have the following:
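import scala.util.parsing.combinator.JavaTokenParsers

object MyParser extends JavaTokenParsers {
  // Whitespace is matched explicitly in expr, so do not skip it automatically.
  override def skipWhitespace = false

  def expr: Parser[Any] = "<" ~ anyString ~ ">" ~ whiteSpace ~ "<" ~ anyString ~ ">"

  def anyString = ???
}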
My questions are the following (I've included my suspected answer, but please confirm anyway, if I'm correct!):
1. How to implement a regex parser which accepts any string? This must have an almost trivial answer, like def anyString = """\a*""".r, where \a is the symbol which represents any character (although \a is probably not the droid I'm looking for).
2. If I set anyString to accept any string, will it stop before the > symbol or will it run until the end of the string and fail? I believe it will run until the end of the string and fail, and then it will eventually find the > and consume up to there. This seems to result in a very inefficient parser, and any comments on this would be appreciated!
3. What if the string within < and > contains a > symbol (e.g. "<fo>o> <bar>")? Will anyString consume until the first > or the last one? Is there any way to specify whether it consumes the least it can, or the most?
4. In order to fix the previous point, I'd like to forbid <> in anyString. How to write that?
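To make the "least versus most" question concrete, the following is a minimal sketch using plain scala.util.matching.Regex rather than the parser combinators. It only illustrates how a greedy pattern, a reluctant pattern, and a negated character class behave on the example input at the regex level. Whether the same behaviour carries over to JavaTokenParsers is exactly what is being asked, so it is not assumed here. The names AnyStringSketch, firstBody and parseExpr are hypothetical and chosen only for this illustration.

```scala
import scala.util.matching.Regex

object AnyStringSketch {
  // Greedy: .* grabs as much as possible, so the match ends at the LAST '>'.
  val greedy: Regex = "<(.*)>".r
  // Reluctant: .*? grabs as little as possible, so the match ends at the FIRST '>'.
  val reluctant: Regex = "<(.*?)>".r
  // Whole expression with a negated class: a body may contain neither '<' nor '>'.
  val exprPattern: Regex = """<([^<>]*)>\s+<([^<>]*)>""".r

  // Captured body of the first match of r in input, if there is one.
  def firstBody(r: Regex, input: String): Option[String] =
    r.findFirstMatchIn(input).map(_.group(1))

  // Both bodies if the ENTIRE input has the form "<a> <b>", otherwise None.
  // (A Regex used as an extractor only succeeds on a full match.)
  def parseExpr(input: String): Option[(String, String)] = input match {
    case exprPattern(a, b) => Some((a, b))
    case _                 => None
  }

  def main(args: Array[String]): Unit = {
    val tricky = "<fo>o> <bar>"
    println(firstBody(greedy, tricky))    // Some(fo>o> <bar)
    println(firstBody(reluctant, tricky)) // Some(fo)
    println(parseExpr("<foo> <bar>"))     // Some((foo,bar))
    println(parseExpr(tricky))            // None
  }
}
```

In this sketch the character class [^<>] is what forbids angle brackets inside a body, which is the restriction described in the last question.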
Thank you!