A parser which accepts any string in Scala?

Question

I'm writing a Scala parser for the following grammar:

expr := "<" anyString ">" "<" anyString ">"
anyString := // any string

For example, "<foo> <bar>" is a valid string, as are "<http://www.example.com/example> <123>" and "<1> <_hello>".

So far, I have the following:

object MyParser extends JavaTokenParsers {

  override def skipWhitespace = false

  def expr: Parser[Any] = "<" ~ anyString ~ ">" ~ whiteSpace ~ "<" ~ anyString ~ ">"

  def anyString = ???

}

My questions are the following (I've included my suspected answer, but please confirm anyway, if I'm correct!):

1. How do I implement a regex parser which accepts any string? This must have an almost trivial answer, like def anyString = """\a*""".r, where \a is the symbol which represents any character (although \a is probably not the droid I'm looking for).
2. If I set anyString to accept any string, will it stop before the > symbol, or will it run until the end of the string and fail? I believe it will run until the end of the string, fail, and then eventually backtrack to the > and consume up to there. This seems to result in a very inefficient parser, and any comments on this would be appreciated!
3. What if the string within < and > contains a > symbol (e.g. <fo>o> <bar>)? Will anyString consume until the first > or the last one? Is there any way to specify whether it consumes the least it can, or the most?
4. In order to fix the previous point, I'd like to forbid < and > in anyString. How do I write that?

Thank you!

gdiazc · Accepted Answer · 2014-02-28 13:25:37Z


I'm currently researching my own question, and I'll try to answer myself here.

The Java Pattern documentation specifies that . matches any character (by default excluding line terminators, unless the DOTALL flag is set). Therefore, the regex which accepts any single-line string would be:
```
def anyString = ".*".r
```
To accept any non-empty string, we can use ".+".r.
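As a small check of what "any string" means here, the following sketch uses only the standard library's `Regex.findPrefixOf`. The object and value names (`DotDemo`, `anyStringDotAll`) are made up for illustration; the `(?s)` prefix is the embedded DOTALL flag from the Java Pattern documentation.

```scala
import scala.util.matching.Regex

object DotDemo {
  // '.' does not match line terminators by default.
  val anyString: Regex = ".*".r

  // Hypothetical variant: (?s) turns on DOTALL, so '.' also matches '\n'.
  val anyStringDotAll: Regex = "(?s).*".r

  def main(args: Array[String]): Unit = {
    val input = "foo\nbar"
    // Stops at the newline.
    println(anyString.findPrefixOf(input) == Some("foo"))            // true
    // Runs to the end of the input.
    println(anyStringDotAll.findPrefixOf(input) == Some("foo\nbar")) // true
  }
}
```

For the single-line inputs in the question, the plain `.*` is enough.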
To see whether such a parser stops before the > symbol, consider the following toy example:
```
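import scala.util.parsing.combinator.JavaTokenParsers

object MyParser1 extends JavaTokenParsers {
  override def skipWhitespace = false // don't skip whitespace between parsers
  def expr = "<" ~ anyString ~ ">"
  def anyString = ".*".r              // greedy: matches to the end of the line
}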
```
Here, the string <> is rejected. To test this, use:
```
println(MyParser1.parseAll(MyParser1.expr, "<>"))
```
This indicates that the .* parser consumes input up to the end of the string, so the > is no longer available for the final parser. It therefore seems necessary to forbid < and > from appearing in anyString.
For the same reason, given an input such as <fo>o> <bar>, the .* parser consumes the whole remaining string, including every > symbol.
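The greedy behaviour, and the question of whether a "least it can" quantifier helps, can be sketched as follows. This assumes the scala-parser-combinators library is on the classpath; `GreedyDemo` and its parser names are hypothetical. As far as I can tell, a regex parser only matches a prefix of the remaining input and the following parser cannot make it give characters back, so a reluctant `.*?` on its own simply matches the empty string.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object GreedyDemo extends JavaTokenParsers {
  override def skipWhitespace = false

  // Greedy: ".*" consumes the rest of the input, including the closing '>'.
  def greedy: Parser[Any] = "<" ~ ".*".r ~ ">"

  // Reluctant: as a standalone prefix match, ".*?" matches the empty string.
  def reluctant: Parser[Any] = "<" ~ ".*?".r ~ ">"

  // Putting the brackets inside one regex lets the regex engine backtrack.
  def wholeToken: Parser[String] = "<.*?>".r

  def main(args: Array[String]): Unit = {
    println(parseAll(greedy, "<foo>").successful)     // false
    println(parseAll(reluctant, "<foo>").successful)  // false
    println(parseAll(reluctant, "<>").successful)     // true
    println(parseAll(wholeToken, "<foo>").successful) // true
  }
}
```

So the least/most distinction only takes effect when the delimiter is part of the same regex as the quantifier.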
In the same documentation, a negation operator is given. To exclude < and >, we can write:
```
def almostAnyString = "[^<>]*".r
```
In general, the construct [^abc] will match any character except a, b, and c.
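To illustrate where the negated character class stops, here is a small standard-library check using `Regex.findPrefixOf`; the object name `CharClassDemo` is made up for this sketch.

```scala
object CharClassDemo {
  val almostAnyString = "[^<>]*".r

  def main(args: Array[String]): Unit = {
    // Matching stops right before the first '<' or '>'.
    println(almostAnyString.findPrefixOf("foo> <bar>") == Some("foo")) // true
    // With a '>' inside the body, only the part before the first '>' is consumed.
    println(almostAnyString.findPrefixOf("fo>o>") == Some("fo"))       // true
    // '*' also allows the empty string, so "<>" would be accepted.
    println(almostAnyString.findPrefixOf(">") == Some(""))             // true
  }
}
```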

To conclude, the best implementation I've found so far is the following:

import scala.util.parsing.combinator.JavaTokenParsers

object MyParser extends JavaTokenParsers {
  override def skipWhitespace = false // don't allow whitespace between parsers by default

  def expr: Parser[Any] = "<" ~ almostAnyString ~ ">" ~
                          whiteSpace ~ // this parser is defined in JavaTokenParsers
                          "<" ~ almostAnyString ~ ">"

  def almostAnyString = "[^<>]*".r

}
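As a possible refinement, the sketch below returns the two bracketed strings as a tuple instead of a nested `~` result. It assumes the scala-parser-combinators library is on the classpath; `PairParser` and `bracketed` are names invented here, not part of the original code.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object PairParser extends JavaTokenParsers {
  override def skipWhitespace = false

  def almostAnyString: Parser[String] = "[^<>]*".r

  // "<" ~> p <~ ">" keeps only the result of p and drops the brackets.
  def bracketed: Parser[String] = "<" ~> almostAnyString <~ ">"

  // The separating whitespace is required but discarded.
  def expr: Parser[(String, String)] =
    (bracketed <~ whiteSpace) ~ bracketed ^^ { case a ~ b => (a, b) }

  def main(args: Array[String]): Unit = {
    println(parseAll(expr, "<foo> <bar>").get)          // (foo,bar)
    println(parseAll(expr, "<fo>o> <bar>").successful)  // false
  }
}
```

Calling `.get` on a failed parse throws, so real code would more likely pattern match on `Success` and `NoSuccess`.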

edited Feb 28, 2014 at 13:25

answered Feb 28, 2014 at 13:02

gdiazc


1 Comment

Michał Politowski Over a year ago

You don't really need to forbid <, do you? The question is, do you need <a < b> < c <- d > to be accepted or not?
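The variation this comment hints at can be sketched by excluding only > from the body. This again assumes the scala-parser-combinators library; `LenientParser` and `body` are hypothetical names. Whether such input should be accepted is a design decision for the grammar.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object LenientParser extends JavaTokenParsers {
  override def skipWhitespace = false

  // Only '>' is forbidden, so '<' may appear inside the brackets.
  def body: Parser[String] = "[^>]*".r

  def bracketed: Parser[String] = "<" ~> body <~ ">"

  def expr: Parser[(String, String)] =
    (bracketed <~ whiteSpace) ~ bracketed ^^ { case a ~ b => (a, b) }

  def main(args: Array[String]): Unit = {
    val result = parseAll(expr, "<a < b> < c <- d >")
    println(result.successful)                    // true
    println(result.get == ("a < b", " c <- d "))  // true
  }
}
```

With the stricter `[^<>]*` body, the same input is rejected at the second `<`.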
