13

I have written a parser as follows:

class LogParser extends JavaTokenParsers {

  def invertedIndex: Parser[Array[Array[(Int, Int)]]] = {
    num ~> num ~> num ~> rep(postingsList) ^^ {
      _.toArray
    }
  }

  def postingsList: Parser[Array[(Int, Int)]] = {
    num ~> rep(entry) ^^ {
      _.toArray
    }
  }

  def entry = {
    num ~ "," ~ num ^^ {
      case docID ~ "," ~ count => (docID.toInt, count.toInt)
    }
  }

  def num = wholeNumber ^^ (_.toInt)

}

If I parse from a (270MB) file with a FileReader as follows:

val index = parseAll(invertedIndex, new FileReader("path/to/file")).get

I get an Exception in thread "main" java.lang.StackOverflowError (I have also tried wrapping in a BufferedReader) but I can fix it by first reading the file into a String like so:

val input = io.Source.fromFile("path/to/file")
val str = input.mkString
input.close()
val index = parseAll(invertedIndex, str).get

Why is this the case? Is there any way to avoid reading it as a String first, it seems a waste?

5
  • 2
    What's the current size of your stack, and how much larger do you have to make your stack to avoid the StackOverflowException? How much smaller must the stack be to make the String version overflow? (You can set your stack to 16MB by launching like so: scala -J-Xss16M) Commented Nov 3, 2012 at 23:34
  • I was just using the default stack size, but when I set it to 16M the program was still running 30mins later... Commented Nov 4, 2012 at 13:02
  • 1
    This could be related to the Scala 2.9.2 bug SI-6520. Commented Nov 4, 2012 at 15:01
  • Looking at the bug @DonRoby liked to and thinking about how parser combinators work, I think the backtracking mechanism would require the parser to have the entire file in memory to enable backtracking anyway. This does sound like a bug in the PagedSeq bug, so you're probably best off just reading it into a string. Commented Nov 4, 2012 at 21:21
  • I've run into this problem before, and have also used the string workaround. I've also just created a new issue (SI-6612; with an example that isn't parser combinator-specific), since this seems fairly distinct from SI-6520—the problem is blowing the stack, not running out of memory. Commented Nov 5, 2012 at 4:01

1 Answer 1

1

There is another library[1] that is a lot like the scala parser combinators that supports Trampolining which is what you need to stop the stackoverflow errors.

[1] https://github.com/djspiewak/gll-combinators

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.