I'm playing around with Scala's lazy iterators, and I've run into an issue. What I'm trying to do is read in a large file, do a transformation, and then write out the result:
object FileProcessor {
def main(args: Array[String]) {
val inSource = Source.fromFile("in.txt")
val outSource = new PrintWriter("out.txt")
try {
// this "basic" lazy iterator works fine
// val iterator = inSource.getLines
// ...but this one, which incorporates my process method,
// throws OutOfMemoryExceptions
val iterator = process(inSource.getLines.toSeq).iterator
while(iterator.hasNext) outSource.println(iterator.next)
} finally {
inSource.close()
outSource.close()
}
}
// processing in this case just means upper-cases every line
private def process(contents: Seq[String]) = contents.map(_.toUpperCase)
}
So I'm getting an OutOfMemoryException on large files. I know you can run afoul of Scala's lazy Streams if you keep around references to the head of the Stream. So in this case I'm careful to convert the result of process() to an iterator and throw-away the Seq it initially returns.
Does anyone know why this still causes O(n) memory consumption? Thanks!
Update
In response to fge and huynhjl, it seems like the Seq might be the culprit, but I don't know why. As an example, the following code works fine (and I'm using Seq all over the place). This code does not produce an OutOfMemoryException:
object FileReader {
def main(args: Array[String]) {
val inSource = Source.fromFile("in.txt")
val outSource = new PrintWriter("out.txt")
try {
writeToFile(outSource, process(inSource.getLines.toSeq))
} finally {
inSource.close()
outSource.close()
}
}
@scala.annotation.tailrec
private def writeToFile(outSource: PrintWriter, contents: Seq[String]) {
if (! contents.isEmpty) {
outSource.println(contents.head)
writeToFile(outSource, contents.tail)
}
}
private def process(contents: Seq[String]) = contents.map(_.toUpperCase)
.getLines.toSeq?