Scala infinite iterator OutOfMemory

I've been playing with Scala's lazy iterators and have run into an issue. What I'm trying to do is read in a large file, perform a transformation, and then write out the result:
import java.io.PrintWriter
import scala.io.Source

object FileProcessor {
  def main(args: Array[String]) {
    val inSource = Source.fromFile("in.txt")
    val outSource = new PrintWriter("out.txt")
    try {
      // this "basic" lazy iterator works fine
      // val iterator = inSource.getLines
      // ...but this one, which incorporates my process method,
      // throws OutOfMemoryExceptions
      val iterator = process(inSource.getLines.toSeq).iterator
      while (iterator.hasNext) outSource.println(iterator.next)
    } finally {
      inSource.close()
      outSource.close()
    }
  }

  // processing in this case just means upper-casing every line
  private def process(contents: Seq[String]) = contents.map(_.toUpperCase)
}
So I'm getting an OutOfMemoryException on large files. I know you can run afoul of Scala's lazy Streams if you keep references to the head of the Stream around. So in this case I'm careful to convert the result of process() to an iterator and throw away the Seq it initially returns.

Does anyone know why this still causes O(n) memory consumption? Thanks!
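For comparison, a version that never materializes a Seq at all stays lazy end to end. This is only a sketch (the object name LazyFileProcessor and making process non-private are my choices, not from the question): mapping directly over the Iterator keeps no reference to earlier lines, so memory use stays constant regardless of file size.

```scala
import java.io.PrintWriter
import scala.io.Source

object LazyFileProcessor {
  // Iterator.map is itself a lazy Iterator: each line is transformed
  // on demand and becomes garbage as soon as it has been written.
  def process(lines: Iterator[String]): Iterator[String] =
    lines.map(_.toUpperCase)

  def main(args: Array[String]): Unit = {
    val inSource = Source.fromFile("in.txt")
    val outSource = new PrintWriter("out.txt")
    try {
      process(inSource.getLines).foreach(outSource.println)
    } finally {
      inSource.close()
      outSource.close()
    }
  }
}
```

The design point is that no intermediate collection ever exists, so there is no head reference that could pin forced elements in memory.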
Update

In response to fge and huynhjl: it seems like the Seq may be the culprit, but I don't know why. As an example, the following code works fine (and I'm using Seq everywhere). This code does not produce an OutOfMemoryException:
import java.io.PrintWriter
import scala.io.Source

object FileReader {
  def main(args: Array[String]) {
    val inSource = Source.fromFile("in.txt")
    val outSource = new PrintWriter("out.txt")
    try {
      writeToFile(outSource, process(inSource.getLines.toSeq))
    } finally {
      inSource.close()
      outSource.close()
    }
  }

  @scala.annotation.tailrec
  private def writeToFile(outSource: PrintWriter, contents: Seq[String]) {
    if (!contents.isEmpty) {
      outSource.println(contents.head)
      writeToFile(outSource, contents.tail)
    }
  }

  private def process(contents: Seq[String]) = contents.map(_.toUpperCase)
}
Wild guess: '.getLines.toSeq'? – fge 2011-12-27 02:28:15
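One plausible mechanism behind that guess (an assumption worth verifying against your Scala version): on Scala 2.x, calling toSeq on an Iterator builds a Stream, and a Stream memoizes every element that has been forced for as long as its head is reachable. A minimal illustration of that retention behavior:

```scala
object StreamMemoDemo {
  def main(args: Array[String]): Unit = {
    // A Stream is lazy, but it memoizes: once an element is forced,
    // it stays reachable from the head of the stream.
    val s = Stream.from(1)
    println(s.take(5).toList) // forces the first five elements
    // Those elements are now cached inside `s`. If `s` held a whole
    // file's lines, every line processed so far would still be retained
    // through the head reference -- O(n) memory.
  }
}
```

If this is what is happening, the `contents` parameter of process() keeps the head of the Stream alive while map walks the whole file, so discarding the Seq afterwards comes too late.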