2014-06-05

Converting normal attoparsec parser code to conduit/pipes

I have written the following parsing code using attoparsec:

data Test = Test { 
    a :: Int, 
    b :: Int 
    } deriving (Show) 

testParser :: Parser Test 
testParser = do 
    a <- decimal 
    tab -- assumed helper, e.g. tab = char '\t'; its definition is not shown here
    b <- decimal 
    return $ Test a b 

tParser :: Parser [Test] 
tParser = many' $ testParser <* endOfLine 

This works fine for small files. I run it like this:

main :: IO() 
main = do 
    text <- TL.readFile "./testFile" 
    let (Right a) = parseOnly (manyTill anyChar endOfLine *> tParser) text 
    print a 

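However, when the file is larger than 70 MB, it consumes a lot of memory. As a solution, I thought I would use attoparsec-conduit. After going through its API, I am not sure how to make the two work together. My parser has the type Parser Test, but sinkParser actually accepts a parser of type Parser a b. How can I run this parser in constant memory? (A pipes-based solution is also acceptable, but I am not used to the pipes API.)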

Answers


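The first type parameter of Parser is just the data type of the input (either Text or ByteString). You can pass your testParser function as the argument to sinkParser and it will work fine. Here is a short example: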

{-# LANGUAGE OverloadedStrings #-} 
import           Conduit                 (liftIO, mapM_C, runResourceT,
                                          sourceFile, ($$), (=$))
import           Data.Attoparsec.Text    (Parser, decimal, endOfLine, space)
import           Data.Conduit.Attoparsec (conduitParser)

data Test = Test { 
    a :: Int, 
    b :: Int 
    } deriving (Show) 

testParser :: Parser Test 
testParser = do 
    a <- decimal 
    space 
    b <- decimal 
    endOfLine 
    return $ Test a b 

main :: IO() 
main = runResourceT 
    $ sourceFile "foo.txt" 
    $$ conduitParser testParser 
    =$ mapM_C (liftIO . print) 
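
For readers on a recent conduit release, a rough equivalent of the example above might look like the sketch below. It assumes the conduit and conduit-extra packages (which provide Conduit and Data.Conduit.Attoparsec), decodes the file explicitly with decodeUtf8C because sourceFile yields ByteString there, and uses .| with runConduitRes in place of $$ and =$. The tab separator and the positional Test constructor are simplifications for illustration, not part of the original answer.

```haskell
import           Conduit                 (decodeUtf8C, liftIO, mapM_C,
                                          runConduitRes, sourceFile, (.|))
import           Data.Attoparsec.Text    (Parser, char, decimal, endOfLine)
import           Data.Conduit.Attoparsec (conduitParser)

-- Simplified record (assumption: positional fields instead of a and b).
data Test = Test Int Int deriving (Show)

-- One record per line: two decimals separated by a tab (assumed format).
testParser :: Parser Test
testParser = Test <$> decimal <* char '\t' <*> decimal <* endOfLine

main :: IO ()
main = runConduitRes
     $ sourceFile "foo.txt"           -- yields strict ByteString chunks
    .| decodeUtf8C                    -- ByteString -> Text
    .| conduitParser testParser       -- yields (PositionRange, Test) pairs
    .| mapM_C (liftIO . print . snd)  -- drop the position, print the record
```

conduitParser pairs each parsed value with a PositionRange, so snd drops the position before printing. Records are printed as they are parsed, so the full list is never built in memory.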

Here is a pipes solution (assuming you are using a Text-based parser):

import Pipes 
import Pipes.Text.IO (fromHandle) 
import Pipes.Attoparsec (parsed) 
import qualified System.IO as IO 

main = IO.withFile "./testfile" IO.ReadMode $ \handle -> runEffect $ 
    for (parsed (testParser <* endOfLine) (fromHandle handle)) (lift . print)
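
If a record fails to parse, parsed returns a Left carrying the ParsingError and the unconsumed input, which the snippet above silently discards. Below is a minimal self-contained sketch that reports the failure instead. It assumes the pipes, pipes-text, and pipes-attoparsec packages and a tab-separated input; the positional Test constructor and the error message wording are made up for illustration.

```haskell
import           Data.Attoparsec.Text (Parser, char, decimal, endOfLine)
import           Pipes
import           Pipes.Attoparsec     (parsed)
import           Pipes.Text.IO        (fromHandle)
import qualified System.IO            as IO

-- Simplified record (assumption: positional fields instead of a and b).
data Test = Test Int Int deriving (Show)

-- Two decimals separated by a tab (assumed format); no trailing newline here.
testParser :: Parser Test
testParser = Test <$> decimal <* char '\t' <*> decimal

main :: IO ()
main = IO.withFile "./testfile" IO.ReadMode $ \handle -> do
    -- Print each record as soon as it is parsed.
    result <- runEffect $
        for (parsed (testParser <* endOfLine) (fromHandle handle)) (lift . print)
    case result of
        -- Parsing stopped early: report the error, ignore the leftover input.
        Left (err, _leftovers) -> putStrLn ("parse error: " ++ show err)
        -- Input was exhausted without a parse failure.
        Right ()               -> return ()
```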