
haskell - Simple parser runs out of memory

Reposted. Author: 行者123. Updated: 2023-12-02 15:53:42

I would like to understand why this simple parser runs out of memory on large files. I really have no idea what I am doing wrong.

import Data.Attoparsec.ByteString.Char8
import qualified Data.Attoparsec.ByteString.Lazy as Lazy
import System.Environment
import qualified Data.ByteString.Lazy as B
import Control.Applicative

parseLine :: Parser String
parseLine = manyTill' anyChar (endOfLine <|> endOfInput)

parseAll :: Parser [Int]
parseAll = manyTill'
    (parseLine >> (return 0)) -- discarding what's been read
    endOfInput

main :: IO ()
main = do
    [fn] <- getArgs
    text <- B.readFile fn

    case Lazy.parse parseAll text of
        Lazy.Fail _ _ _ -> putStrLn "bad"
        Lazy.Done _ _ -> putStrLn "ok"

I am running the program with:

 runhaskell.exe test.hs x.log

Output:

test.hs: Out of memory

x.log is about 500MB in size. My machine has 16GB of RAM.

Best answer

If you look at the documentation of attoparsec, you will notice a similar example, accompanied by the following note:

Note the overlapping parsers anyChar and string "-->". While this will work, it is not very efficient, as it will cause a lot of backtracking.

Replacing anyChar with an alternative that rejects the characters accepted by endOfLine should fix the problem. For example:

satisfy (\c -> c `notElem` ['\n', '\r'])
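
To make the suggestion concrete, here is a minimal sketch of how the question's parsers might look with that predicate in place. The names notEol, isEol and parseLineBS are hypothetical and appear in neither the question nor the answer. parseLineBS goes beyond the answer: it swaps in attoparsec's takeTill, which returns each line as a ByteString slice instead of building a String one character at a time. One behavioural difference to be aware of: a lone '\r' not followed by '\n' now makes the parse fail, whereas anyChar would have consumed it. This sketch has not been measured against the 500MB file.

```haskell
import Control.Applicative ((<|>))
import Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8 as BC

-- Hypothetical helper: True for characters that can start a line terminator.
isEol :: Char -> Bool
isEol c = c == '\n' || c == '\r'

-- Hypothetical helper: the answer's predicate, named for reuse.
-- It no longer overlaps with endOfLine, unlike anyChar.
notEol :: Parser Char
notEol = satisfy (\c -> c `notElem` ['\n', '\r'])

-- The question's parseLine with anyChar swapped for notEol.
parseLine :: Parser String
parseLine = manyTill' notEol (endOfLine <|> endOfInput)

-- Unchanged from the question.
parseAll :: Parser [Int]
parseAll = manyTill'
    (parseLine >> return 0) -- discarding what's been read
    endOfInput

-- Alternative (an assumption, not from the answer): take the whole line
-- at once as a ByteString, then consume the terminator.
parseLineBS :: Parser BC.ByteString
parseLineBS = takeTill isEol <* (endOfLine <|> endOfInput)

main :: IO ()
main = do
    let sample = BC.pack "a\nb\r\nc"
    print (parseOnly parseAll sample)
    print (parseOnly parseLineBS sample)
```

Note that parseAll still accumulates one list element per line, so on very large inputs a version that counts or skips lines rather than collecting them may be worth trying as well.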

For more on haskell - simple parser runs out of memory, see the similar question on Stack Overflow: https://stackoverflow.com/questions/41378676/
