
haskell - Writing "wc -l" with the Iteratee library - how to filter for newlines?

Reposted. Author: 行者123. Updated: 2023-12-04 12:12:23

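I am trying to write an equivalent of "wc -l" using the Haskell Iteratee library. Below is the code for a "wc" that only counts the elements of the stream (bytes here, so its output matches "wc -c" rather than a word count; it is similar to the example code in the iteratee package on Hackage), and it runs very fast: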



{-# LANGUAGE BangPatterns #-}
import Data.Iteratee as I
import Data.ListLike as LL
import Data.Iteratee.IO
import Data.ByteString


length1 :: (Monad m, Num a, LL.ListLike s el) => Iteratee s m a
length1 = liftI (step 0)
  where
    step !i (Chunk xs) = liftI (step $ i + fromIntegral (LL.length xs))
    step !i stream     = idone i stream
{-# INLINE length1 #-}

main = do
  i' <- enumFile 1024 "/usr/share/dict/words" (length1 :: (Monad m) => Iteratee ByteString m Int)
  result <- run i'
  print result
{- Time measured on a linux x86 box:
$ time ./test ## above haskell compiled code
4950996

real 0m0.013s
user 0m0.004s
sys 0m0.007s

$ time wc -c /usr/share/dict/words
4950996 /usr/share/dict/words

real 0m0.003s
user 0m0.000s
sys 0m0.002s
-}

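Now, how can this be extended to count lines while still running fast? I wrote a version that uses Prelude.filter to keep only "\n" before taking the length, but it is slower than Linux "wc -l" because it uses too much memory and spends too long in GC (lazy evaluation, I guess). So I wrote another version using Data.ListLike.filter, but it does not compile because it fails to type check. Help here would be appreciated: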

{-# LANGUAGE BangPatterns #-}
import Data.Iteratee as I
import Data.ListLike as LL
import Data.Iteratee.IO
import Data.ByteString
import Data.Char
import Data.ByteString.Char8 (pack)

numlines :: (Monad m, Num a, LL.ListLike s el) => Iteratee s m a
numlines = liftI $ step 0
  where
    step !i (Chunk xs) = liftI (step $ i + fromIntegral (LL.length $ LL.filter (\x -> x == Data.ByteString.Char8.pack "\n") xs))
    step !i stream     = idone i stream
{-# INLINE numlines #-}

main = do
  i' <- enumFile 1024 "/usr/share/dict/words" (numlines :: (Monad m) => Iteratee ByteString m Int)
  result <- run i'
  print result
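
A note on the type error, offered as a hedged sketch rather than a fix for the iteratee code itself: the predicate given to LL.filter receives a single element of the chunk, and for a ByteString that element is a Word8, whereas pack "\n" is a whole ByteString, so the comparison cannot type check (the signature also leaves el unconstrained). The sketch below uses only the bytestring package; newline, countNewlines and countNewlines' are hypothetical names introduced for illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- The byte value of '\n' (ASCII line feed).
newline :: Word8
newline = 10

-- Filter-then-length, the same shape as the attempt above,
-- but each element is compared against a Word8, not a ByteString.
countNewlines :: B.ByteString -> Int
countNewlines = B.length . B.filter (== newline)

-- Same result without building the intermediate filtered ByteString.
countNewlines' :: B.ByteString -> Int
countNewlines' = B.count newline

main :: IO ()
main = do
  let chunk = BC.pack "one\ntwo\nthree\n"
  print (countNewlines chunk)   -- 3
  print (countNewlines' chunk)  -- 3
```

Counting directly avoids allocating a filtered copy of every chunk, which is one plausible source of the memory and GC pressure mentioned in the question.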

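Best answer

So I did some experimenting and got a "wc -l" that is only about twice as slow as the native "wc -l". Relative to the native tool, that is better performance than even the "wc -c" version shown above.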

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy.Char8 as BSL
import qualified Data.ByteString.Char8 as BS
import qualified Data.Enumerator as E
import qualified Data.Enumerator.Binary as EB
import Control.Monad.IO.Class (liftIO)
import Data.Int

numlines :: Int64 -> E.Iteratee BS.ByteString IO ()
numlines n = do
    chunk <- EB.take 1024
    case chunk of
        "" -> do liftIO $ print n
                 return ()
        a  -> do let ct = BSL.count '\n' a
                 numlines (n+ct)

main = do
    let i = EB.enumFile "/usr/share/dict/words" E.$$ numlines 0
    E.run_ i

Running it against the native tool:
Eriks-MacBook-Air:skunk erikhinton$ time wc -l "/usr/share/dict/words"
235886 /usr/share/dict/words

real 0m0.009s
user 0m0.006s
sys 0m0.002s
Eriks-MacBook-Air:skunk erikhinton$ time ./wcl
235886

real 0m0.019s
user 0m0.013s
sys 0m0.005s
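
For comparison, the same chunk-at-a-time loop can be sketched without the enumerator library, reading fixed-size strict chunks from a handle. This is an illustration only, not the answer's code: countLines is a hypothetical helper name, the 1024-byte chunk size simply mirrors EB.take 1024 above, and no timing claims are made for it.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as BS
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- Read the handle in 1024-byte chunks and add up the newlines,
-- keeping the running total strict so no thunks accumulate.
countLines :: Handle -> IO Int
countLines h = go 0
  where
    go !n = do
      chunk <- BS.hGet h 1024
      if BS.null chunk                       -- hGet returns "" at end of file
        then return n
        else go (n + BS.count '\n' chunk)

main :: IO ()
main = withBinaryFile "/usr/share/dict/words" ReadMode countLines >>= print
```

The explicit recursion plays the role that the recursive numlines iteratee plays above: take a chunk, stop on the empty chunk, otherwise count and continue.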

[EDIT]

Here is a faster, smaller, and more concise/expressive way. These enumerators are starting to get fun.
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy.Char8 as BSL
import qualified Data.ByteString.Char8 as BS
import qualified Data.Enumerator as E
import qualified Data.Enumerator.Binary as EB
import qualified Data.Enumerator.List as EL
import Control.Monad.IO.Class (liftIO)
import Data.Int

numlines :: E.Iteratee BS.ByteString IO ()
numlines = do
    num <- EL.fold (\n b -> (BS.count '\n' b) + n ) 0
    liftIO . print $ num

main = do
    let i = EB.enumFile "/usr/share/dict/words" E.$$ numlines
    E.run_ i

And the timing:
Eriks-MacBook-Air:skunk erikhinton$ time ./wcl2
235886

real 0m0.015s
user 0m0.010s
sys 0m0.004s
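
The fold in the edited answer has the same shape as a strict left fold over the chunks of a lazy ByteString. A minimal sketch of that idea, assuming only the bytestring package and not the enumerator API; countLinesLazy is a hypothetical name:

```haskell
import qualified Data.ByteString.Char8 as BS
import qualified Data.ByteString.Lazy.Char8 as BSL
import Data.List (foldl')

-- Fold over the strict chunks of a lazy ByteString, adding the
-- newline count of each chunk to a strict accumulator.
countLinesLazy :: BSL.ByteString -> Int
countLinesLazy = foldl' (\n b -> n + BS.count '\n' b) 0 . BSL.toChunks

main :: IO ()
main = do
  contents <- BSL.readFile "/usr/share/dict/words"  -- read lazily, chunk by chunk
  print (countLinesLazy contents)
```

As with EL.fold, the per-chunk work is a single BS.count call and the only state carried between chunks is the running total.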

Regarding "haskell - Writing "wc -l" with the Iteratee library - how to filter for newlines?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/7986307/
