gpt4 book ai didi

haskell - 使用 Data.Binary 对列表进行延迟解码

转载 作者:行者123 更新时间:2023-12-02 17:06:46 25 4
gpt4 key购买 nike

我使用此代码对列表进行延迟编码(取自此 SO question ):

import Data.Binary

newtype Stream a = Stream { unstream :: [a] }

instance Binary a => Binary (Stream a) where

put (Stream []) = putWord8 0
put (Stream (x:xs)) = putWord8 1 >> put x >> put (Stream xs)

问题在于解码实现并不懒惰:

    get = do
t <- getWord8
case t of
0 -> return (Stream [])
1 -> do x <- get
Stream xs <- get
return (Stream (x:xs))

在我看来,这应该是懒惰的,但如果我们运行这个测试代码:

head $ unstream (decode $ encode $ Stream [1..10000000::Integer] :: Stream Integer)

内存使用量激增。由于某种原因,它想在让我查看第一个元素之前解码整个列表。

为什么这不是懒惰,我怎样才能让它变得懒惰?

最佳答案

它不是惰性的,因为 Get monad 是一个严格的状态 monad(在 binary-0.5.0.2 to 0.5.1.1 中;它之前是一个惰性状态 monad,在 binary-0.6.* 中它已经成为一个延续 monad,我尚未分析该更改的严格性影响):

-- | The parse state
data S = S {-# UNPACK #-} !B.ByteString -- current chunk
L.ByteString -- the rest of the input
{-# UNPACK #-} !Int64 -- bytes read

-- | The Get monad is just a State monad carrying around the input ByteString
-- We treat it as a strict state monad.
newtype Get a = Get { unGet :: S -> (# a, S #) }

-- Definition directly from Control.Monad.State.Strict
instance Monad Get where
return a = Get $ \s -> (# a, s #)
{-# INLINE return #-}

m >>= k = Get $ \s -> case unGet m s of
(# a, s' #) -> unGet (k a) s'
{-# INLINE (>>=) #-}

最终的递归

get >>= \x ->
get >>= \(Stream xs) ->
return (Stream (x:xs))

强制读取整个Stream,然后才能返回。

我认为不可能在 Get monad 中延迟解码 Stream (因此更不用说 Binary 实例了) 。但是您可以使用 runGetState 编写延迟解码函数:

-- | Run the Get monad applies a 'get'-based parser on the input
-- ByteString. Additional to the result of get it returns the number of
-- consumed bytes and the rest of the input.
runGetState :: Get a -> L.ByteString -> Int64 -> (a, L.ByteString, Int64)
runGetState m str off =
case unGet m (mkState str off) of
(# a, ~(S s ss newOff) #) -> (a, s `join` ss, newOff)

首先编写一个Get解析器,它返回一个Maybe a

getMaybe :: Binary a => Get (Maybe a)
getMaybe = do
t <- getWord8
case t of
0 -> return Nothing
_ -> fmap Just get

然后用它来创建一个类型为 (ByteString,Int64) -> Maybe (a,(ByteString,Int64)) 的函数:

step :: Binary a => (ByteString,Int64) -> Maybe (a,(ByteString,Int64))
step (xs,offset) = case runGetState getMaybe xs offset of
(Just v, ys, newOffset) -> Just (v,(ys,newOffset))
_ -> Nothing

然后您可以使用Data.List.unfoldr来延迟解码列表,

lazyDecodeList :: Binary a => ByteString -> [a]
lazyDecodeList xs = unfoldr step (xs,0)

并将其包装在Stream

lazyDecodeStream :: Binary a => ByteString -> Stream a
lazyDecodeStream = Stream . lazyDecodeList

关于haskell - 使用 Data.Binary 对列表进行延迟解码,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/11695373/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com