
haskell - Why doesn't this run in constant memory?

Reposted. Author: 行者123. Updated: 2023-12-02 07:07:16

I am trying to write a large amount of data to a file in constant memory.

import qualified Data.ByteString.Lazy as B

{- Creates and writes num grids of dimensions aa x aa -}
writeGrids :: Int -> Int -> IO ()
writeGrids num aa = do
  rng <- newPureMT
  let (grids,shuffleds) = createGrids rng aa
  createDirectoryIfMissing True "data/grids/"
  B.writeFile (gridFileName num aa)
              (encode (take num grids))
  B.writeFile (shuffledFileName num aa)
              (encode (take num shuffleds))

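However, this consumes memory proportional to num. I know that createGrids is a sufficiently lazy function, because I tested it by appending error "not lazy enough" to the end of the lists it returns (as suggested on the Haskell wiki), and no error is raised. take is a lazy function defined in Data.List. encode is also a lazy function, defined in Data.Binary. B.writeFile is defined in Data.ByteString.Lazy.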
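The laziness test described above can be sketched as follows. This is a minimal illustration using only base; squares and trapAfter are hypothetical stand-ins, not names from the original program. Note that passing this test only shows that the consumer never looks past the trap. It says nothing about whether the elements already consumed can be garbage collected.

```haskell
-- Hypothetical stand-in for a lazy producer such as createGrids.
squares :: [Int]
squares = map (^ 2) [1 ..]

-- | Keep the first n elements and put a trap behind them: forcing the
-- list beyond the n-th element raises the error.
trapAfter :: Int -> [a] -> [a]
trapAfter n xs = take n xs ++ error "not lazy enough"

main :: IO ()
main = print (take 5 (trapAfter 5 squares))
```

A consumer that walks the whole list, such as length, would trip the trap instead of returning.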

Here is the complete code, which you can run:

import Control.Arrow (first)
import Data.Binary
import GHC.Float (double2Float)
import System.Random (next)
import System.Random.Mersenne.Pure64 (PureMT, newPureMT, randomDouble)
import System.Random.Shuffle (shuffle')
import qualified Data.ByteString.Lazy as B

main :: IO ()
main = writeGrids 1000 64

{- Creates and writes num grids of dimensions aa x aa -}
writeGrids :: Int -> Int -> IO ()
writeGrids num aa = do
  rng <- newPureMT
  let (grids,shuffleds) = createGrids rng aa
  B.writeFile "grids.bin" (encode (take num grids))
  B.writeFile "shuffleds.bin" (encode (take num shuffleds))

{- a random number generator, dimension of grids to make
returns a pair of lists, the first is a list of grids of dimensions
aa x aa, the second is a list of the shuffled grids corresponding to the first list -}
createGrids :: PureMT -> Int -> ([[(Float,Float)]],[[(Float,Float)]])
createGrids rng aa = (grids,shuffleds) where
  rs = randomFloats rng
  grids = map (getGridR aa) (chunksOf (2 * aa * aa) rs)
  shuffleds = shuffler (aa * aa) rng grids

{- length of each grid, a random number generator, a list of grids
returns the list with each grid shuffled -}
shuffler :: Int -> PureMT -> [[(Float,Float)]] -> [[(Float,Float)]]
shuffler n rng (xs:xss) = shuffle' xs n rng : shuffler n (snd (next rng)) xss
shuffler _ _ [] = []

{- divides list into chunks of size n -}
chunksOf :: Int -> [a] -> [[a]]
chunksOf n = go
  where go xs = case splitAt n xs of
                  (ys,zs) | null ys   -> []
                          | otherwise -> ys : go zs

{- dimension of grid, list of random floats [0,1]
returns a list of (x,y) points of length n^2 such that all
points are in the range [0,1] and the points are a randomly
perturbed regular grid -}
getGridR :: Int -> [Float] -> [(Float,Float)]
getGridR n rs = pts where
  nn = n * n
  (irs,jrs) = splitAt nn rs
  n' = fromIntegral n
  grid = [ (p,q) | p <- [0..n'-1], q <- [0..n'-1] ]
  pts = zipWith (\(p,q) (ir,jr) -> ((p+ir)/n',(q+jr)/n')) grid (zip irs jrs)

{- an infinite list of random floats in range [0,1] -}
randomFloats :: PureMT -> [Float]
randomFloats rng = let (d,rng') = first double2Float (randomDouble rng)
                   in d : randomFloats rng'

The required packages are: bytestring, binary, random, mersenne-random-pure64, random-shuffle

Best answer

There are two reasons for the memory usage:

First, Data.Binary.encode does not seem to run in constant space. The following program uses 910 MB of memory:

import Data.Binary
import qualified Data.ByteString.Lazy as B

len = 10000000 :: Int

main = B.writeFile "grids.bin" $ encode [0..len]

If we drop one 0 from len (making it 1000000), memory usage falls to 97 MB.

In contrast, the following program uses 1 MB:

import qualified Data.ByteString.Lazy.Char8 as B

main = B.writeFile "grids.bin" $ B.pack $ show [0..(1000000::Int)]
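One likely explanation for the difference, as far as I know, is that the Binary instance for lists writes the length of the list before its elements, so encode has to traverse the whole list before it can emit the first byte. The sketch below works around that by encoding the elements one at a time and concatenating the resulting lazy chunks. encodeStream is a hypothetical helper, not part of Data.Binary, and its output has no length prefix, so it cannot be read back with decode as a list.

```haskell
import Data.Binary (Binary, encode)
import qualified Data.ByteString.Lazy as B

-- | Encode each element separately and concatenate the lazy results.
-- No length prefix is written, so the format differs from `encode xs`.
encodeStream :: Binary a => [a] -> B.ByteString
encodeStream = B.concat . map encode

main :: IO ()
main = B.writeFile "grids.bin" (encodeStream [0 .. 10000000 :: Int])
```

Because nothing needs the list's length up front, this version should be able to stream, but it is worth confirming with +RTS -s on your own setup. A reader of the file has to know the element count by other means, for example from the file size or from a count written separately.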

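Second, in your program shuffleds contains references to the contents of grids, which prevents grids from being garbage collected. So when we print grids we also evaluate it, and it then has to stay in memory until we have finished printing shuffleds. The following version of your program still consumes a lot of memory, but it uses constant space if we comment out either of the two B.writeFile lines.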

import qualified Data.ByteString.Lazy.Char8 as B

writeGrids :: Int -> Int -> IO ()
writeGrids num aa = do
  rng <- newPureMT
  let (grids,shuffleds) = createGrids rng aa
  B.writeFile "grids.bin" (B.pack $ show (take num grids))
  B.writeFile "shuffleds.bin" (B.pack $ show (take num shuffleds))
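One way to avoid keeping grids alive across two separate writes is to produce both files in a single pass, so that each grid can be collected as soon as it and its shuffled counterpart have been written. The sketch below is only an illustration under assumptions: writePaired is a hypothetical helper, the lists in main are simple stand-ins for grids and shuffleds, and the output is one show-formatted line per element rather than the binary format from the question.

```haskell
import System.IO (IOMode (WriteMode), hPutStrLn, withFile)

-- | Write two streams side by side, one element of each per step, so an
-- element of the first stream is no longer needed once both lines are out.
writePaired :: (Show a, Show b) => FilePath -> FilePath -> [a] -> [b] -> IO ()
writePaired pathA pathB as bs =
  withFile pathA WriteMode $ \hA ->
    withFile pathB WriteMode $ \hB ->
      mapM_ (\(a, b) -> hPutStrLn hA (show a) >> hPutStrLn hB (show b))
            (zip as bs)

main :: IO ()
main = do
  -- Stand-ins for grids/shuffleds: the second list is derived from the
  -- first, just as shuffleds is derived from grids in the question.
  let grids     = map (\i -> [i, i + 1, i + 2]) [0 :: Int ..]
      shuffleds = map reverse grids
  writePaired "grids.txt" "shuffleds.txt" (take 1000 grids) (take 1000 shuffleds)
```

The alternative of calling createGrids twice with the same generator trades memory for recomputation, but the compiler may merge the two identical calls back into one shared value, so the single-pass approach is the less fragile of the two.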

Regarding "haskell - Why doesn't this run in constant memory?", the corresponding question can be found on Stack Overflow: https://stackoverflow.com/questions/31541058/
