gpt4 book ai didi

haskell - 如何优化大型 IntMap 生成?

转载 作者:行者123 更新时间:2023-12-01 22:26:27 24 4
gpt4 key购买 nike

以下代码的 allCases 大小为 2^20 IntMap,其生成需要大量计算和内存。我不知道这是否是不可避免的成本,以及如何找出效率低下的地方。

import Control.Monad (forM_)
import Control.Monad.ST
import Data.Array
import Data.Array.ST
import Data.Bits
import Data.List
import qualified Data.IntMap as M

type Switch = Int
type Clock = Int

switches :: Array Switch [Clock]
switches = listArray (0, 9) [
[0, 1, 2],
[3, 7, 9, 11],
[4, 10, 14, 15],
[0, 4, 5, 6, 7],
[6, 7, 8, 10, 12],
[0, 2, 14, 15],
[3, 14, 15],
[4, 5, 7, 1, 15],
[1, 2, 3, 4, 5],
[3, 4, 5, 9, 13]]

type Quads = Int

intsToQuads :: [Int] -> Quads
intsToQuads [] = 0
intsToQuads (x:xs) = x .|. (intsToQuads xs `shiftL` 2)

switchCases :: [[Int]]
switchCases = sequence $ replicate 10 [0..3]

applySwitch :: Int -> STUArray s Int Int -> ST s ()
applySwitch sw clocks = forM_ (switches ! sw) $ \ix -> do
clock <- readArray clocks ix
writeArray clocks ix ((clock + 1) `rem` 4)

allCasesST :: Int -> STUArray s Int Int -> Int -> ST s (M.IntMap Int)
allCasesST ix clocks pushs
| ix > 9 = do
cs <- getElems clocks
return (M.singleton (intsToQuads cs) pushs)
| otherwise = do
rs <- mapM next [pushs..pushs + 3]
return (M.unions rs)
where
next pu = do
rs <- allCasesST (ix + 1) clocks pu
applySwitch ix clocks
return rs

allCases :: M.IntMap Int
allCases = runST $ do
st <- newArray (0,15) 0
allCasesST 0 st 0

main = do
putStrLn . show $ M.lookup 0 allCases
return ()

以下是分析结果。 Profiling results 1

  2,754,242,472 bytes allocated in the heap
2,095,063,056 bytes copied during GC
788,992,504 bytes maximum residency (23 sample(s))
3,087,880 bytes maximum slop
815 MB total memory in use (0 MB lost due to fragmentation)

Tot time (elapsed) Avg pause Max pause
Gen 0 2618 colls, 0 par 10.859s 11.052s 0.0042s 1.1284s
Gen 1 23 colls, 0 par 0.000s 0.011s 0.0005s 0.0009s

INIT time 0.000s ( 0.001s elapsed)
MUT time 1.531s ( 3.622s elapsed)
GC time 8.750s ( 8.916s elapsed)
RP time 0.000s ( 0.000s elapsed)
PROF time 2.109s ( 2.146s elapsed)
EXIT time 0.047s ( 0.072s elapsed)
Total time 12.438s ( 12.612s elapsed)

%GC time 70.4% (70.7% elapsed)

Alloc rate 1,798,688,961 bytes per MUT second

Productivity 12.7% of total user, 12.3% of total elapsed

.prof 文件(为了整洁进行了一些修改)。

    Mon Dec 25 23:52 2017 Time and Allocation Profiling Report  (Final)

a.exe +RTS -p -hc -xc -s -c -RTS

total time = 1.52 secs (1518 ticks @ 1000 us, 1 processor)
total alloc = 1,576,087,408 bytes (excludes profiling overheads)

COST CENTRE MODULE SRC %time %alloc

intsToQuads Main Main.hs:38:29-42 24.9 17.0
allCasesST Main Main.hs:55:13-23 13.7 20.2
applySwitch Main Main.hs:44:25-45 12.2 0.0
allCasesST Main Main.hs:51:11-25 10.5 44.2
applySwitch Main Main.hs:46:3-44 9.5 0.0
intsToQuads Main Main.hs:38:22-54 7.2 0.0
allCasesST.next Main Main.hs:58:13-41 6.2 5.7
intsToQuads Main Main.hs:38:29-53 4.2 0.0
applySwitch Main Main.hs:45:12-30 2.6 7.1
allCasesST.next Main Main.hs:59:7-27 2.2 0.0
allCasesST Main Main.hs:52:26-39 1.8 1.1
applySwitch Main Main.hs:(44,25)-(46,44) 1.5 0.0
allCasesST Main Main.hs:52:13-46 1.4 2.7
allCasesST Main Main.hs:54:11-38 0.7 2.1


individual inherited
COST CENTRE SRC entries %time %alloc %time %alloc

MAIN <built-in> 0 0.1 0.0 100.0 100.0
CAF GHC.IO.Encoding.CodePage 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.Text 0 0.0 0.0 0.0 0.0
CAF GHC.Show 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 0 0.0 0.0 0.0 0.0
CAF <entire-module> 0 0.0 0.0 99.9 100.0
allCases Main.hs:(63,12)-(65,19) 0 0.0 0.0 99.9 100.0
allCases Main.hs:64:9-25 0 0.0 0.0 0.0 0.0
allCasesST Main.hs:55:5-24 0 0.0 0.0 5.5 5.3
allCasesST Main.hs:55:13-23 0 5.5 5.3 5.5 5.3
allCasesST Main.hs:54:11-38 0 0.7 2.1 94.5 94.7
allCasesST.next Main.hs:59:7-27 0 0.1 0.0 0.5 0.0
applySwitch Main.hs:(44,25)-(46,44) 0 0.0 0.0 0.3 0.0
applySwitch Main.hs:44:25-45 0 0.1 0.0 0.3 0.0
applySwitch Main.hs:45:12-30 0 0.0 0.0 0.0 0.0
applySwitch Main.hs:44:32-44 0 0.2 0.0 0.2 0.0
allCasesST.next Main.hs:58:13-41 0 6.2 5.7 93.3 92.5
allCasesST Main.hs:52:5-47 0 0.1 0.0 39.7 20.8
allCasesST Main.hs:52:13-46 0 1.4 2.7 39.6 20.8
allCasesST Main.hs:52:26-39 0 1.8 1.1 38.2 18.1
intsToQuads Main.hs:38:22-54 0 7.2 0.0 36.4 17.0
intsToQuads Main.hs:38:29-53 0 4.2 0.0 29.1 17.0
intsToQuads Main.hs:38:29-42 0 24.9 17.0 24.9 17.0
allCasesST Main.hs:55:5-24 0 0.1 0.0 8.3 14.9
allCasesST Main.hs:55:13-23 0 8.2 14.9 8.2 14.9
allCasesST Main.hs:51:11-25 0 10.5 44.2 10.5 44.2
allCasesST.next Main.hs:59:7-27 0 2.1 0.0 28.7 7.1
applySwitch Main.hs:(44,25)-(46,44) 0 1.5 0.0 26.5 7.1
applySwitch Main.hs:44:25-45 0 12.1 0.0 25.0 7.1
applySwitch Main.hs:46:3-44 0 9.5 0.0 10.4 0.0
applySwitch Main.hs:46:25-43 0 0.9 0.0 0.9 0.0
applySwitch Main.hs:45:12-30 0 2.6 7.1 2.6 7.1
main Main.hs:68:3-39 0 0.0 0.0 0.0 0.0
main Main.hs:68:21-39 0 0.0 0.0 0.0 0.0
main Main.hs:68:3-17 0 0.0 0.0 0.0 0.0
switches Main.hs:(22,12)-(32,19) 0 0.0 0.0 0.0 0.0
main Main.hs:68:3-39 0 0.0 0.0 0.0 0.0
main Main.hs:68:3-17 0 0.0 0.0 0.0 0.0

最佳答案

你的算法有点太懒了,你在 map 中放入了未评估的重击。

您的原始代码执行如下:

<<ghc: 1185249008 bytes, 1136 GCs, 85006914/288126544 avg/max bytes residency (10 samples), 827M in use, 0.000 INIT (0.002 elapsed), 0.595 MUT (0.605 elapsed), 1.830 GC (2.065 elapsed) :ghc>>
./jeiea +RTS -tstderr 2.43s user 0.32s system 99% cpu 2.773 total

即,288MB 驻留时间,2.4 秒。

但是如果我们强制评估 intsToQuads:

allCasesST :: Int -> STUArray s Int Int -> Int -> ST s (M.IntMap Int)
allCasesST ix clocks pushs
| ix > 9 = do
cs <- getElems clocks
let n = intsToQuads cs
n `seq` return (M.singleton n pushs)

那么性能是:

<<ghc: 1151694576 bytes, 1104 GCs, 26972457/72834536 avg/max bytes residency (11 samples), 215M in use, 0.000 INIT (0.002 elapsed), 0.500 MUT (0.515 elapsed), 0.735 GC (0.816 elapsed) :ghc>>
./jeiea +RTS -tstderr 1.24s user 0.11s system 98% cpu 1.367 total

最大驻留时间为 72MB,时间为 1.25 秒。

如果我们对 applySwitchnext 应用相同的修改,我们将获得以下性能:

<<ghc: 1151694576 bytes, 1104 GCs, 26972457/72834536 avg/max bytes residency (11 samples), 215M in use, 0.000 INIT (0.002 elapsed), 0.389 MUT (0.395 elapsed), 0.517 GC (0.573 elapsed) :ghc>>

这是 72MB 驻留时间,大约 1 秒,但差异很大。

编辑:

完整代码并运行:

import Control.Monad (forM_)
import Control.Monad.ST
import Data.Array
import Data.Array.ST
import Data.Bits
import Data.List
import qualified Data.IntMap as M

type Switch = Int
type Clock = Int

switches :: Array Switch [Clock]
switches = listArray (0, 9) [
[0, 1, 2],
[3, 7, 9, 11],
[4, 10, 14, 15],
[0, 4, 5, 6, 7],
[6, 7, 8, 10, 12],
[0, 2, 14, 15],
[3, 14, 15],
[4, 5, 7, 1, 15],
[1, 2, 3, 4, 5],
[3, 4, 5, 9, 13]]

type Quads = Int

intsToQuads :: [Int] -> Quads
intsToQuads [] = 0
intsToQuads (x:xs) = x .|. (intsToQuads xs `shiftL` 2)

switchCases :: [[Int]]
switchCases = sequence $ replicate 10 [0..3]

applySwitch :: Int -> STUArray s Int Int -> ST s ()
applySwitch sw clocks = forM_ (switches ! sw) $ \ix -> do
clock <- readArray clocks ix
let n = ((clock + 1) `rem` 4)
n `seq` writeArray clocks ix n

allCasesST :: Int -> STUArray s Int Int -> Int -> ST s (M.IntMap Int)
allCasesST ix clocks pushs
| ix > 9 = do
cs <- getElems clocks
let n = intsToQuads cs
n `seq` return (M.singleton n pushs)
| otherwise = do
rs <- mapM next [pushs..pushs + 3]
return (M.unions rs)
where
next pu = do
let n = ix + 1
rs <- n `seq` allCasesST n clocks pu
applySwitch ix clocks
return rs

allCases :: M.IntMap Int
allCases = runST $ do
st <- newArray (0,15) 0
allCasesST 0 st 0

main = do
putStrLn . show $ M.lookup 0 allCases
return ()

与:

% time ./jeiea +RTS -tstderr
Just 0
<<ghc: 1151694576 bytes, 1104 GCs, 26972457/72834536 avg/max bytes residency (11 samples), 215M in use, 0.000 INIT (0.005 elapsed), 0.812 MUT (0.820 elapsed), 0.979 GC (1.093 elapsed) :ghc>>
./jeiea +RTS -tstderr 1.79s user 0.16s system 99% cpu 1.971 total

关于haskell - 如何优化大型 IntMap 生成?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/47969861/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com