
haskell - Parallel evaluation of a list


I am trying to experiment with parallel computation in Haskell, but I seem to have run into trouble.

As an experiment, I wanted to evaluate a list of tasks that each take a long time to complete, so I came up with this contrived example.

import Control.Parallel.Strategies

startNum = 800000

bigList :: [Integer]
bigList = [2042^x | x <- [startNum..startNum+10]]

main = print $ sum $ parMap rdeepseq (length . show) bigList
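As an aside (not part of the original question): parMap strat f from Control.Parallel.Strategies is roughly (`using` parList strat) . map f, so the list is mapped lazily and parList then creates one spark per element, with rdeepseq forcing each result all the way to normal form. A minimal sketch of that equivalence, using the same workload:

import Control.Parallel.Strategies

-- A rough sketch of how parMap is built from smaller strategy pieces:
-- map lazily, then spark each element with the parList strategy.
parMap' :: Strategy b -> (a -> b) -> [a] -> [b]
parMap' strat f xs = map f xs `using` parList strat

main :: IO ()
main = print $ sum $ parMap' rdeepseq (length . show)
    ([2042 ^ x | x <- [800000 .. 800010]] :: [Integer])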

I compiled the example with ghc -O2 -eventlog -rtsopts -threaded test.hs --make and ran it twice.

$ time ./test +RTS -N1 -lf -sstderr
29128678
2,702,130,280 bytes allocated in the heap
59,409,320 bytes copied during GC
3,114,392 bytes maximum residency (68 sample(s))
1,093,600 bytes maximum slop
28 MB total memory in use (6 MB lost due to fragmentation)
                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3101 colls,     0 par    0.09s    0.08s     0.0000s    0.0005s
  Gen  1        68 colls,     0 par    0.03s    0.03s     0.0004s    0.0009s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 11 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 11 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 10.13s ( 10.13s elapsed)
GC time 0.11s ( 0.11s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 10.25s ( 10.25s elapsed)
Alloc rate 266,683,731 bytes per MUT second
Productivity 98.9% of total user, 98.9% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
real 0m10.250s
user 0m10.144s
sys 0m0.106s

$ time ./test +RTS -N4 -lf -sstderr
29128678
2,702,811,640 bytes allocated in the heap
712,017,768 bytes copied during GC
22,024,144 bytes maximum residency (67 sample(s))
6,134,968 bytes maximum slop
68 MB total memory in use (3 MB lost due to fragmentation)
                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1329 colls,  1329 par    2.77s    0.70s     0.0005s    0.0075s
  Gen  1        67 colls,    66 par    0.11s    0.03s     0.0004s    0.0019s
Parallel GC work balance: 40.17% (serial 0%, perfect 100%)
TASKS: 10 (1 bound, 9 peak workers (9 total), using -N4)
SPARKS: 11 (11 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 51.56s ( 13.04s elapsed)
GC time 2.89s ( 0.73s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 54.45s ( 13.77s elapsed)
Alloc rate 52,423,243 bytes per MUT second
Productivity 94.7% of total user, 374.4% of total elapsed
gc_alloc_block_sync: 39520
whitehole_spin: 0
gen[0].sync: 3046
gen[1].sync: 4970
real 0m13.777s
user 0m44.362s
sys 0m10.093s

I notice an increase in GC time, but nothing I would have expected the extra cores couldn't overcome.

So I pulled up ThreadScope to take a look.

Here is the result with -N1: [ThreadScope profile: Run with -N1]

Here is the result with -N4: [ThreadScope profile: Run with -N4]

In the -N1 case the sparks seem to execute more quickly.

My question: why am I not seeing the speedup I would expect from a bunch of independent tasks executing in parallel?

Best Answer

This seems to be related to Integer operations. If you replace them with something else, you will see a speedup from parallel processing.

This code gets no speedup with -N2:

import Control.Parallel (par, pseq)

main =
    let x = length . show $ 10 ^ 10000000
        y = length . show $ 10 ^ 10000001
    in x `par` y `pseq` print (x + y)

and neither does this:

import Control.Parallel (par, pseq)

main =
    let x = (10 ^ 10000000 :: Integer) `quotRem` 123
        y = (10 ^ 10000001 :: Integer) `quotRem` 123
    in x `par` y `pseq` print "ok"
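Since par only evaluates its argument to weak head normal form, one might suspect the pairs above are simply not being evaluated deeply enough. As a sanity check (my addition, not part of the original answer), the same program can force the pairs to full normal form with force from the deepseq package; if the timing is unchanged, under-evaluation is not the culprit:

import Control.DeepSeq (force)
import Control.Parallel (par, pseq)

main :: IO ()
main =
    -- 'force' evaluates the entire (quotient, remainder) pair, not just
    -- its outermost constructor, ruling out laziness as the reason the
    -- spark does no useful work.
    let x = force ((10 ^ 10000000 :: Integer) `quotRem` 123)
        y = force ((10 ^ 10000001 :: Integer) `quotRem` 123)
    in x `par` y `pseq` print "ok"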

But this code does get a parallel speedup:

import Control.Parallel (par, pseq)

main =
    let x = length $ replicate 1000000000 'x'
        y = length $ replicate 1000000001 'y'
    in x `par` y `pseq` print (x + y)
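Tying this back to the original program: a hypothetical variant (my sketch, not from the answer) where the per-spark work is plain Int arithmetic and list traversal instead of big-Integer arithmetic would be expected to scale with -N under the same parMap structure:

import Control.Parallel.Strategies

-- Hypothetical non-Integer workload: each spark walks a long list of
-- Chars, so the work is allocation and traversal rather than GMP calls.
work :: Int -> Int
work n = length (replicate (500000000 + n) 'x')

main :: IO ()
main = print $ sum $ parMap rseq work [1 .. 11]

rseq suffices here because each result is an Int, whose weak head normal form is already its full value.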

I could not find any locking in integer-gmp, though.

Regarding haskell - Parallel evaluation of a list, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25262734/
