multithreading - Improving loop performance through parallelization

Reposted. Author: 行者123. Updated: 2023-12-03 12:45:47

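So I'm trying to wrap my head around Julia's parallelization options. I'm modeling a stochastic process as Markov chains. Since the chains are independent replicates, the iterations of the outer loop are independent, which makes the problem embarrassingly parallel. I tried implementing both a @distributed and a @threads solution; both seem to run fine, but neither is faster than the sequential version.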

Here is a simplified version of my code (sequential):

function dummy(steps = 10000, width = 100, chains = 4)
    out_N = zeros(steps, width, chains)
    initial = zeros(width)
    for c = 1:chains
        # print("c=$c\n")
        N = zeros(steps, width)
        state = copy(initial)
        N[1,:] = state
        for i = 1:steps
            state = state + rand(width)
            N[i,:] = state
        end
        out_N[:,:,c] = N
    end
    return out_N
end

What is the right way to parallelize this problem to improve performance?

Best answer

Here is the correct way to do it (at the time of writing this answer, the other answers do not work; see my comments).

I will use an example that is slightly simpler than the one in the question (but very similar).

1. Non-parallelized version (baseline scenario)

using Random
const m = MersenneTwister(0);

function dothestuff!(out_N, N, ic, m)
    out_N[:, ic] .= rand(m, N)
end

function dummy_base(m=m, N=100_000,c=256)
    out_N = Array{Float64}(undef,N,c)
    for ic in 1:c
        dothestuff!(out_N, N, ic, m)
    end
    out_N
end

Test:

julia> using BenchmarkTools; @btime dummy_base();
106.512 ms (514 allocations: 390.64 MiB)
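
As a side note on the allocation figure: the output matrix itself takes 100_000 × 256 × 8 bytes, about 195 MiB, and the temporary vectors created by rand(m, N) for each column add about the same amount again, which accounts for the roughly 390 MiB reported. A minimal sketch of an in-place variant follows, assuming that filling each column through a view is acceptable. The names dothestuff_inplace! and dummy_base_inplace are hypothetical and not part of the original answer.

```julia
using Random

# Hypothetical in-place variant of dothestuff!: rand! writes straight into
# a view of the column instead of allocating a temporary vector first.
function dothestuff_inplace!(out_N, ic, m)
    rand!(m, view(out_N, :, ic))
    return out_N
end

# Same sequential loop as dummy_base, but without per-column temporaries.
function dummy_base_inplace(m=MersenneTwister(0), N=100_000, c=256)
    out_N = Array{Float64}(undef, N, c)
    for ic in 1:c
        dothestuff_inplace!(out_N, ic, m)
    end
    return out_N
end
```

The same change can be applied to the threaded and distributed versions; whether it matters in practice depends on how much of the runtime is spent on allocation.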

2. Parallelization with threads

# Remember to set the number of threads before starting Julia:
# set JULIA_NUM_THREADS=4      (Windows)
# OR (Linux)
# export JULIA_NUM_THREADS=4

using Random

const mt = MersenneTwister.(1:Threads.nthreads());
# one RNG per thread: required on older Julia versions, still looks fine on later versions :-)

function dothestuff!(out_N, N, ic, m)
    out_N[:, ic] .= rand(m, N)
end
function dummy_threads(mt=mt, N=100_000,c=256)
    out_N = Array{Float64}(undef,N,c)
    Threads.@threads for ic in 1:c
        dothestuff!(out_N, N, ic, mt[Threads.threadid()])
    end
    out_N
end

Let's test the performance:

julia> using BenchmarkTools; @btime dummy_threads();
46.775 ms (535 allocations: 390.65 MiB)
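
The threaded version above selects an RNG with Threads.threadid(). On recent Julia versions the default @threads scheduling is dynamic, and the Julia documentation advises against using threadid() to index per-thread state. A minimal sketch of an alternative that avoids threadid() altogether follows. It assumes that one RNG per column, seeded with the column index, is acceptable for the application; the name dummy_tasks and the seeding scheme are illustrative assumptions, not part of the original answer.

```julia
using Random

# Hypothetical variant: every column gets its own RNG, created inside the
# loop body, so no RNG state is shared between tasks.
function dummy_tasks(N=100_000, c=256)
    out_N = Array{Float64}(undef, N, c)
    Threads.@threads for ic in 1:c
        rng = MersenneTwister(ic)        # assumed seeding scheme: seed = column index
        rand!(rng, view(out_N, :, ic))   # fill the column in place
    end
    return out_N
end
```

Because the seed depends only on the column index, the result should not depend on how many threads Julia was started with. Constructing a MersenneTwister has a cost of its own, so this trade-off is worth measuring when the columns are short.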

3. Parallelization with processes (on a single machine)

using Distributed

addprocs(4)

using Random, SharedArrays
@everywhere using Random, SharedArrays, Distributed
@everywhere Random.seed!(myid())

@everywhere function dothestuff!(out_N, N, ic)
    out_N[:, ic] .= rand(N)
end
function dummy_distr(N=100_000,c=256)
    out_N = SharedArray{Float64}(N,c)
    @sync @distributed for ic in 1:c
        dothestuff!(out_N, N, ic)
    end
    out_N
end

Performance (note that inter-process communication takes some time, so for small computations threads will usually be better):

julia> using BenchmarkTools; @btime dummy_distr();
62.584 ms (1073 allocations: 45.48 KiB)
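
To connect the answer back to the question, here is a minimal sketch of how the thread-based pattern might be applied to the original Markov chain loop. It assumes that each chain can use its own RNG seeded with the chain index; the name dummy_chains_threaded and the seeding scheme are hypothetical and do not come from the question or the answer.

```julia
using Random

# Hypothetical threaded variant of the question's `dummy`: the chains are
# independent, so each outer-loop iteration can run on a different thread.
function dummy_chains_threaded(steps=10_000, width=100, chains=4)
    out_N = zeros(steps, width, chains)
    Threads.@threads for c in 1:chains
        rng = MersenneTwister(c)        # assumed per-chain seed
        state = zeros(width)
        for i in 1:steps
            for j in 1:width
                state[j] += rand(rng)   # update the state in place, no temporaries
            end
            out_N[i, :, c] .= state     # write the step directly into the output
        end
    end
    return out_N
end
```

With the default of 4 chains, at most 4 threads can be kept busy, so the possible speedup is bounded by the number of chains rather than by the number of cores.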

Regarding multithreading - improving loop performance through parallelization, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62821823/
