multithreading - Julia Threads.@threads is slower than single-threaded code

Repost. Author: 行者123. Updated: 2023-12-05 06:11:36

I am trying to solve the heat equation in 1D numerically:

[Image not preserved: the 1D heat equation. The finite-difference update in the code below corresponds to ∂u/∂t = ν ∂²u/∂x².]

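I am using finite differences, but I have run into trouble with the @threads macro in Julia. Below are two versions of the same code: the first is single-threaded, while the second uses @threads (apart from the @threads macro they are identical).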

function heatSecLoop(;T::Float64)

    println("start")
    L = 1
    ν = 0.5
    Δt = 1e-6
    Δx = 1e-3

    Nt = ceil(Int, T/Δt)
    Nx = ceil(Int, L/Δx + 2)
    u = zeros(Nx)
    u[round(Int, Nx/2)] = 1

    println("starting loop")
    for t=1:Nt-1
        u_old = copy(u)
        for i=2:Nx-1
            u[i] = u_old[i] + ν * Δt/(Δx^2)*(u_old[i-1]-2u_old[i] + u_old[i+1])
        end

        if t % round(Int, Nt/10) == 0
            println("time = " * string(round(t*Δt, digits=4)))
        end
    end
    println("done")
    return u
end

function heatParLoop(;T::Float64)

    println("start")
    L = 1
    ν = 0.5
    Δt = 1e-6
    Δx = 1e-3

    Nt = ceil(Int, T/Δt)
    Nx = ceil(Int, L/Δx + 2)
    u = zeros(Nx)
    u[round(Int, Nx/2)] = 1

    println("starting loop")
    for t=1:Nt-1
        u_old = copy(u)
        Threads.@threads for i=2:Nx-1
            u[i] = u_old[i] + ν * Δt/(Δx^2)*(u_old[i-1]-2u_old[i] + u_old[i+1])
        end

        if t % round(Int, Nt/10) == 0
            println("time = " * string(round(t*Δt, digits=4)))
        end
    end
    println("done")
    return u
end

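The problem is that the sequential version is faster than the multithreaded one. Here are the timings (after one run to compile):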

julia> Threads.nthreads()
2

julia> @time heatParLoop(T=1.0)
start
starting loop
time = 0.1
time = 0.2
time = 0.3
time = 0.4
time = 0.5
time = 0.6
time = 0.7
time = 0.8
time = 0.9
done
5.417182 seconds (12.14 M allocations: 9.125 GiB, 6.59% gc time)

julia> @time heatSecLoop(T=1.0)
start
starting loop
time = 0.1
time = 0.2
time = 0.3
time = 0.4
time = 0.5
time = 0.6
time = 0.7
time = 0.8
time = 0.9
done
3.892801 seconds (1.00 M allocations: 7.629 GiB, 8.06% gc time)

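Of course, the heat equation is only an example of a more complex problem. I also tried other libraries, such as SharedArrays together with Distributed, but the results were even worse.

Any help is appreciated.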

Best answer

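This still seems to hold, probably because of:

  1. The overhead of Threads.@threads.
  2. Perhaps to a lesser extent, garbage collection in Julia being single-threaded (as of this answer), while the original version here produces a fair amount of garbage.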
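The second point can be checked directly. The following is a minimal sketch, not the code from the question: the helper names `step!`, `run_copy` and `run_swap` are hypothetical, and the grid size and coefficient are only assumed to be close to the question's values (about 1000 points, ν Δt/Δx² ≈ 0.5). It compares the bytes allocated when a fresh copy is made on every time step against swapping two preallocated buffers.

```julia
# Hypothetical helper: one explicit finite-difference step from u_old into u.
function step!(u, u_old, c)
    for i in 2:length(u)-1
        u[i] = u_old[i] + c * (u_old[i-1] - 2u_old[i] + u_old[i+1])
    end
    return u
end

# Variant 1: allocate a fresh copy on every time step (the pattern in the question).
function run_copy(u0, c, nsteps)
    u = copy(u0)
    for _ in 1:nsteps
        u_old = copy(u)
        step!(u, u_old, c)
    end
    return u
end

# Variant 2: swap two preallocated buffers, so the loop itself allocates nothing.
# Both buffers start as copies of u0 so that their boundary entries are valid.
function run_swap(u0, c, nsteps)
    u = copy(u0)
    u_old = copy(u0)
    for _ in 1:nsteps
        u_old, u = u, u_old
        step!(u, u_old, c)
    end
    return u
end

u0 = zeros(1002)
u0[501] = 1.0
c = 0.5   # assumed value of ν * Δt / Δx^2

run_copy(u0, c, 1); run_swap(u0, c, 1)   # warm up so compilation is not measured
bytes_copy = @allocated run_copy(u0, c, 1_000)
bytes_swap = @allocated run_swap(u0, c, 1_000)
println("copy per step: ", bytes_copy, " bytes; swapped buffers: ", bytes_swap, " bytes")
```

The copying variant allocates one array per step, so its total grows with the number of steps, while the swapping variant only allocates its two buffers up front. The exact byte counts depend on the Julia version.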

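Additionally, following the suggestion in the linked discussion thread, it is worth noting that there is now a threaded version of the @avx (now @turbo) macro from LoopVectorization.jl, called @tturbo. It uses very lightweight threads from Polyester.jl and manages to get somewhat better performance, even though the overhead of threading is still not negligible: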

function heatSecLoop(;T::Float64)

    println("start")
    L = 1
    ν = 0.5
    Δt = 1e-6
    Δx = 1e-3

    Nt = ceil(Int, T/Δt)
    Nx = ceil(Int, L/Δx + 2)
    u = zeros(Nx)
    u[round(Int, Nx/2)] = 1
    # Second buffer; copy (rather than similar) so its boundary entries are initialized
    u_old = copy(u)

    println("starting loop")
    for t=1:Nt-1
        u_old, u = u, u_old  # swap buffers instead of allocating a copy each step
        for i=2:Nx-1
            u[i] = u_old[i] + ν * Δt/(Δx^2)*(u_old[i-1]-2u_old[i] + u_old[i+1])
        end

        if t % round(Int, Nt/10) == 0
            println("time = " * string(round(t*Δt, digits=4)))
        end
    end
    println("done")
    return u
end

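using LoopVectorization  # provides @turbo and @tturbo

function heatVecLoop(;T::Float64)
    println("start")
    L = 1
    ν = 0.5
    Δt = 1e-6
    Δx = 1e-3

    Nt = ceil(Int, T/Δt)
    Nx = ceil(Int, L/Δx + 2)
    u = zeros(Nx)
    u[round(Int, Nx/2)] = 1
    # Second buffer; copy (rather than similar) so its boundary entries are initialized
    u_old = copy(u)

    println("starting loop")
    for t=1:Nt-1
        u_old, u = u, u_old  # swap buffers instead of allocating a copy each step
        # Single-threaded SIMD vectorization
        @turbo for i=2:Nx-1
            u[i] = u_old[i] + ν * Δt/(Δx^2)*(u_old[i-1]-2u_old[i] + u_old[i+1])
        end

        if t % round(Int, Nt/10) == 0
            println("time = " * string(round(t*Δt, digits=4)))
        end
    end
    println("done")
    return u
end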

function heatTVecLoop(;T::Float64)
    println("start")
    L = 1
    ν = 0.5
    Δt = 1e-6
    Δx = 1e-3

    Nt = ceil(Int, T/Δt)
    Nx = ceil(Int, L/Δx + 2)
    u = zeros(Nx)
    u[round(Int, Nx/2)] = 1
    # Second buffer; copy (rather than similar) so its boundary entries are initialized
    u_old = copy(u)

    println("starting loop")
    for t=1:Nt-1
        u_old, u = u, u_old  # swap buffers instead of allocating a copy each step
        # Threaded SIMD vectorization; @tturbo comes from LoopVectorization.jl
        @tturbo for i=2:Nx-1
            u[i] = u_old[i] + ν * Δt/(Δx^2)*(u_old[i-1]-2u_old[i] + u_old[i+1])
        end

        if t % round(Int, Nt/10) == 0
            println("time = " * string(round(t*Δt, digits=4)))
        end
    end
    println("done")
    return u
end

julia> @time heatSecLoop(T=1.0)
start
starting loop
time = 0.1
time = 0.2
time = 0.3
time = 0.4
time = 0.5
time = 0.6
time = 0.7
time = 0.8
time = 0.9
done
1.786011 seconds (114 allocations: 22.094 KiB)

julia> @time heatVecLoop(T=1.0)
start
starting loop
time = 0.1
time = 0.2
time = 0.3
time = 0.4
time = 0.5
time = 0.6
time = 0.7
time = 0.8
time = 0.9
done
0.314305 seconds (114 allocations: 22.094 KiB)

julia> @time heatTVecLoop(T=1.0)
start
starting loop
time = 0.1
time = 0.2
time = 0.3
time = 0.4
time = 0.5
time = 0.6
time = 0.7
time = 0.8
time = 0.9
done
0.300656 seconds (114 allocations: 22.094 KiB)

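The performance of the single-threaded @turbo vectorized version also appears to have improved significantly since this question was first asked, and the multithreaded @tturbo version will likely keep improving relative to it for larger problems.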
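To see how problem size affects the threading overhead without installing any packages, one can time a serial and a Threads.@threads kernel at two grid sizes. This is only a sketch under assumptions: the function names (`step_serial!`, `step_threaded!`, `evolve`, `compare`), the grid sizes and the step count are hypothetical choices, not taken from the question or the answer. The threaded loop is kept in its own function so that the closure created by @threads captures only arguments that are never reassigned, which is assumed here to avoid an extra source of allocations.

```julia
using Base.Threads: @threads, nthreads

# Serial kernel: one explicit finite-difference step from u_old into u.
function step_serial!(u, u_old, c)
    for i in 2:length(u)-1
        u[i] = u_old[i] + c * (u_old[i-1] - 2u_old[i] + u_old[i+1])
    end
    return u
end

# Threaded kernel: same update, with the index range split across threads.
function step_threaded!(u, u_old, c)
    @threads for i in 2:length(u)-1
        u[i] = u_old[i] + c * (u_old[i-1] - 2u_old[i] + u_old[i+1])
    end
    return u
end

# Advance nsteps time steps with the given kernel, swapping two preallocated buffers.
function evolve(step!, u0, c, nsteps)
    u = copy(u0)
    u_old = copy(u0)
    for _ in 1:nsteps
        u_old, u = u, u_old
        step!(u, u_old, c)
    end
    return u
end

# Time both kernels on a grid of Nx points; returns (serial, threaded) in seconds.
function compare(Nx; c = 0.5, nsteps = 100)
    u0 = zeros(Nx)
    u0[Nx ÷ 2] = 1.0
    evolve(step_serial!, u0, c, 1); evolve(step_threaded!, u0, c, 1)   # warm up
    t_serial = @elapsed evolve(step_serial!, u0, c, nsteps)
    t_threaded = @elapsed evolve(step_threaded!, u0, c, nsteps)
    return t_serial, t_threaded
end

for Nx in (1_002, 1_000_002)
    t_serial, t_threaded = compare(Nx)
    println("Nx = ", Nx, " with ", nthreads(), " thread(s): serial ", t_serial,
            " s, threaded ", t_threaded, " s")
end
```

The timings depend on the machine and on how many threads Julia was started with (for example `julia -t 2`). With about 1000 points per step, each threaded loop has very little work to spread across threads, so the per-loop scheduling cost is expected to weigh far more heavily than it does on the larger grid.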

Regarding "multithreading - Julia Threads.@threads is slower than single-threaded code", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/63933356/
