c - Effect of cores in Parallel GMP-Chudnovsky using OpenMP with factorization

Reposted. Author: 行者123. Updated: 2023-11-30 14:22:51

I recently came across an implementation of the Chudnovsky algorithm for computing pi: Parallel GMP-Chudnovsky using OpenMP with factorization

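I compiled it and ran it with the default 1-core option for digit counts ranging from 10^3 to 10^8. However, I noticed that as the number of cores increases, the computation takes longer in both CPU time and wall-clock time. Why would more cores increase the time needed for the computation? Shouldn't they speed it up and give better performance?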

Here is some sample output:

~/Desktop$ ./pgmp-chudnovsky 7500000 0 1
#terms=528852, depth=21, cores=1
sieve cputime = 0.120
...................................................
bs cputime = 30.300 wallclock = 30.313
gcd cputime = 6.380
div cputime = 3.800
sqrt cputime = 2.140
mul cputime = 1.420
total cputime = 37.800 wallclock = 37.838
P size=10919784 digits (1.455971)
Q size=10919777 digits (1.455970)


~/Desktop$ ./pgmp-chudnovsky 7500000 0 2
#terms=528852, depth=21, cores=2
sieve cputime = 0.120
...................................................
bs cputime = 30.890 wallclock = 17.661
gcd cputime = 12.930
div cputime = 3.790
sqrt cputime = 2.130
mul cputime = 1.420
total cputime = 38.380 wallclock = 25.153
P size=10919611 digits (1.455948)
Q size=10919605 digits (1.455947)

~/Desktop$ ./pgmp-chudnovsky 7500000 0 3
#terms=528852, depth=21, cores=3
sieve cputime = 0.120
...................................................
bs cputime = 31.400 wallclock = 14.266
gcd cputime = 21.640
div cputime = 3.810
sqrt cputime = 2.130
mul cputime = 1.410
total cputime = 38.900 wallclock = 21.784
P size=10726889 digits (1.430252)
Q size=10726883 digits (1.430251)

~/Desktop$ ./pgmp-chudnovsky 7500000 0 4
#terms=528852, depth=21, cores=4
sieve cputime = 0.130
...................................................
bs cputime = 32.480 wallclock = 11.771
gcd cputime = 27.770
div cputime = 3.800
sqrt cputime = 2.130
mul cputime = 1.410
total cputime = 39.980 wallclock = 19.284
P size=10920859 digits (1.456115)
Q size=10920852 digits (1.456114)

~/Desktop$ ./pgmp-chudnovsky 7500000 0 5
#terms=528852, depth=21, cores=5
sieve cputime = 0.130
...................................................
bs cputime = 33.010 wallclock = 15.496
gcd cputime = 28.500
div cputime = 3.790
sqrt cputime = 2.130
mul cputime = 1.420
total cputime = 40.510 wallclock = 23.000
P size=10605102 digits (1.414014)
Q size=10605096 digits (1.414013)

~/Desktop$ ./pgmp-chudnovsky 7500000 0 10
#terms=528852, depth=21, cores=10
sieve cputime = 0.130
...................................................
bs cputime = 33.210 wallclock = 14.311
gcd cputime = 29.640
div cputime = 3.780
sqrt cputime = 2.140
mul cputime = 1.420
total cputime = 40.720 wallclock = 21.822
P size=10607304 digits (1.414307)
Q size=10607297 digits (1.414306)

~/Desktop$ ./pgmp-chudnovsky 7500000 0 100
#terms=528852, depth=21, cores=100
sieve cputime = 0.120
...................................................
bs cputime = 33.080 wallclock = 13.412
gcd cputime = 17.630
div cputime = 3.780
sqrt cputime = 2.130
mul cputime = 1.420
total cputime = 40.570 wallclock = 20.912
P size=12169347 digits (1.622580)
Q size=12169341 digits (1.622579)

~/Desktop$ ./pgmp-chudnovsky 7500000 0 200
#terms=528852, depth=21, cores=200
sieve cputime = 0.130
...................................................
bs cputime = 34.080 wallclock = 13.942
gcd cputime = 15.620
div cputime = 3.760
sqrt cputime = 2.110
mul cputime = 1.420
total cputime = 41.530 wallclock = 21.401
P size=12642316 digits (1.685642)
Q size=12642309 digits (1.685641)

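Best answer

Judging from the results, you have a 4-core system. Increasing the number of threads beyond that hurts performance, because you pay the overhead of thread context switching without any more work being done simultaneously.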

Cores    Total wall-clock time
1        37.838
2        25.153
3        21.784
4        19.284  *Best*
5        23.000
10       21.822
100      20.912

Regarding "c - Effect of cores in Parallel GMP-Chudnovsky using OpenMP with factorization", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13436249/
