gpt4 book ai didi

python - 在多处理池中运行 scipy.integrate.ode 会导致巨大的性能损失

转载 作者:太空狗 更新时间:2023-10-29 18:08:07 26 4
gpt4 key购买 nike

我正在使用 python 的 scipy.integrate 来模拟 29 维线性微分方程组。由于我需要解决多个问题实例,我想我可以通过使用 multiprocessing.Pool 并行计算来加快速度。由于线程之间不需要共享数据或同步(问题是令人尴尬的并行),我认为这显然应该可行。然而,在我编写了执行此操作的代码之后,我得到了非常奇怪的性能测量值:

  • 单线程,无 jacobian:每次调用 20-30 毫秒
  • 单线程,使用 jacobian:每次调用 10-20 毫秒
  • 多线程,没有 jacobian:每次调用 20-30 毫秒
  • 多线程,使用 jacobian:每次调用 10-5000 毫秒

令人震惊的是,我认为应该是最快的设置,实际上是最慢的,而且可变性两个数量级。这是一个确定性的计算;计算机不应该以这种方式工作。可能是什么原因造成的?

效果似乎取决于系统

我在另一台电脑上尝试了同样的代码,但没有看到这种效果。

两台机器都使用 Ubuntu 64 位、Python 2.7.6、scipy 版本 0.18.0 和 numpy 版本 1.8.2。我没有看到 Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz 处理器的可变性。我确实看到了 Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz 的问题。

理论

一个想法是处理器之间可能有一个共享缓存,通过并行运行它我无法在缓存中容纳两个雅可比矩阵实例,所以它们不断地为缓存而相互争斗,互相减慢速度与它们是连续运行还是没有雅可比相比较。但它不是百万可变系统。 jacobian是一个29x29的矩阵,占用6728字节。处理器上的 1 级缓存是 4 x 32 KB ,更大。处理器之间是否有任何其他共享资源可能是罪魁祸首?我们如何测试它?

我注意到的另一件事是,每个 python 进程在运行时似乎都占用了 CPU 的百分之几百。这似乎意味着代码在某些时候已经并行化了(可能在低级库中)。这可能意味着进一步的并行化无济于事,但我预计不会出现如此剧烈的减速。

代码

最好在更多机器上尝试一下,看看 (1) 其他人是否能体验到减速,以及 (2) 发生减速的系统的共同特征是什么。该代码使用大小为 2 的多处理池对两个并行计算进行 10 次试验,打印出 10 次试验中每次 scipy.ode.integrate 调用的时间。

'odeint with multiprocessing variable execution time demonsrtation'

from numpy import dot as npdot
from numpy import add as npadd
from numpy import matrix as npmatrix
from scipy.integrate import ode
from multiprocessing import Pool
import time

def main():
"main function"

pool = Pool(2) # try Pool(1)
params = [0] * 2

for trial in xrange(10):
res = pool.map(run_one, params)
print "{}. times: {}ms, {}ms".format(trial, int(1000 * res[0]), int(1000 * res[1]))

def run_one(_):
"perform one simulation"

final_time = 2.0
init_state = [0.1 if d < 7 else 0.0 for d in xrange(29)]
(a_matrix, b_vector) = get_dynamics()

derivative = lambda dummy_t, state: npadd(npdot(a_matrix, state), b_vector)
jacobian = lambda dummy_t, dummy_state: a_matrix
#jacobian = None # try without the jacobian

#print "jacobian bytes:", jacobian(0, 0).nbytes

solver = ode(derivative, jacobian)
solver.set_integrator('vode')
solver.set_initial_value(init_state, 0)

start = time.time()
solver.integrate(final_time)
dif = time.time() - start

return dif

def get_dynamics():
"return a tuple (A, b), which are the system dynamics x' = Ax + b"

return \
(
npmatrix([
[0, 0, 0, 0.99857378006, 0.053384274244, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ],
[0, 0, 1, -0.003182219341, 0.059524655342, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ],
[0, 0, -11.570495605469, -2.544637680054, -0.063602626324, 0.106780529022, -0.09491866827, 0.007107574493, -5.20817921341, -23.125876742495, -4.246931301528, -0.710743697134, -1.486697327603, -0.044548215175, 0.03436637817, 0.022990248611, 0.580153205353, 1.047552018229, 11.265023544535, 2.622275290571, 0.382949404795, 0.453076470454, 0.022651889536, 0.012533628369, 0.108399390974, -0.160139432044, -6.115359574845, -0.038972389136, 0, ],
[0, 0, 0.439356565475, -1.998182296753, 0, 0.016651883721, 0.018462046981, -0.001187470742, -10.778778281386, 0.343052863546, -0.034949331535, -3.466737362551, 0.013415853489, -0.006501746896, -0.007248032248, -0.004835912875, -0.152495086764, 2.03915052839, -0.169614300211, -0.279125393264, -0.003678218266, -0.001679708185, 0.050812027754, 0.043273505033, -0.062305315646, 0.979162836629, 0.040401368402, 0.010697028656, 0, ],
[0, 0, -2.040895462036, -0.458999156952, -0.73502779007, 0.019255757332, -0.00459562242, 0.002120360732, -1.06432932386, -3.659159530947, -0.493546966858, -0.059561101143, -1.953512259413, -0.010939065041, -0.000271004496, 0.050563886711, 1.58833954495, 0.219923768171, 1.821923233098, 2.69319056633, 0.068619628466, 0.086310028398, 0.002415425662, 0.000727041422, 0.640963888079, -0.023016712545, -1.069845542887, -0.596675149197, 0, ],
[-32.103607177734, 0, -0.503355026245, 2.297859191895, 0, -0.021215811372, -0.02116791904, 0.01581159234, 12.45916782984, -0.353636907076, 0.064136531117, 4.035326800046, -0.272152744884, 0.000999589868, 0.002529691904, 0.111632959213, 2.736421830861, -2.354540136198, 0.175216915979, 0.86308171287, 0.004401276193, 0.004373406589, -0.059795009475, -0.051005479746, 0.609531777761, -1.1157829788, -0.026305051933, -0.033738880627, 0, ],
[0.102161169052, 32.057830810547, -2.347217559814, -0.503611564636, 0.83494758606, 0.02122657001, -0.037879735231, 0.00035400386, -0.761479736492, -5.12933410588, -1.131382179292, -0.148788337148, 1.380741054924, -0.012931029503, 0.007645723855, 0.073796656681, 1.361745395486, 0.150700793731, 2.452437244444, -1.44883919298, 0.076516270282, 0.087122640348, 0.004623192159, 0.002635233443, -0.079401941141, -0.031023369979, -1.225533436977, 0.657926151362, 0, ],
[-1.910972595215, 1.713829040527, -0.004005432129, -0.057411193848, 0, 0.013989634812, -0.000906753354, -0.290513515472, -2.060635522957, -0.774845915178, -0.471751979387, -1.213891560083, 5.030515136324, 0.126407660877, 0.113188603433, -2.078420624662, -50.18523312358, 0.340665548784, 0.375863242926, -10.641168797333, -0.003634153255, -0.047962774317, 0.030509705209, 0.027584169642, -10.542357589006, -0.126840767097, -0.391839285172, 0.420788121692, 0, ],
[0.126296110212, -0.002898250629, -0.319316070797, 0.785201711657, 0.001772374259, 0.00000584372, 0.000005233812, -0.000097899495, -0.072611454126, 0.001666291957, 0.195701043078, 0.517339177294, 0.05236528267, -0.000003359731, -0.000003009077, 0.000056285381, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ],
[-0.018114066432, 0.077615035084, 0.710897211118, 2.454275059389, -0.012792968774, 0.000040510624, 0.000036282541, -0.000678672106, 0.010414324729, -0.044623231468, 0.564308412696, -1.507321670112, 0.066879720068, -0.000023290783, -0.00002085993, 0.000390189123, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ],
[-0.019957254425, 0.007108972111, 122.639137999354, 1.791704310155, 0.138329792976, 0.000000726169, 0.000000650379, -0.000012165459, -8.481152717711, -37.713895394132, -93.658221074435, -4.801972165378, -2.567389718833, 0.034138340146, -0.038880106034, 0.044603217363, 0.946016722396, 1.708172458034, 18.369114490772, 4.275967542224, 0.624449778826, 0.738801257357, 0.036936909247, 0.020437742859, 0.176759579388, -0.261128576436, -9.971904607075, -0.063549647738, 0, ],
[0.007852964982, 0.003925745426, 0.287856349997, 58.053471054491, 0.030698062827, -0.000006837601, -0.000006123962, 0.000114549925, -17.580742026275, 0.55713614874, 0.205946900184, -43.230778067404, 0.004227082975, 0.006053854501, 0.006646690253, -0.009138926083, -0.248663457912, 3.325105302428, -0.276578605231, -0.455150962257, -0.005997822569, -0.002738986905, 0.082855748293, 0.070563187482, -0.101597078067, 1.596654829885, 0.065879787896, 0.017442923517, 0, ],
[0.011497315687, -0.012583019909, 13.848373855148, 22.28881517216, 0.042287331657, 0.000197558695, 0.000176939544, -0.003309689199, -1.742140233901, -5.959510415282, -11.333020298294, -14.216479234895, -3.944800806497, 0.001304578929, -0.005139259078, 0.08647432259, 2.589998222025, 0.358614863147, 2.970887395829, 4.39160430183, 0.111893402319, 0.140739944934, 0.003938671797, 0.001185537435, 1.045176603318, -0.037531801533, -1.744525005833, -0.972957942438, 0, ],
[-16.939142002537, 0.618053512295, 107.92089190414, 204.524147386814, 0.204407545189, 0.004742101706, 0.004247169746, -0.079444150933, -2.048456967261, -0.931989524708, -66.540858220883, -116.470289129818, -0.561301215495, -0.022312225275, -0.019484747345, 0.243518778973, 4.462098610572, -3.839389874682, 0.285714413078, 1.40736916669, 0.007176864388, 0.007131419303, -0.097503691021, -0.083171197416, 0.993922379938, -1.819432085819, -0.042893874898, -0.055015718216, 0, ],
[-0.542809857455, 7.081822285872, -135.012404429101, 460.929268260027, 0.036498617908, 0.006937238413, 0.006213200589, -0.116219147061, -0.827454697348, 19.622217613195, 78.553728334274, -283.23862765888, 3.065444785639, -0.003847616297, -0.028984525722, 0.187507140282, 2.220506417769, 0.245737625222, 3.99902408961, -2.362524402134, 0.124769923797, 0.142065016461, 0.007538727793, 0.004297097528, -0.129475392736, -0.050587718062, -1.998394759416, 1.072835822585, 0, ],
[-1.286456393795, 0.142279456389, -1.265748910581, 65.74306027738, -1.320702989799, -0.061855995532, -0.055400100872, 1.036269854556, -4.531489334771, 0.368539277612, 0.002487097952, -42.326462719738, 8.96223401238, 0.255676968878, 0.215513465742, -4.275436802385, -81.833676543035, 0.555500345288, 0.612894852362, -17.351836610113, -0.005925968725, -0.078209662789, 0.049750119549, 0.044979645917, -17.190711833803, -0.206830688253, -0.638945907467, 0.686150823668, 0, ],
[0, 0, 0, 0, 0, -0.009702263896, -0.008689641059, 0.162541456323, 0, 0, 0, 0, 0, 0, 0, 0, -0.012, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ],
[-8.153162937544, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -0.005, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ],
[0, -3.261265175018, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -0.005, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ],
[0, 0, 0, 0.17441246156, -3.261265175018, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -0.01, 0, 0, 0, 0, 0, 0, 0, 0, 0, ],
[0, 0, -3.261265175018, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -8.5, -18, 0, 0, 0, 0, 0, 0, 0, ],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, ],
[0, 0, 0, -8.153162937544, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -8.5, -18, 0, 0, 0, 0, 0, ],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, ],
[0, 0, 0, 0, 0, 0, 0, 0, 0.699960862226, 0.262038222227, 0.159589891262, 0.41155156501, -1.701619176699, -0.0427567124, -0.038285155304, 0.703045934017, 16.975651534025, -0.115788018654, -0.127109026104, 3.599544290134, 0.001229743857, 0.016223661959, -0.01033400498, -0.00934235613, -6.433934989563, 0.042639567847, 0.132540852847, -0.142338323726, 0, ],
[0, 0, 0, 0, 0, 0, 0, 0, -37.001496211974, 0.783588795613, -0.183854784348, -11.869599790688, -0.106084318011, -0.026306590251, -0.027118088888, 0.036744952758, 0.76460150301, 7.002366574508, -0.390318898363, -0.642631203146, -0.005701671024, 0.003522251111, 0.173867535377, 0.147911422248, 0.056092715216, -6.641979472328, 0.039602243105, 0.026181724138, 0, ],
[0, 0, 0, 0, 0, 0, 0, 0, 1.991401999957, 13.760045912368, 2.53041689113, 0.082528789604, 0.728264862053, 0.023902766734, -0.022896554363, 0.015327568208, 0.370476566397, -0.412566245022, -6.70094564846, -1.327038338854, -0.227019235965, -0.267482033427, -0.008650986307, -0.003394359441, 0.098792645471, 0.197714179668, -6.369398456151, -0.011976840769, 0, ],
[0, 0, 0, 0, 0, 0, 0, 0, 1.965859332057, -3.743127938662, -1.962645156793, 0.018929412474, 11.145046656101, -0.03600197464, -0.001222148117, 0.602488409354, 11.639787952728, -0.407672972316, 1.507740702165, -12.799953897143, 0.005393102236, -0.014208764492, -0.000915158115, -0.000640326416, -0.03653528842, 0.012458973237, -0.083125038259, -5.472831842357, 0, ],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ],
])
,
npmatrix([1.0 if d == 28 else 0.0 for d in xrange(29)])
)



if __name__ == "__main__":
main()

示例输出

这是演示问题的输出示例(每次运行都略有不同)。请注意执行时间的巨大变化(超过两个数量级!)。同样,如果我使用大小为 1 的池(或在没有池的情况下运行代码),或者如果我在调用 integrate 时不使用显式 jacobian,这一切都会消失。

  1. times: 5847ms, 5760ms
  2. times: 4177ms, 3991ms
  3. times: 229ms, 36ms
  4. times: 1317ms, 1544ms
  5. times: 87ms, 100ms
  6. times: 113ms, 102ms
  7. times: 4747ms, 5077ms
  8. times: 597ms, 48ms
  9. times: 9ms, 49ms
  10. times: 135ms, 109ms

最佳答案

根据您为显示问题的机器发布的执行时间的可变性,我想知道在您运行测试时该计算机还在做什么。以下是我在 AWS r3.large 服务器(2 个内核,15 GB RAM)上运行您的代码时看到的时间,该服务器通常运行交互式 R session 但目前大部分时间处于空闲状态:

  1. times: 11ms, 11ms
  2. times: 9ms, 9ms
  3. times: 9ms, 9ms
  4. times: 9ms, 9ms
  5. times: 10ms, 10ms
  6. times: 10ms, 10ms
  7. times: 10ms, 10ms
  8. times: 11ms, 10ms
  9. times: 11ms, 10ms
  10. times: 9ms, 9ms

有没有可能你的机器正在交换而你却不知道? vmstat 5 将为您提供大量有关换入和换出的信息,但不会提供有关缓存逐出的信息。

英特尔制作了一些非常好的监控工具——一次两个——处理器中发生的数千种不同类型的操作和错误——包括二级缓存逐出——但它们有点像水龙头:有是每微秒或更频繁地生成的信息,您必须决定要监视的内容以及希望中断以将数字传送到软件中的频率。可能需要多次运行才能缩小要跟踪的统计数据的范围,并且您仍然必须过滤掉操作系统产生的噪音以及当时运行的所有其他内容。这是一个耗时的过程,但如果您坚持到底并运行许多不同的测试,您就会明白发生了什么。

但是这个——处理器中的共享缓存资源——真的是你的问题吗?似乎您只想弄清楚为什么一台机器上的运行时间可变,其次,为什么多线程在两台机器上都比单线程慢。我说得对吗?如果没有,我会编辑我的答案,我们可以讨论处理器缓存、缓存监听和缓存一致性。

因此,关于 i7-2670QM CPU 机器上的可变性,我将从 htopvmstat 5iostat 5 开始查看机器是否正在做您没有意识到的事情。如此多的可变性表明可执行文件正在停止,因为处理器正忙于做其他事情:离开网络但找不到它期望的共享,无法连接到 DNS 服务器,出现 kerbios 故障:可能有很多事情包括来自不断重置的硬盘的硬件故障。哦,把你的程序移动到/dev/shm 并在你启动它之前 cd 到那里。如果磁盘上的坏地方有 Python 库,那对你没有帮助,但至少你的本地目录不会有问题。报告您的发现,我们可以提出进一步的建议。

在我看来,您的第二个问题(也许是您开始的地方)是为什么您的程序在多线程运行时比单线程运行慢。这是一个很大的主题,如果我们能看到您如何对其进行多线程处理,将会引起更多关注。但在我们这样做之前,您必须意识到有几件事会导致多线程程序比单线程程序运行得更慢,而且它可能与程序周围的支持基础设施——库有很大关系和操作系统调用你做的——作为你的程序。仅仅因为您不需要互斥量并不意味着库和操作系统在从多线程应用程序调用它们时不需要它们。锁定互斥锁是一项昂贵的操作,尤其是当不同的线程在不同的内核之间轮换时。

最重要的是,由于 vode 是不可重入的,如果您从多个线程调用它,它可能无法找到收敛性,并且在它“幸运”之前必须多次重新计算相同的值,并且在换出和覆盖中间结果之前,有足够的处理器时间来完成迭代。向我们提供您用于多线程运行的代码,我将添加到此答案中。

关于python - 在多处理池中运行 scipy.integrate.ode 会导致巨大的性能损失,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/39350557/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com