Python numpy code runs very slowly

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:24:45

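I am trying to implement hidden Markov model training in Python, and the resulting numpy code is very slow: training one model takes 30 minutes. My code is below, and I agree that it is terribly inefficient. I tried to learn numpy vectorization and advanced indexing, but could not figure out how to use them in my code. I was able to determine that the execution time is concentrated in one place: more than 99% of it is spent in the reestimate() function, especially in the parts that print CHK5 and CHK6.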

    def reestimate(self):
        newTransition = numpy.zeros(shape=(int(self.num_states),int(self.num_states)))
        newOutput = numpy.zeros(shape=(int(self.num_states),int(self.num_symbols)))
        numerator = numpy.zeros(shape=(int(self.num_obSeq),))
        denominator = numpy.zeros(shape=(int(self.num_obSeq),))
        sumP = 0
        i = 0
        print("CHK1")
        # Part 1: re-estimate the transition probabilities
        while i < self.num_states:
            j=0
            while j < self.num_states:
                if j < i or j > i + self.delta:
                    newTransition[i][j] = 0
                else:
                    k=0
                    print("CHK2")
                    while k < self.num_obSeq:
                        numerator[k] = denominator[k] = 0
                        self.setObSeq(self.obSeq[k])

                        sumP += self.computeAlpha()
                        self.computeBeta()
                        t=0
                        while t < self.len_obSeq - 1:
                            numerator[k] += self.alpha[t][i] * self.transition[i][j] * self.output[j][self.currentSeq[t + 1]] * self.beta[t + 1][j]
                            denominator[k] += self.alpha[t][i] * self.beta[t][i]
                            t += 1
                        k += 1
                    denom=0
                    k=0
                    print("CHK3")
                    while k < self.num_obSeq:
                        newTransition[i,j] += (1 / sumP) * numerator[k]
                        denom += (1 / sumP) * denominator[k]
                        k += 1
                    newTransition[i,j] /= denom
                    newTransition[i,j] += self.MIN_PROBABILITY
                j += 1
            i += 1
        sumP = 0
        i = 0
        print("CHK4")
        # Part 2: re-estimate the output (emission) probabilities
        while i < self.num_states:
            j=0
            while j < self.num_symbols:
                k=0
                while k < self.num_obSeq:
                    numerator[k] = denominator[k] = 0
                    self.setObSeq(self.obSeq[k])
                    # print(self.obSeq[k])
                    sumP += self.computeAlpha()
                    self.computeBeta()
                    t=0
                    print("CHK5")
                    while t < self.len_obSeq - 1:
                        if self.currentSeq[t] == j:
                            numerator[k] += self.alpha[t,i] * self.beta[t,i]
                        denominator[k] += self.alpha[t,i] * self.beta[t,i]
                        t += 1
                    k += 1
                denom=0
                k=0
                print("CHK6")
                while k < self.num_obSeq:
                    newOutput[i,j] += (1 / sumP) * numerator[k]
                    denom += (1 / sumP) * denominator[k]
                    k += 1
                newOutput[i,j] /= denom
                newOutput[i,j] += self.MIN_PROBABILITY
                j += 1
            i += 1
        self.transition = newTransition
        self.output = newOutput

    def train(self):
        i = 0
        while i < 20:
            self.reestimate()
            print("reestimating.....", i)
            i += 1

Best answer

Vectorizing the inner loops is quite straightforward. Here is an example for the second part of your code (untested, of course):

    print("CHK4")
    for i in range(self.num_states):
        for j in range(self.num_symbols):
            for k in range(self.num_obSeq):
                self.setObSeq(self.obSeq[k])
                # print(self.obSeq[k])
                sumP += self.computeAlpha()
                self.computeBeta()
                # The original loop runs over t < len_obSeq - 1, so slice to match it
                T = self.len_obSeq - 1
                alpha_times_beta = self.alpha[:T,i] * self.beta[:T,i]
                # Boolean mask; asarray also covers the case where currentSeq is a list
                mask = numpy.asarray(self.currentSeq)[:T] == j
                numerator[k] = numpy.sum(alpha_times_beta[mask])
                denominator[k] = numpy.sum(alpha_times_beta)
            # The 1 / sumP factors of the original cancel in the ratio
            denom = numpy.sum(denominator)
            newOutput[i,j] = numpy.sum(numerator) / denom + self.MIN_PROBABILITY
    self.transition = newTransition
    self.output = newOutput
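
Since the snippet above is untested, one way to gain confidence in it is to compare the explicit loop and the vectorized form on synthetic data. The following is only a minimal sketch, not the poster's class: alpha, beta and seq are random stand-ins for self.alpha, self.beta and self.currentSeq, and both function names are hypothetical. It slices to len(seq) - 1 to mirror the question's loop bound t < self.len_obSeq - 1.

```python
import numpy as np

def sums_with_loop(alpha, beta, seq, i, j):
    """Explicit loop over t, as in the question (t stops at len(seq) - 2)."""
    numerator = 0.0
    denominator = 0.0
    for t in range(len(seq) - 1):
        if seq[t] == j:
            numerator += alpha[t, i] * beta[t, i]
        denominator += alpha[t, i] * beta[t, i]
    return numerator, denominator

def sums_vectorized(alpha, beta, seq, i, j):
    """Same sums using slicing and a boolean mask instead of the loop."""
    T = len(seq) - 1
    alpha_times_beta = alpha[:T, i] * beta[:T, i]
    mask = np.asarray(seq)[:T] == j
    return alpha_times_beta[mask].sum(), alpha_times_beta.sum()

# Random stand-in data (assumed shapes: one row per time step, one column per state)
rng = np.random.default_rng(0)
num_states, num_symbols, len_seq = 4, 3, 50
alpha = rng.random((len_seq, num_states))
beta = rng.random((len_seq, num_states))
seq = rng.integers(0, num_symbols, size=len_seq)

for i in range(num_states):
    for j in range(num_symbols):
        assert np.allclose(sums_with_loop(alpha, beta, seq, i, j),
                           sums_vectorized(alpha, beta, seq, i, j))
print("loop and vectorized sums agree")
```

If the assertion holds for your real alpha and beta arrays as well, the vectorized version can replace the loop without changing the trained model.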

The outer loops can be vectorized as well, but by far the biggest gains usually come from focusing on the inner loops only. Some comments:

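  • It looks like most of your while loops can be turned into for loops. Although this makes little difference for speed, it is the preferred form when you know the number of iterations before the loop starts.

  • The convention is to use import numpy as np and to call np.function in the rest of the code.

  • Simple loops that only compute a sum ( accum = 0; for item in vector: accum += item ) should be vectorized as accum = np.sum(vector).

  • A conditional sum inside a loop can be converted to a vectorized sum with boolean indexing, so accum = 0; for i in range(n): if cond[i]: accum += vector[i] can be replaced with accum = np.sum(vector[cond]).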
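
The last two points can be tried out on a small array. The sketch below uses made-up data (vector and cond are the illustrative names from the snippets above, not names from the question's code) and checks that each loop and its vectorized replacement agree.

```python
import numpy as np

vector = np.array([0.5, 1.5, 2.0, 4.0])
cond = np.array([True, False, True, False])

# Plain sum: explicit loop versus np.sum
accum_loop = 0.0
for item in vector:
    accum_loop += item
accum_vec = np.sum(vector)

# Conditional sum: explicit loop versus boolean indexing
cond_loop = 0.0
for i in range(len(vector)):
    if cond[i]:
        cond_loop += vector[i]
cond_vec = np.sum(vector[cond])

assert np.isclose(accum_loop, accum_vec)  # both are 8.0
assert np.isclose(cond_loop, cond_vec)    # both are 2.5
```

A comparison on a NumPy array, such as seq == j, produces a boolean mask of this kind directly.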

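I would be interested to know how much faster your code gets after these modifications; I would guess you can easily gain more than a factor of 10.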
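
One way to measure the gain is the standard timeit module. The sketch below is a stand-in under stated assumptions rather than the poster's class: it covers only the emission sums for a single sequence, uses random alpha, beta and seq arrays, and all function names are hypothetical. It also shows one possible way to vectorize the outer loops over states and symbols, using a one-hot encoding of the sequence. Both functions sum over every row they are given, so slice the arrays first if the last time step should be dropped as in the question. The actual speedup depends on array sizes and hardware, so no figure is claimed here.

```python
import timeit
import numpy as np

def emission_sums_loops(alpha, beta, seq, num_symbols):
    """Nested Python loops over states i, symbols j and time t."""
    len_seq, num_states = alpha.shape
    numer = np.zeros((num_states, num_symbols))
    denom = np.zeros(num_states)
    for i in range(num_states):
        for j in range(num_symbols):
            for t in range(len_seq):
                if seq[t] == j:
                    numer[i, j] += alpha[t, i] * beta[t, i]
        for t in range(len_seq):
            denom[i] += alpha[t, i] * beta[t, i]
    return numer, denom

def emission_sums_vectorized(alpha, beta, seq, num_symbols):
    """Same sums with no Python-level loop."""
    gamma = alpha * beta  # shape (len_seq, num_states)
    # one_hot[t, j] is 1.0 where seq[t] == j, else 0.0; shape (len_seq, num_symbols)
    one_hot = (seq[:, None] == np.arange(num_symbols)).astype(float)
    numer = gamma.T @ one_hot   # shape (num_states, num_symbols)
    denom = gamma.sum(axis=0)   # shape (num_states,)
    return numer, denom

# Random stand-in data; sizes are arbitrary
rng = np.random.default_rng(0)
len_seq, num_states, num_symbols = 200, 8, 6
alpha = rng.random((len_seq, num_states))
beta = rng.random((len_seq, num_states))
seq = rng.integers(0, num_symbols, size=len_seq)

# Check that both versions agree before timing them
loop_result = emission_sums_loops(alpha, beta, seq, num_symbols)
vec_result = emission_sums_vectorized(alpha, beta, seq, num_symbols)
assert np.allclose(loop_result[0], vec_result[0])
assert np.allclose(loop_result[1], vec_result[1])

t_loop = timeit.timeit(
    lambda: emission_sums_loops(alpha, beta, seq, num_symbols), number=5)
t_vec = timeit.timeit(
    lambda: emission_sums_vectorized(alpha, beta, seq, num_symbols), number=5)
print("loops: %.4f s, vectorized: %.4f s" % (t_loop, t_vec))
```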

A similar question about slow Python numpy code can be found on Stack Overflow: https://stackoverflow.com/questions/23422274/
