gpt4 book ai didi

python - 为什么简单梯度下降会发散?

转载 作者:太空宇宙 更新时间:2023-11-03 10:55:41 25 4
gpt4 key购买 nike

这是我第二次尝试在一个变量中实现梯度下降,但它总是发散。有什么想法吗?

这是一个简单的线性回归,用于最小化一个变量中的残差平方和。

def gradient_descent_wtf(xvalues, yvalues):
tolerance = 0.1

#y=mx+b
#some line to predict y values from x values
m=1.
b=1.

#a predicted y-value has value mx + b

for i in range(0,10):

#calculate y-value predictions for all x-values
predicted_yvalues = list()
for x in xvalues:
predicted_yvalues.append(m*x + b)

# predicted_yvalues holds the predicted y-values

#now calculate the residuals = y-value - predicted y-value for each point
residuals = list()
number_of_points = len(yvalues)
for n in range(0,number_of_points):
residuals.append(yvalues[n] - predicted_yvalues[n])

## calculate the residual sum of squares from the residuals, that is,
## square each residual and add them all up. we will try to minimize
## the residual sum of squares later.
residual_sum_of_squares = 0.
for r in residuals:
residual_sum_of_squares += r**2
print("RSS = %s" % residual_sum_of_squares)
##
##
##

#now make a version of the residuals which is multiplied by the x-values
residuals_times_xvalues = list()
for n in range(0,number_of_points):
residuals_times_xvalues.append(residuals[n] * xvalues[n])

#now create the sums for the residuals and for the residuals times the x-values
residuals_sum = sum(residuals)

residuals_times_xvalues_sum = sum(residuals_times_xvalues)

# now multiply the sums by a positive scalar and add each to m and b.

residuals_sum *= 0.1
residuals_times_xvalues_sum *= 0.1

b += residuals_sum
m += residuals_times_xvalues_sum

#and repeat until convergence.
#convergence occurs when ||sum vector|| < some tolerance.
# ||sum vector|| = sqrt( residuals_sum**2 + residuals_times_xvalues_sum**2 )

#check for convergence
magnitude_of_sum_vector = (residuals_sum**2 + residuals_times_xvalues_sum**2)**0.5
if magnitude_of_sum_vector < tolerance:
break

return (b, m)

结果:

gradient_descent_wtf([1,2,3,4,5,6,7,8,9,10],[6,23,8,56,3,24,234,76,59,567])
RSS = 370433.0
RSS = 300170125.7
RSS = 4.86943013045e+11
RSS = 7.90447409339e+14
RSS = 1.28312217794e+18
RSS = 2.08287421094e+21
RSS = 3.38110045417e+24
RSS = 5.48849288217e+27
RSS = 8.90939341376e+30
RSS = 1.44624932026e+34
Out[108]:
(-3.475524066284303e+16, -2.4195981188763203e+17)

最佳答案

梯度很大——因此您在长距离内跟随大矢量(大数的 0.1 倍是大的)。找到适当方向的单位向量。像这样的东西(用理解代替你的循环):

def gradient_descent_wtf(xvalues, yvalues):
tolerance = 0.1

m=1.
b=1.

for i in range(0,10):
predicted_yvalues = [m*x+b for x in xvalues]

residuals = [y-y_hat for y,y_hat in zip(yvalues,predicted_yvalues)]

residual_sum_of_squares = sum(r**2 for r in residuals) #only needed for debugging purposes
print("RSS = %s" % residual_sum_of_squares)

residuals_times_xvalues = [r*x for r,x in zip(residuals,xvalues)]

residuals_sum = sum(residuals)

residuals_times_xvalues_sum = sum(residuals_times_xvalues)

# (residuals_sum,residual_times_xvalues_sum) is a vector which points in the negative
# gradient direction. *Find a unit vector which points in same direction*

magnitude = (residuals_sum**2 + residuals_times_xvalues_sum**2)**0.5

residuals_sum /= magnitude
residuals_times_xvalues_sum /= magnitude

b += residuals_sum * (0.1)
m += residuals_times_xvalues_sum * (0.1)

#check for convergence -- this needs work!
magnitude_of_sum_vector = (residuals_sum**2 + residuals_times_xvalues_sum**2)**0.5
if magnitude_of_sum_vector < tolerance:
break

return (b, m)

例如:

>>> gradient_descent_wtf([1,2,3,4,5,6,7,8,9,10],[6,23,8,56,3,24,234,76,59,567])
RSS = 370433.0
RSS = 368732.1655050716
RSS = 367039.18363896786
RSS = 365354.0543519137
RSS = 363676.7775934381
RSS = 362007.3533123621
RSS = 360345.7814567845
RSS = 358692.061974069
RSS = 357046.1948108295
RSS = 355408.17991291644
(1.1157111313023558, 1.9932828425473605)

这当然更合理。

制作数值稳定的梯度下降算法并非易事。您可能想查阅一本不错的数值分析教科书。

关于python - 为什么简单梯度下降会发散?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/41354177/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com