
python - ReLU performing worse than sigmoid?


I am using sigmoid on all layers and the output and get a final error rate of 0.00012, but when I use ReLU, which is supposedly better in theory, I get worse results. Can anyone explain why this happens? I am using a very simple 2-layer implementation (the kind you can find on a hundred websites), but I'm still including it below:

import numpy as np
#test
#avg(nonlin(np.dot(nonlin(np.dot([0,0,1],syn0)),syn1)))
#returns list >> [predicted_output, confidence]
def nonlin(x, deriv=False):  # Sigmoid
    if (deriv == True):
        return x*(1-x)

    return 1/(1+np.exp(-x))

def relu(x, deriv=False):  # ReLU
    if (deriv == True):
        for i in range(0, len(x)):
            for k in range(len(x[i])):
                if x[i][k] > 0:
                    x[i][k] = 1
                else:
                    x[i][k] = 0
        return x
    for i in range(0, len(x)):
        for k in range(0, len(x[i])):
            if x[i][k] > 0:
                pass  # do nothing since it would be effectively replacing x with x
            else:
                x[i][k] = 0
    return x

X = np.array([[0,0,1],
              [0,0,0],
              [0,1,1],
              [1,0,1],
              [1,0,0],
              [0,1,0]])

y = np.array([[0],[1],[0],[0],[1],[1]])

np.random.seed(1)

# randomly initialize our weights with mean 0
syn0 = 2*np.random.random((3,4)) - 1
syn1 = 2*np.random.random((4,1)) - 1

def avg(i):
    if i > 0.5:
        confidence = i
        return [1, float(confidence)]
    else:
        confidence = 1.0 - float(i)
        return [0, confidence]
for j in range(500000):

    # Feed forward through layers 0, 1, and 2
    l0 = X
    l1 = nonlin(np.dot(l0, syn0))
    l2 = nonlin(np.dot(l1, syn1))
    # print('this is', l2, '\n')
    # how much did we miss the target value?
    l2_error = y - l2
    # print(l2_error, '\n')
    if (j % 100000) == 0:
        print("Error:" + str(np.mean(np.abs(l2_error))))
        print(syn1)

    # in what direction is the target value?
    # were we really sure? if so, don't change too much.
    l2_delta = l2_error*nonlin(l2, deriv=True)

    # how much did each l1 value contribute to the l2 error (according to the weights)?
    l1_error = l2_delta.dot(syn1.T)

    # in what direction is the target l1?
    # were we really sure? if so, don't change too much.
    l1_delta = l1_error * nonlin(l1, deriv=True)

    syn1 += l1.T.dot(l2_delta)
    syn0 += l0.T.dot(l1_delta)

print("Final Error:" + str(np.mean(np.abs(l2_error))))

def p(l):
    return avg(nonlin(np.dot(nonlin(np.dot(l, syn0)), syn1)))

So p(x) is the prediction function after training, where x is a 1 x 3 matrix of input values.
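For example, once the training loop above has finished, p can be called on any 1 x 3 input row. A minimal usage sketch (the expected result is inferred from the training targets above, not from an actual run):

# [0,1,1] is the third training row, whose target is 0, so after training
# this should return class 0 with a confidence close to 1, e.g. [0, 0.999...]
print(p([0, 1, 1]))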

Best Answer

Why do you say it is theoretically better? ReLU has proven better in most applications, but that does not mean it is universally better. Your example is very simple and the inputs are scaled to [0,1], the same as the outputs. This is exactly where I would expect sigmoid to perform well. In practice you rarely see sigmoid in hidden layers, because of the vanishing-gradient problem and some other issues with large networks, but that is hardly an issue in your case.
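To make the vanishing-gradient point concrete, here is a small illustration (not part of the original answer, just a sketch): the sigmoid derivative never exceeds 0.25, so a gradient pushed back through many sigmoid layers shrinks by a factor of at least 4 per layer, while the ReLU derivative is exactly 1 wherever the unit is active.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 101)
sig_grad = sigmoid(z) * (1 - sigmoid(z))  # derivative of sigmoid w.r.t. its input
relu_grad = (z > 0).astype(float)         # derivative of ReLU: 1 for z > 0, else 0

# best case per layer, and compounded over 10 layers
print(sig_grad.max(), sig_grad.max() ** 10)    # 0.25 and ~9.5e-7
print(relu_grad.max(), relu_grad.max() ** 10)  # 1.0 and 1.0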

Also, in case you ever did use the ReLU derivative, you are missing an "else" in your code: your derivative would simply be overwritten.

As a refresher, here is the definition of ReLU:

f(x)=max(0,x)

...which means it can grow your activations without bound. You want to avoid using ReLU in the last (output) layer.
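Putting this together, one way to try ReLU in this particular script is to use it only in the hidden layer and keep sigmoid on the output. The sketch below is an assumption about how the questioner's loop could be adapted (it reuses X, y, syn0, syn1 and nonlin from the question), not code from the original answer:

# Hidden layer uses ReLU, output layer stays sigmoid so predictions remain in [0,1].
for j in range(500000):
    l0 = X
    l1 = np.maximum(0, np.dot(l0, syn0))   # ReLU activation for the hidden layer
    l2 = nonlin(np.dot(l1, syn1))          # sigmoid on the output layer

    l2_error = y - l2
    l2_delta = l2_error * nonlin(l2, deriv=True)   # sigmoid derivative at the output

    l1_error = l2_delta.dot(syn1.T)
    l1_delta = l1_error * (l1 > 0)                 # ReLU derivative: 1 where the unit is active

    syn1 += l1.T.dot(l2_delta)
    syn0 += l0.T.dot(l1_delta)

Whether this actually beats sigmoid on such a tiny problem is another matter: the updates have no learning rate, and as the answer notes, this example is exactly the kind of setting where sigmoid is expected to do well.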

As a side note, you should take advantage of vectorized operations whenever possible:

def relu(x, deriv=False):  # ReLU
    if (deriv == True):
        mask = x > 0
        x[mask] = 1
        x[~mask] = 0
        return x
    else:  # HERE YOU WERE MISSING "ELSE"
        return np.maximum(0, x)

And yes, it is much faster than the element-wise if/else checks you are doing.
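If you want to check the speed-up yourself, a small benchmark sketch along these lines could be used (relu_loop and relu_vec are hypothetical names standing in for the loop-based forward pass from the question and the vectorized version above):

import timeit
import numpy as np

a = np.random.randn(1000, 100)

def relu_loop(x):
    # element-wise loop, equivalent to the question's non-derivative branch
    out = x.copy()
    for i in range(len(out)):
        for k in range(len(out[i])):
            if out[i][k] < 0:
                out[i][k] = 0
    return out

def relu_vec(x):
    return np.maximum(0, x)

print(timeit.timeit(lambda: relu_loop(a), number=10))
print(timeit.timeit(lambda: relu_vec(a), number=10))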

Regarding "python - ReLU performing worse than sigmoid?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44351395/
