
python - Why does a custom activation function cause the network to have both zero loss and low accuracy?


I tried to build a custom activation function with tflearn by making the following changes:

Add my custom activation function to activations.py:

def my_activation(x):
    return tf.where(x >= 0.0, tf.div(x**2, x + tf.constant(0.6)), 0.01 * x)

and add it to __init__.py:

from .activations import linear, tanh, sigmoid, softmax, softplus, softsign,\
relu, relu6, leaky_relu, prelu, elu, crelu, selu, my_activation

Because TensorFlow can perform gradient computation automatically, I should not need to implement a gradient function myself. As the article Deep Learning Programming Style points out,

In the past, whenever someone defined a new model, they had to work out the derivative calculations by hand. While the math is reasonably straightforward, for complex models, it can be time-consuming and tedious work. All modern deep learning libraries make the practitioner/researcher’s job much easier, by automatically solving the problem of gradient calculation.
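For what it's worth, the gradients that automatic differentiation derives can be inspected directly. The following is a minimal TF 1.x sketch (not from the original post) that evaluates my_activation and the gradient autodiff produces for it:

import tensorflow as tf

def my_activation(x):
    return tf.where(x >= 0.0, tf.div(x**2, x + tf.constant(0.6)), 0.01 * x)

x = tf.placeholder(tf.float32, shape=[None])
y = my_activation(x)
# autodiff differentiates both branches of tf.where; note that if the
# untaken branch yields inf/NaN (here, near x == -0.6), the combined
# gradient can become NaN even though the forward value is fine
dy_dx = tf.gradients(y, x)[0]

with tf.Session() as sess:
    print(sess.run([y, dy_dx], feed_dict={x: [-1.0, 0.0, 1.0, 5.0]}))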

I trained a model on the cifar10 dataset using the following code: https://github.com/tflearn/tflearn/blob/master/examples/images/convnet_cifar10.py, but with every relu activation changed to my_activation.
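For context, the relevant part of that script looks roughly like this after the swap (a sketch adapted from the linked convnet_cifar10.py, abbreviated; it assumes the activations.py and __init__.py edits above are in place so the string 'my_activation' resolves through tflearn's activations module):

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression

# same architecture as the cifar10 example, with relu -> my_activation
network = input_data(shape=[None, 32, 32, 3])
network = conv_2d(network, 32, 3, activation='my_activation')
network = max_pool_2d(network, 2)
network = conv_2d(network, 64, 3, activation='my_activation')
network = conv_2d(network, 64, 3, activation='my_activation')
network = max_pool_2d(network, 2)
network = fully_connected(network, 512, activation='my_activation')
network = dropout(network, 0.5)
network = fully_connected(network, 10, activation='softmax')
network = regression(network, optimizer='adam',
                     loss='categorical_crossentropy',
                     learning_rate=0.001)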

Unfortunately, this simple modification left the network unable to learn anything:

Training Step: 46  | total loss: 0.00002 | time: 1.434s
| Adam | epoch: 001 | loss: 0.00002 - acc: 0.0885 -- iter: 04416/50000
Training Step: 47 | total loss: 0.00002 | time: 1.448s
| Adam | epoch: 001 | loss: 0.00002 - acc: 0.0945 -- iter: 04512/50000
Training Step: 48 | total loss: 0.00001 | time: 1.462s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0927 -- iter: 04608/50000
Training Step: 49 | total loss: 0.00001 | time: 1.476s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0896 -- iter: 04704/50000
Training Step: 50 | total loss: 0.00001 | time: 1.491s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0919 -- iter: 04800/50000
Training Step: 51 | total loss: 0.00001 | time: 1.504s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0890 -- iter: 04896/50000
Training Step: 52 | total loss: 0.00001 | time: 1.518s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0944 -- iter: 04992/50000
Training Step: 53 | total loss: 0.00001 | time: 1.539s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0989 -- iter: 05088/50000
Training Step: 54 | total loss: 0.00001 | time: 1.553s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0951 -- iter: 05184/50000
Training Step: 55 | total loss: 0.00000 | time: 1.567s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0964 -- iter: 05280/50000
Training Step: 56 | total loss: 0.00000 | time: 1.580s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0931 -- iter: 05376/50000
Training Step: 57 | total loss: 0.00000 | time: 1.594s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0903 -- iter: 05472/50000
Training Step: 58 | total loss: 0.00000 | time: 1.613s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0851 -- iter: 05568/50000
Training Step: 59 | total loss: 0.00000 | time: 1.641s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0835 -- iter: 05664/50000
Training Step: 60 | total loss: 0.00000 | time: 1.674s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0834 -- iter: 05760/50000

Since I am only a beginner, I don't know what causes the network to end up with both zero loss and low accuracy (NaN outputs? dead weights?). Can anyone tell me how to fix this? Thanks!

Note that I am not asking how to build a custom activation function; how to build one is a separate question:

Best Answer

Why does a custom activation function cause network both zero loss and low accuracy?

Because the network does not backpropagate through your new activation. What you have done is only the start of creating a custom activation function. See this question: "... as the sources mentioned above explain, there is a hack to define the gradient of a function using tf.RegisterGradient and tf.Graph.gradient_override_map ...".

I'm actually not sure your activation is much better than tflearn.activations.leaky_relu, but if you really want to provide a custom activation, you have to code its gradient and register it as described above.
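For illustration, the hack looks roughly like this (a minimal TF 1.x sketch, not the answerer's code; the name "MyActivationGrad" and the identity/stop_gradient plumbing are my own choices for wiring the hand-coded derivative of x²/(x+0.6) into the graph):

import tensorflow as tf

@tf.RegisterGradient("MyActivationGrad")
def _my_activation_grad(op, grad):
    x = op.inputs[0]  # the overridden Identity op below is applied directly to x
    # hand-coded derivative: d/dx [x^2/(x+0.6)] = (x^2 + 1.2x)/(x+0.6)^2
    dfdx = tf.where(x >= 0.0,
                    (x**2 + 1.2 * x) / tf.square(x + 0.6),
                    0.01 * tf.ones_like(x))
    return grad * dfdx

def my_activation(x):
    forward = tf.where(x >= 0.0, tf.div(x**2, x + tf.constant(0.6)), 0.01 * x)
    g = tf.get_default_graph()
    with g.gradient_override_map({"Identity": "MyActivationGrad"}):
        # forward value is `forward`; all gradient flows through the
        # overridden Identity, so backprop uses the registered derivative
        return tf.identity(x) + tf.stop_gradient(forward - x)

With this registration in place, tf.gradients on any graph that uses my_activation picks up the hand-coded derivative instead of whatever autodiff would produce.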

Regarding python - Why does a custom activation function cause the network to have both zero loss and low accuracy?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46742016/
