
python - Interpreting the training trajectory of a deep neural network: very low training loss and even lower validation loss

Reposted. Author: 太空宇宙. Updated: 2023-11-04 07:59:57

I'm a bit suspicious of the following log, which I got while training a deep neural network for regression with target values between -1.0 and 1.0, a learning rate of 0.001, and 19200/4800 training/validation samples:

____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
cropping2d_1 (Cropping2D) (None, 138, 320, 3) 0 cropping2d_input_1[0][0]
____________________________________________________________________________________________________
lambda_1 (Lambda) (None, 66, 200, 3) 0 cropping2d_1[0][0]
____________________________________________________________________________________________________
lambda_2 (Lambda) (None, 66, 200, 3) 0 lambda_1[0][0]
____________________________________________________________________________________________________
convolution2d_1 (Convolution2D) (None, 31, 98, 24) 1824 lambda_2[0][0]
____________________________________________________________________________________________________
spatialdropout2d_1 (SpatialDropo (None, 31, 98, 24) 0 convolution2d_1[0][0]
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D) (None, 14, 47, 36) 21636 spatialdropout2d_1[0][0]
____________________________________________________________________________________________________
spatialdropout2d_2 (SpatialDropo (None, 14, 47, 36) 0 convolution2d_2[0][0]
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D) (None, 5, 22, 48) 43248 spatialdropout2d_2[0][0]
____________________________________________________________________________________________________
spatialdropout2d_3 (SpatialDropo (None, 5, 22, 48) 0 convolution2d_3[0][0]
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D) (None, 3, 20, 64) 27712 spatialdropout2d_3[0][0]
____________________________________________________________________________________________________
spatialdropout2d_4 (SpatialDropo (None, 3, 20, 64) 0 convolution2d_4[0][0]
____________________________________________________________________________________________________
convolution2d_5 (Convolution2D) (None, 1, 18, 64) 36928 spatialdropout2d_4[0][0]
____________________________________________________________________________________________________
spatialdropout2d_5 (SpatialDropo (None, 1, 18, 64) 0 convolution2d_5[0][0]
____________________________________________________________________________________________________
flatten_1 (Flatten) (None, 1152) 0 spatialdropout2d_5[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout) (None, 1152) 0 flatten_1[0][0]
____________________________________________________________________________________________________
activation_1 (Activation) (None, 1152) 0 dropout_1[0][0]
____________________________________________________________________________________________________
dense_1 (Dense) (None, 100) 115300 activation_1[0][0]
____________________________________________________________________________________________________
dropout_2 (Dropout) (None, 100) 0 dense_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense) (None, 50) 5050 dropout_2[0][0]
____________________________________________________________________________________________________
dense_3 (Dense) (None, 10) 510 dense_2[0][0]
____________________________________________________________________________________________________
dropout_3 (Dropout) (None, 10) 0 dense_3[0][0]
____________________________________________________________________________________________________
dense_4 (Dense) (None, 1) 11 dropout_3[0][0]
====================================================================================================
Total params: 252,219
Trainable params: 252,219
Non-trainable params: 0
____________________________________________________________________________________________________
None
Epoch 1/5
19200/19200 [==============================] - 795s - loss: 0.0292 - val_loss: 0.0128
Epoch 2/5
19200/19200 [==============================] - 754s - loss: 0.0169 - val_loss: 0.0120
Epoch 3/5
19200/19200 [==============================] - 753s - loss: 0.0161 - val_loss: 0.0114
Epoch 4/5
19200/19200 [==============================] - 723s - loss: 0.0154 - val_loss: 0.0100
Epoch 5/5
19200/19200 [==============================] - 1597s - loss: 0.0151 - val_loss: 0.0098

Both the training and validation loss are decreasing, which at first glance is good news. But how can the training loss already be this low in the first epoch? And how can the validation loss be even lower than the training loss? Does this indicate a systematic error in my model or training setup?

Best Answer

Actually, a validation loss smaller than the training loss is not as rare as one might think. It can happen, for example, when every example in your validation set is well covered by examples in your training set and your network has simply learned the actual structure of the dataset.

This often happens when the structure of your data is not very complex. In fact, the surprisingly small loss after the first epoch may be a clue that this is what happened in your case.

As for the loss: you haven't specified which loss function you use, but since your task is regression I would guess it's MSE. In that case, a mean squared error on the order of 0.01 means the average (root-mean-square) distance between the predicted and true values is about 0.1, which is 5% of the diameter of your target range [-1, 1]. So, is this error really that small?
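The arithmetic behind that 5% figure can be sketched in a few lines (assuming, as above, that the loss is MSE):

```python
import math

# An MSE of 0.01 corresponds to a root-mean-square error of sqrt(0.01) = 0.1.
mse = 0.01
rmse = math.sqrt(mse)

# The target range [-1, 1] has diameter 2, so an RMSE of 0.1 is
# 0.1 / 2 = 5% of the full range of possible values.
relative_error = rmse / 2.0

print(rmse)            # 0.1
print(relative_error)  # 0.05
```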

You also haven't specified the number of batches processed in one epoch. Perhaps, if the structure of your data is not that complex and your batch size is small, one epoch is already enough to learn your data well.

To check whether your model has trained well, I suggest making a correlation plot, e.g. with y_pred on the X axis and y_true on the Y axis. Then you will actually see how your model has trained.
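Besides eyeballing such a scatter plot, the Pearson correlation between predictions and targets gives a single number summarising how well the model tracks the ground truth. A minimal sketch (the simulated `y_true`/`y_pred` arrays stand in for your validation targets and `model.predict(...)` output):

```python
import numpy as np

def prediction_correlation(y_true, y_pred):
    """Pearson correlation between targets and predictions; near 1 is good."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Simulated example: predictions that track the target with small noise,
# standing in for the real model's output on the validation set.
rng = np.random.default_rng(0)
y_true = rng.uniform(-1.0, 1.0, size=1000)
y_pred = y_true + rng.normal(0.0, 0.1, size=1000)

r = prediction_correlation(y_true, y_pred)
print(round(r, 3))  # close to 1 when predictions track the targets
```

A scatter plot of the same two arrays, with the y = x diagonal drawn for reference, shows visually whether the points cluster along the diagonal or collapse toward a constant prediction.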

EDIT: As Neil mentioned, there may be more reasons behind a small validation error, such as cases that are not well separated between the two sets. I would also add that, since 5 epochs take no more than about 90 minutes here, it may be worth checking the model with a classic cross-validation scheme, e.g. 5-fold. That would give you more assurance that your model performs well on your dataset.
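A 5-fold check can be sketched as follows; the toy `X`/`y` arrays and the commented-out `build_model()` factory are placeholders for the author's actual 19200+4800-sample dataset and Keras model:

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy placeholder data; substitute your real images and regression targets.
X = np.random.rand(240, 4)
y = np.random.uniform(-1.0, 1.0, size=240)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
fold_sizes = []
for train_idx, val_idx in kf.split(X):
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]
    fold_sizes.append((len(train_idx), len(val_idx)))
    # model = build_model()          # hypothetical factory for a fresh model
    # model.fit(X_train, y_train)    # train on 4 folds
    # model.evaluate(X_val, y_val)   # evaluate on the held-out fold

print(fold_sizes)  # each fold holds out 1/5 of the data for validation
```

Averaging the per-fold validation scores gives a more robust estimate than a single fixed train/validation split.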

Regarding python - Interpreting the training trajectory of a deep neural network: very low training loss and even lower validation loss, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41909369/
