machine-learning - Keras - GRU layer with recurrent dropout - loss: 'nan', accuracy: 0


Problem description

I am reading "Deep Learning with Python" by François Chollet (publisher webpage, notebooks on github). Reproducing the examples from chapter 6, I encountered problems with (what I believe is) recurrent dropout in GRU layers.

The code in which I first observed these errors was quite long, so I decided to stick to the simplest problem that reproduces the error: classifying IMDB reviews into "positive" and "negative" categories.

When I use a GRU layer with recurrent dropout, the training loss (after a few batches of the first epoch) takes the "value" of nan, while the training accuracy (from the second epoch on) goes to 0.

   64/12000 [..............................] - ETA: 3:05 - loss: 0.6930 - accuracy: 0.4844
128/12000 [..............................] - ETA: 2:09 - loss: 0.6926 - accuracy: 0.4766
192/12000 [..............................] - ETA: 1:50 - loss: 0.6910 - accuracy: 0.5573
(...)
3136/12000 [======>.......................] - ETA: 59s - loss: 0.6870 - accuracy: 0.5635
3200/12000 [=======>......................] - ETA: 58s - loss: 0.6862 - accuracy: 0.5650
3264/12000 [=======>......................] - ETA: 58s - loss: 0.6860 - accuracy: 0.5650
3328/12000 [=======>......................] - ETA: 57s - loss: nan - accuracy: 0.5667
3392/12000 [=======>......................] - ETA: 57s - loss: nan - accuracy: 0.5560
3456/12000 [=======>......................] - ETA: 56s - loss: nan - accuracy: 0.5457
(...)
11840/12000 [============================>.] - ETA: 1s - loss: nan - accuracy: 0.1593
11904/12000 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.1584
11968/12000 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.1576
12000/12000 [==============================] - 83s 7ms/step - loss: nan - accuracy: 0.1572 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 2/20

64/12000 [..............................] - ETA: 1:16 - loss: nan - accuracy: 0.0000e+00
128/12000 [..............................] - ETA: 1:15 - loss: nan - accuracy: 0.0000e+00
192/12000 [..............................] - ETA: 1:16 - loss: nan - accuracy: 0.0000e+00
(...)
11840/12000 [============================>.] - ETA: 1s - loss: nan - accuracy: 0.0000e+00
11904/12000 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
11968/12000 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
12000/12000 [==============================] - 82s 7ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 3/20

64/12000 [..............................] - ETA: 1:18 - loss: nan - accuracy: 0.0000e+00
128/12000 [..............................] - ETA: 1:18 - loss: nan - accuracy: 0.0000e+00
192/12000 [..............................] - ETA: 1:16 - loss: nan - accuracy: 0.0000e+00
(...)
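
For reference, the failure above can be reproduced with just a handful of lines; a minimal sketch, trimmed from the full script given below (same data, hyperparameters, and imports):

## Minimal reproduction sketch (distilled from the full script below):
from keras import models, layers
from keras.datasets import imdb
from keras.preprocessing import sequence

(x_train, y_train), _ = imdb.load_data(num_words=10000)
x_train = sequence.pad_sequences(x_train, maxlen=500)

model = models.Sequential()
model.add(layers.Embedding(10000, 32))
model.add(layers.GRU(32, recurrent_dropout=0.3))  # recurrent dropout is what triggers the nan loss
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=20, batch_size=64, validation_split=0.2)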

Localizing the problem

To find a solution, I wrote the code below, which goes through several models (GRU/LSTM; {no dropout, only "normal" dropout, only recurrent dropout, both "normal" and recurrent dropout}; rmsprop/adam) and plots the loss and accuracy of all of them. (It also creates smaller, separate graphs for each model.)
# Based on examples from "Deep Learning with Python" by François Chollet:
## Constants, modules:
VERSION = 2

import os
from keras import models
from keras import layers
import matplotlib.pyplot as plt
import pylab

## Loading data:
from keras.datasets import imdb
(x_train, y_train), (x_test, y_test) = \
imdb.load_data(num_words=10000)

from keras.preprocessing import sequence
x_train = sequence.pad_sequences(x_train, maxlen=500)
x_test = sequence.pad_sequences(x_test, maxlen=500)


## Dictionary with models' hyperparameters:
MODELS = [
# GRU:
{"no": 1,
"layer_type": "GRU",
"optimizer": "rmsprop",
"dropout": None,
"recurrent_dropout": None},

{"no": 2,
"layer_type": "GRU",
"optimizer": "rmsprop",
"dropout": 0.3,
"recurrent_dropout": None},

{"no": 3,
"layer_type": "GRU",
"optimizer": "rmsprop",
"dropout": None,
"recurrent_dropout": 0.3},

{"no": 4,
"layer_type": "GRU",
"optimizer": "rmsprop",
"dropout": 0.3,
"recurrent_dropout": 0.3},

{"no": 5,
"layer_type": "GRU",
"optimizer": "adam",
"dropout": None,
"recurrent_dropout": None},

{"no": 6,
"layer_type": "GRU",
"optimizer": "adam",
"dropout": 0.3,
"recurrent_dropout": None},

{"no": 7,
"layer_type": "GRU",
"optimizer": "adam",
"dropout": None,
"recurrent_dropout": 0.3},

{"no": 8,
"layer_type": "GRU",
"optimizer": "adam",
"dropout": 0.3,
"recurrent_dropout": 0.3},

# LSTM:
{"no": 9,
"layer_type": "LSTM",
"optimizer": "rmsprop",
"dropout": None,
"recurrent_dropout": None},

{"no": 10,
"layer_type": "LSTM",
"optimizer": "rmsprop",
"dropout": 0.3,
"recurrent_dropout": None},

{"no": 11,
"layer_type": "LSTM",
"optimizer": "rmsprop",
"dropout": None,
"recurrent_dropout": 0.3},

{"no": 12,
"layer_type": "LSTM",
"optimizer": "rmsprop",
"dropout": 0.3,
"recurrent_dropout": 0.3},

{"no": 13,
"layer_type": "LSTM",
"optimizer": "adam",
"dropout": None,
"recurrent_dropout": None},

{"no": 14,
"layer_type": "LSTM",
"optimizer": "adam",
"dropout": 0.3,
"recurrent_dropout": None},

{"no": 15,
"layer_type": "LSTM",
"optimizer": "adam",
"dropout": None,
"recurrent_dropout": 0.3},

{"no": 16,
"layer_type": "LSTM",
"optimizer": "adam",
"dropout": 0.3,
"recurrent_dropout": 0.3},
]

## Adding name:
for model_dict in MODELS:
    model_dict["name"] = f"{model_dict['layer_type']}"
    model_dict["name"] += f"_d{model_dict['dropout']}" if model_dict['dropout'] is not None else "_dN"
    model_dict["name"] += f"_rd{model_dict['recurrent_dropout']}" if model_dict['recurrent_dropout'] is not None else "_rdN"
    model_dict["name"] += f"_{model_dict['optimizer']}"

## Function - defining and training a model:
def train_model(model_dict):
    """Defines and trains a model, outputs history."""

    ## Defining:
    model = models.Sequential()
    model.add(layers.Embedding(10000, 32))

    recurrent_layer_kwargs = dict()
    if model_dict["dropout"] is not None:
        recurrent_layer_kwargs["dropout"] = model_dict["dropout"]
    if model_dict["recurrent_dropout"] is not None:
        recurrent_layer_kwargs["recurrent_dropout"] = model_dict["recurrent_dropout"]

    if model_dict["layer_type"] == 'GRU':
        model.add(layers.GRU(32, **recurrent_layer_kwargs))
    elif model_dict["layer_type"] == 'LSTM':
        model.add(layers.LSTM(32, **recurrent_layer_kwargs))
    else:
        raise ValueError("Wrong model_dict['layer_type'] value...")
    model.add(layers.Dense(1, activation='sigmoid'))

    ## Compiling:
    model.compile(
        optimizer=model_dict["optimizer"],
        loss='binary_crossentropy',
        metrics=['accuracy'])

    ## Training:
    history = model.fit(x_train, y_train,
                        epochs=20,
                        batch_size=64,
                        validation_split=0.2)

    return history

## Multi-model graphs' parameters:
graph_all_nrow = 4
graph_all_ncol = 4
graph_all_figsize = (20, 20)

assert graph_all_nrow * graph_all_ncol >= len(MODELS)

## Figs and axes of multi-model graphs:
graph_all_loss_fig, graph_all_loss_axs = plt.subplots(graph_all_nrow, graph_all_ncol, figsize=graph_all_figsize)
graph_all_acc_fig, graph_all_acc_axs = plt.subplots(graph_all_nrow, graph_all_ncol, figsize=graph_all_figsize)

## Loop through all models:
for i, model_dict in enumerate(MODELS):
    history = train_model(model_dict)

    ## Metrics extraction:
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']

    epochs = range(1, len(loss) + 1)

    ## Single-model graph - loss:
    graph_loss_fname = fr"{os.path.basename(__file__).replace('.py', '')}"
    graph_loss_fname += fr"_v{VERSION}_{model_dict['no']}_{model_dict['name']}_loss_graph.png"

    graph_loss_fig, graph_loss_ax = plt.subplots()
    graph_loss_ax.plot(epochs, loss, 'bo', label='Training loss')
    graph_loss_ax.plot(epochs, val_loss, 'b', label='Validation loss')
    graph_loss_ax.legend()
    graph_loss_fig.suptitle("Training and validation loss")
    graph_loss_fig.savefig(graph_loss_fname)
    pylab.close(graph_loss_fig)

    ## Single-model graph - accuracy:
    graph_acc_fname = fr"{os.path.basename(__file__).replace('.py', '')}"
    graph_acc_fname += fr"_v{VERSION}_{model_dict['no']}_{model_dict['name']}_acc_graph.png"

    graph_acc_fig, graph_acc_ax = plt.subplots()
    graph_acc_ax.plot(epochs, acc, 'bo', label='Training accuracy')
    graph_acc_ax.plot(epochs, val_acc, 'b', label='Validation accuracy')
    graph_acc_ax.legend()
    graph_acc_fig.suptitle("Training and validation acc")
    graph_acc_fig.savefig(graph_acc_fname)
    pylab.close(graph_acc_fig)

    ## Position of axes on multi-model graph:
    i_row = i // graph_all_ncol
    i_col = i % graph_all_ncol

    ## Adding model metrics to multi-model graph - loss:
    graph_all_loss_axs[i_row, i_col].plot(epochs, loss, 'bo', label='Training loss')
    graph_all_loss_axs[i_row, i_col].plot(epochs, val_loss, 'b', label='Validation loss')
    graph_all_loss_axs[i_row, i_col].set_title(fr"{model_dict['no']}. {model_dict['name']}")

    ## Adding model metrics to multi-model graph - accuracy:
    graph_all_acc_axs[i_row, i_col].plot(epochs, acc, 'bo', label='Training acc')
    graph_all_acc_axs[i_row, i_col].plot(epochs, val_acc, 'b', label='Validation acc')
    graph_all_acc_axs[i_row, i_col].set_title(fr"{model_dict['no']}. {model_dict['name']}")


## Saving multi-model graphs:
# Output files are quite big (8000x8000 PNG), you may want to decrease DPI.
graph_all_loss_fig.savefig(fr"{os.path.basename(__file__).replace('.py', '')}_ALL_loss_graph.png", dpi=400)
graph_all_acc_fig.savefig(fr"{os.path.basename(__file__).replace('.py', '')}_ALL_acc_graph.png", dpi=400)
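
As a side note (this was not part of my runs): Keras ships a TerminateOnNaN callback that aborts training as soon as the loss becomes nan, which would save a lot of time across a 16-model sweep like this one. A minimal sketch of how the fit() call in train_model could use it:

## Hypothetical addition - stop a run as soon as the loss turns nan:
from keras.callbacks import TerminateOnNaN

history = model.fit(x_train, y_train,
                    epochs=20,
                    batch_size=64,
                    validation_split=0.2,
                    callbacks=[TerminateOnNaN()])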

Please find the two main graphs below: Loss - binary crossentropy and Accuracy (due to low reputation, I am not allowed to embed images in the post).

I also encountered a similarly strange problem with a regression model: the MAE was in the range of several thousands, in a problem where the range of $y$ was maybe tens. (I decided not to include that model here, as it would make this question even longer.)

Versions of modules, libraries, and hardware
  • Modules:
    Keras                    2.3.1
    Keras-Applications       1.0.8
    Keras-Preprocessing      1.1.0
    matplotlib               3.1.3
    tensorflow-estimator     1.14.0
    tensorflow-gpu           2.1.0
    tensorflow-gpu-estimator 2.1.0
  • keras.json file:
    {
        "floatx": "float32",
        "epsilon": 1e-07,
        "backend": "tensorflow",
        "image_data_format": "channels_last"
    }
  • CUDA - I have CUDA 10.0 and CUDA 10.1 installed on my system.
  • cuDNN - I have three versions: cudnn-10.0 v7.4.2.24, cudnn-10.0 v7.6.4.38, cudnn-9.0 v7.4.2.24
  • GPU: Nvidia GTX 1050Ti 4 GB
  • Windows 10 Home
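
(For reference, the versions and devices actually in use can be confirmed at runtime; a minimal sketch, assuming TensorFlow 2.1's tf.config API:)

## Hypothetical helper - print the versions and GPUs actually seen at runtime:
import keras
import tensorflow as tf

print("Keras:", keras.__version__)
print("TensorFlow:", tf.__version__)
print("GPUs:", tf.config.list_physical_devices('GPU'))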

Questions
  • Do you know what could be the reason for this behaviour?
  • Could this be caused by multiple CUDA and cuDNN installations? Before observing the problem, I had trained several models (from the book and my own) that seemed to behave more or less as expected, while 2 CUDA and 2 cuDNN versions (the ones above, without cudnn-10.0 v7.6.4.38) were installed.
  • Is there any official/good source of appropriate combinations of keras, tensorflow, CUDA, cuDNN (and other relevant things, e.g. maybe Visual Studio)? I cannot really find any authoritative and up-to-date source.

I hope I have described everything clearly enough. If you have any questions, please ask.

Best answer

    I finally found the solution (sort of). It was enough to change keras to tensorflow.keras.
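
    In this script the change amounts to swapping the imports of models and layers; a minimal sketch of just the fix (the same swap is marked with "#U:" comments in the revised code below):

    ## The fix in isolation - import Keras through TensorFlow instead of standalone:
    # from keras import models           # old: standalone Keras 2.3.1
    # from keras import layers
    from tensorflow.keras import models  # new: tf.keras bundled with tensorflow-gpu 2.1.0
    from tensorflow.keras import layers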

    Revised code

    # Based on examples from "Deep Learning with Python" by François Chollet:
    ## Constants, modules:
    VERSION = 2

    import os
    #U: from keras import models
    #U: from keras import layers
    from tensorflow.keras import models
    from tensorflow.keras import layers

    import matplotlib.pyplot as plt
    import pylab

    ## Loading data:
    from keras.datasets import imdb

    (x_train, y_train), (x_test, y_test) = \
    imdb.load_data(num_words=10000)

    from keras.preprocessing import sequence

    x_train = sequence.pad_sequences(x_train, maxlen=500)
    x_test = sequence.pad_sequences(x_test, maxlen=500)

    ## Dictionary with models' hyperparameters:
    MODELS_ALL = [
    # GRU:
    {"no": 1,
    "layer_type": "GRU",
    "optimizer": "rmsprop",
    "dropout": None,
    "recurrent_dropout": None},

    {"no": 2,
    "layer_type": "GRU",
    "optimizer": "rmsprop",
    "dropout": 0.3,
    "recurrent_dropout": None},

    {"no": 3,
    "layer_type": "GRU",
    "optimizer": "rmsprop",
    "dropout": None,
    "recurrent_dropout": 0.3},

    {"no": 4,
    "layer_type": "GRU",
    "optimizer": "rmsprop",
    "dropout": 0.3,
    "recurrent_dropout": 0.3},

    {"no": 5,
    "layer_type": "GRU",
    "optimizer": "adam",
    "dropout": None,
    "recurrent_dropout": None},

    {"no": 6,
    "layer_type": "GRU",
    "optimizer": "adam",
    "dropout": 0.3,
    "recurrent_dropout": None},

    {"no": 7,
    "layer_type": "GRU",
    "optimizer": "adam",
    "dropout": None,
    "recurrent_dropout": 0.3},

    {"no": 8,
    "layer_type": "GRU",
    "optimizer": "adam",
    "dropout": 0.3,
    "recurrent_dropout": 0.3},

    # LSTM:
    {"no": 9,
    "layer_type": "LSTM",
    "optimizer": "rmsprop",
    "dropout": None,
    "recurrent_dropout": None},

    {"no": 10,
    "layer_type": "LSTM",
    "optimizer": "rmsprop",
    "dropout": 0.3,
    "recurrent_dropout": None},

    {"no": 11,
    "layer_type": "LSTM",
    "optimizer": "rmsprop",
    "dropout": None,
    "recurrent_dropout": 0.3},

    {"no": 12,
    "layer_type": "LSTM",
    "optimizer": "rmsprop",
    "dropout": 0.3,
    "recurrent_dropout": 0.3},

    {"no": 13,
    "layer_type": "LSTM",
    "optimizer": "adam",
    "dropout": None,
    "recurrent_dropout": None},

    {"no": 14,
    "layer_type": "LSTM",
    "optimizer": "adam",
    "dropout": 0.3,
    "recurrent_dropout": None},

    {"no": 15,
    "layer_type": "LSTM",
    "optimizer": "adam",
    "dropout": None,
    "recurrent_dropout": 0.3},

    {"no": 16,
    "layer_type": "LSTM",
    "optimizer": "adam",
    "dropout": 0.3,
    "recurrent_dropout": 0.3},
    ]

    MODELS_GRU_RECURRENT = [
    # GRU:
    {"no": 3,
    "layer_type": "GRU",
    "optimizer": "rmsprop",
    "dropout": None,
    "recurrent_dropout": 0.3},

    {"no": 4,
    "layer_type": "GRU",
    "optimizer": "rmsprop",
    "dropout": 0.3,
    "recurrent_dropout": 0.3},

    {"no": 7,
    "layer_type": "GRU",
    "optimizer": "adam",
    "dropout": None,
    "recurrent_dropout": 0.3},

    {"no": 8,
    "layer_type": "GRU",
    "optimizer": "adam",
    "dropout": 0.3,
    "recurrent_dropout": 0.3},
    ]

    MODELS = MODELS_ALL  # "MODELS = MODELS_ALL" or "MODELS = MODELS_GRU_RECURRENT"

    ## Adding name:
    for model_dict in MODELS:
        model_dict["name"] = f"{model_dict['layer_type']}"
        model_dict["name"] += f"_d{model_dict['dropout']}" if model_dict['dropout'] is not None else "_dN"
        model_dict["name"] += f"_rd{model_dict['recurrent_dropout']}" if model_dict['recurrent_dropout'] is not None else "_rdN"
        model_dict["name"] += f"_{model_dict['optimizer']}"


    ## Function - defining and training a model:
    def train_model(model_dict):
        """Defines and trains a model, outputs history."""

        ## Defining:
        model = models.Sequential()
        model.add(layers.Embedding(10000, 32))

        recurrent_layer_kwargs = dict()
        if model_dict["dropout"] is not None:
            recurrent_layer_kwargs["dropout"] = model_dict["dropout"]
        if model_dict["recurrent_dropout"] is not None:
            recurrent_layer_kwargs["recurrent_dropout"] = model_dict["recurrent_dropout"]

        if model_dict["layer_type"] == 'GRU':
            model.add(layers.GRU(32, **recurrent_layer_kwargs))
        elif model_dict["layer_type"] == 'LSTM':
            model.add(layers.LSTM(32, **recurrent_layer_kwargs))
        else:
            raise ValueError("Wrong model_dict['layer_type'] value...")
        model.add(layers.Dense(1, activation='sigmoid'))

        ## Compiling:
        model.compile(
            optimizer=model_dict["optimizer"],
            loss='binary_crossentropy',
            metrics=['accuracy'])

        ## Training:
        history = model.fit(x_train, y_train,
                            epochs=20,
                            batch_size=64,
                            validation_split=0.2)

        return history


    ## Multi-model graphs' parameters:
    graph_all_nrow = 4
    graph_all_ncol = 4
    graph_all_figsize = (20, 20)

    assert graph_all_nrow * graph_all_ncol >= len(MODELS)

    # fig and axes of multi-model graphs:
    graph_all_loss_fig, graph_all_loss_axs = plt.subplots(graph_all_nrow, graph_all_ncol, figsize=graph_all_figsize)
    graph_all_acc_fig, graph_all_acc_axs = plt.subplots(graph_all_nrow, graph_all_ncol, figsize=graph_all_figsize)

    ## Loop through all models:
    for i, model_dict in enumerate(MODELS):
        history = train_model(model_dict)

        ## Metrics extraction:
        loss = history.history['loss']
        val_loss = history.history['val_loss']
        acc = history.history['accuracy']
        val_acc = history.history['val_accuracy']

        epochs = range(1, len(loss) + 1)

        ## Single-model graph - loss:
        graph_loss_fname = fr"{os.path.basename(__file__).replace('.py', '')}"
        graph_loss_fname += fr"_v{VERSION}_{model_dict['no']}_{model_dict['name']}_loss_graph.png"

        graph_loss_fig, graph_loss_ax = plt.subplots()
        graph_loss_ax.plot(epochs, loss, 'bo', label='Training loss')
        graph_loss_ax.plot(epochs, val_loss, 'b', label='Validation loss')
        graph_loss_ax.legend()
        graph_loss_fig.suptitle("Training and validation loss")
        graph_loss_fig.savefig(graph_loss_fname)
        pylab.close(graph_loss_fig)

        ## Single-model graph - accuracy:
        graph_acc_fname = fr"{os.path.basename(__file__).replace('.py', '')}"
        graph_acc_fname += fr"_v{VERSION}_{model_dict['no']}_{model_dict['name']}_acc_graph.png"

        graph_acc_fig, graph_acc_ax = plt.subplots()
        graph_acc_ax.plot(epochs, acc, 'bo', label='Training accuracy')
        graph_acc_ax.plot(epochs, val_acc, 'b', label='Validation accuracy')
        graph_acc_ax.legend()
        graph_acc_fig.suptitle("Training and validation acc")
        graph_acc_fig.savefig(graph_acc_fname)
        pylab.close(graph_acc_fig)

        ## Position of axes on multi-model graph:
        i_row = i // graph_all_ncol
        i_col = i % graph_all_ncol

        ## Adding model metrics to multi-model graph - loss:
        graph_all_loss_axs[i_row, i_col].plot(epochs, loss, 'bo', label='Training loss')
        graph_all_loss_axs[i_row, i_col].plot(epochs, val_loss, 'b', label='Validation loss')
        graph_all_loss_axs[i_row, i_col].set_title(fr"{model_dict['no']}. {model_dict['name']}")

        ## Adding model metrics to multi-model graph - accuracy:
        graph_all_acc_axs[i_row, i_col].plot(epochs, acc, 'bo', label='Training acc')
        graph_all_acc_axs[i_row, i_col].plot(epochs, val_acc, 'b', label='Validation acc')
        graph_all_acc_axs[i_row, i_col].set_title(fr"{model_dict['no']}. {model_dict['name']}")

    graph_all_loss_fig.suptitle(f"Loss - binary crossentropy [v{VERSION}]")
    graph_all_acc_fig.suptitle(f"Accuracy [v{VERSION}]")

    ## Saving multi-model graphs:
    graph_all_loss_fig.savefig(fr"{os.path.basename(__file__).replace('.py', '')}_ALL_v{VERSION}_loss_graph.png", dpi=400)
    graph_all_acc_fig.savefig(fr"{os.path.basename(__file__).replace('.py', '')}_ALL_v{VERSION}_acc_graph.png", dpi=400)

    ## Saving multi-model graphs (SMALL):
    graph_all_loss_fig.savefig(fr"{os.path.basename(__file__).replace('.py', '')}_ALL_v{VERSION}_loss_graph_SMALL.png", dpi=150)
    graph_all_acc_fig.savefig(fr"{os.path.basename(__file__).replace('.py', '')}_ALL_v{VERSION}_acc_graph_SMALL.png", dpi=150)

    Results

    Graphs analogous to those in the question: Loss - binary crossentropy, Accuracy

    More on keras vs. tensorflow.keras
    As François Chollet wrote in tweets (found here: https://stackoverflow.com/a/54117754), from now on there will be tensorflow.keras (i.e. Keras as the official API of TensorFlow) instead of the standalone keras. (I am not entirely sure I am 100% correct here, feel free to correct me.)

    I think it will be better to use just tensorflow.keras instead of keras in future projects.
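
    For what it's worth, a minimal sketch of the import style this implies for new projects (everything pulled from the tensorflow package, nothing from standalone keras; the model mirrors the one in the question):

    ## Hypothetical example - consistent tf.keras-only imports for a new project:
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        layers.Embedding(10000, 32),
        layers.GRU(32),
        layers.Dense(1, activation='sigmoid'),
    ])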

    Regarding "machine-learning - Keras - GRU layer with recurrent dropout - loss: 'nan', accuracy: 0", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60797725/
