
python - Tensorflow - loss increases to NaN


I am taking Udacity's Deep Learning course. The interesting thing I observed is that on the same dataset my 1-layer neural network works fine, but as soon as I add more layers, my loss increases to NaN.

I am using the following blog post as a reference: http://www.ritchieng.com/machine-learning/deep-learning/tensorflow/regularization/

Here is my code:

import math
import tensorflow as tf  # written against the TF 1.x API

batch_size = 128
beta = 1e-3  # L2 regularization weight (defined but unused in this snippet)

# Network parameters
n_hidden_1 = 1024  # 1st layer number of neurons
n_hidden_2 = 512   # 2nd layer number of neurons

# keep_prob is used below but was never defined in the snippet; a fixed
# Python float is assumed here (in the original notebook it may be a
# placeholder instead).
keep_prob = 0.5

# image_size, num_labels, valid_dataset and test_dataset come from earlier
# cells of the notebook (not shown).

graph = tf.Graph()
with graph.as_default():
    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32,
                                      shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables. Note that w1 uses truncated_normal's default stddev of 1.0,
    # while w2 and w3 are scaled by sqrt(2 / fan_in).
    w1 = tf.Variable(tf.truncated_normal([image_size * image_size, n_hidden_1]))
    w2 = tf.Variable(tf.truncated_normal([n_hidden_1, n_hidden_2],
                                         stddev=math.sqrt(2.0 / n_hidden_1)))
    w3 = tf.Variable(tf.truncated_normal([n_hidden_2, num_labels],
                                         stddev=math.sqrt(2.0 / n_hidden_2)))

    b1 = tf.Variable(tf.zeros([n_hidden_1]))
    b2 = tf.Variable(tf.zeros([n_hidden_2]))
    b3 = tf.Variable(tf.zeros([num_labels]))

    # Learning rate decay configs
    global_step = tf.Variable(0, trainable=False)
    starter_learning_rate = 0.5

    # Training computation.
    logits_1 = tf.matmul(tf_train_dataset, w1) + b1
    hidden_layer_1 = tf.nn.relu(logits_1)
    layer_1_dropout = tf.nn.dropout(hidden_layer_1, keep_prob)

    logits_2 = tf.matmul(layer_1_dropout, w2) + b2
    hidden_layer_2 = tf.nn.relu(logits_2)
    layer_2_dropout = tf.nn.dropout(hidden_layer_2, keep_prob)

    # The output logits
    logits_3 = tf.matmul(layer_2_dropout, w3) + b3

    # Normal loss (no regularization term is added, despite beta above)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits_3,
                                                labels=tf_train_labels))

    learning_rate = tf.train.exponential_decay(starter_learning_rate,
                                               global_step, 10000, 0.96)
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(
        loss, global_step=global_step)

    # train_prediction is fetched in session.run below but was missing from
    # the snippet; presumably it is the softmax of the output logits.
    train_prediction = tf.nn.softmax(logits_3)

num_steps = 3001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    for step in range(num_steps):

        # ... some logic to get training data batches
        # (elided; it produces batch_data and batch_labels) ...

        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)

        print("Minibatch loss at step %d: %f" % (step, l))

When I print the loss, I see it grow exponentially until it becomes NaN:

Minibatch loss at step 1: 7474.770508
Minibatch loss at step 2: 43229.820312
Minibatch loss at step 3: 50132.988281
Minibatch loss at step 4: 10196093.000000
Minibatch loss at step 5: 3162884096.000000
Minibatch loss at step 6: 25022026481664.000000
Minibatch loss at step 7: 651425419900819079168.000000
Minibatch loss at step 8: 21374465836947504345731163114962944.000000
Minibatch loss at step 9: nan
Minibatch loss at step 10: nan

My code is almost identical to the code in that blog post, yet I still get NaN.

Any suggestions on what I might be doing wrong here?

Best Answer

This is happening because the ReLU activation function is letting the gradients explode. You need to lower the learning rate accordingly (starter_learning_rate in your case). In addition, you can also try a different activation function.
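
As a minimal sketch of that fix (the lower rate of 0.1 and the clip norm of 5.0 are illustrative values I picked, not taken from the answer; everything else in the graph stays the same):

starter_learning_rate = 0.1  # down from 0.5; tune for your data

# Optionally, clip each gradient's L2 norm before applying it, so that a
# single oversized step cannot blow the weights up.
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(loss)
clipped = [(tf.clip_by_norm(g, 5.0), v)
           for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped, global_step=global_step)

You would then fetch train_op instead of optimizer in session.run.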

Here (In simple multi-layer FFNN only ReLU activation function doesn't converge) is a question similar to yours. Follow the answer there and you will understand.
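
If lowering the learning rate alone is not enough, one quick experiment along the "different activation function" line (my illustration, not part of the linked answer) is to swap the hidden-layer nonlinearity for tanh, which saturates instead of growing without bound:

hidden_layer_1 = tf.nn.tanh(logits_1)  # was tf.nn.relu(logits_1)
hidden_layer_2 = tf.nn.tanh(logits_2)  # was tf.nn.relu(logits_2)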

Hope this helps.

Regarding "python - Tensorflow - loss increases to NaN", the original question is on Stack Overflow: https://stackoverflow.com/questions/47245866/
