
tensorflow - Loss function decreases, but accuracy on the training set does not change in TensorFlow

Reposted. Author: 行者123. Updated: 2023-12-02 12:39:00

I am trying to implement a simple gender classifier using a deep convolutional neural network in TensorFlow. I found this model and implemented it.

def create_model_v2(data):

    cl1_desc = {'weights':weight_variable([7,7,3,96]), 'biases':bias_variable([96])}
    cl2_desc = {'weights':weight_variable([5,5,96,256]), 'biases':bias_variable([256])}
    cl3_desc = {'weights':weight_variable([3,3,256,384]), 'biases':bias_variable([384])}

    fc1_desc = {'weights':weight_variable([240000, 128]), 'biases':bias_variable([128])}
    fc2_desc = {'weights':weight_variable([128,128]), 'biases':bias_variable([128])}
    fc3_desc = {'weights':weight_variable([128,2]), 'biases':bias_variable([2])}

    # Note: as written, the bias is added to the kernel, not to the convolution output.
    cl1 = conv2d(data,cl1_desc['weights'] + cl1_desc['biases'])
    cl1 = tf.nn.relu(cl1)
    pl1 = max_pool_nxn(cl1,3,[1,2,2,1])
    lrm1 = tf.nn.local_response_normalization(pl1)

    cl2 = conv2d(lrm1, cl2_desc['weights'] + cl2_desc['biases'])
    cl2 = tf.nn.relu(cl2)
    pl2 = max_pool_nxn(cl2,3,[1,2,2,1])
    lrm2 = tf.nn.local_response_normalization(pl2)

    cl3 = conv2d(lrm2, cl3_desc['weights'] + cl3_desc['biases'])
    cl3 = tf.nn.relu(cl3)
    pl3 = max_pool_nxn(cl3,3,[1,2,2,1])

    # Note: cl3 (not pl3) is flattened here, so pl3 is never used.
    fl = tf.contrib.layers.flatten(cl3)

    # Note: dropout is applied unconditionally, so it is also active during evaluation.
    fc1 = tf.add(tf.matmul(fl, fc1_desc['weights']), fc1_desc['biases'])
    drp1 = tf.nn.dropout(fc1,0.5)
    fc2 = tf.add(tf.matmul(drp1, fc2_desc['weights']), fc2_desc['biases'])
    drp2 = tf.nn.dropout(fc2,0.5)
    fc3 = tf.add(tf.matmul(drp2, fc3_desc['weights']), fc3_desc['biases'])

    return fc3
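One detail in the model above is worth checking, separately from the initialization fix the asker reports later: `conv2d(data, W + b)` adds the bias to the kernel rather than to the convolution output. The sketch below is only an illustration in plain Python 3. It replaces the convolution with a 1x1 case (a matrix product), and the helper `matvec` and all values are made up for the example. It shows that the two expressions give different results.

```python
def matvec(x, W):
    """Multiply a row vector x (length n) by a matrix W (n rows, m columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

x = [1.0, 2.0, 3.0]                          # one pixel with 3 input channels
W = [[0.5, -1.0], [0.0, 2.0], [1.0, 1.0]]    # 3 input channels -> 2 output channels
b = [0.1, 0.2]                               # one bias per output channel

# Form used in the posted model: the bias is broadcast onto every kernel entry.
W_plus_b = [[W[i][j] + b[j] for j in range(2)] for i in range(3)]
bias_in_kernel = matvec(x, W_plus_b)

# Usual layer definition: apply the kernel first, then add the bias.
bias_after = [y + b[j] for j, y in enumerate(matvec(x, W))]

print("bias added to kernel:", bias_in_kernel)
print("bias added to output:", bias_after)
```

In the first form the effective offset is the bias multiplied by the sum of the inputs, so it varies from pixel to pixel. In TensorFlow 1.x the second form could be written as `tf.nn.bias_add(conv2d(data, W), b)`.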

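I should note at this point that I also carried out all the preprocessing steps described in the paper, except that my images are resized to 100x100x3 instead of 277x277x3.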
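The 240000 in the first fully connected layer follows from the 100x100 input size. A minimal sketch of the arithmetic, assuming the 'SAME' padding and the strides used by the posted helper functions (the function names here are hypothetical):

```python
import math

def same_out(size, stride):
    """Spatial output size of a conv/pool layer with 'SAME' padding."""
    return math.ceil(size / stride)

def flattened_sizes(input_size, channels=384):
    """Return (features after cl3, features after pl3) for the posted model."""
    s = same_out(input_size, 1)   # cl1, stride 1
    s = same_out(s, 2)            # pl1, stride 2
    s = same_out(s, 1)            # cl2, stride 1
    s = same_out(s, 2)            # pl2, stride 2
    s = same_out(s, 1)            # cl3, stride 1
    after_cl3 = s * s * channels
    s = same_out(s, 2)            # pl3, stride 2
    after_pl3 = s * s * channels
    return after_cl3, after_pl3

print(flattened_sizes(100))
```

For a 100x100 input this gives 25 * 25 * 384 = 240000 features after `cl3` and 13 * 13 * 384 = 64896 after `pl3`. The weight shape `[240000, 128]` therefore matches the flattened `cl3` tensor, which is what the model passes to `flatten`.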

I defined the one-hot target labels as [0,1] for female and [1,0] for male:

x = tf.placeholder('float',[None,100,100,3])
y = tf.placeholder('float',[None,2])

and defined the training procedure as follows:

def train(x, hm_epochs, LR):
    #prediction = create_model_v2(x)
    prediction = create_model_v2(x)
    cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits = prediction, labels = y) )
    optimizer = tf.train.AdamOptimizer(learning_rate=LR).minimize(cost)
    batch_size = 50
    correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
    print("hello")
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        for epoch in range(hm_epochs):
            epoch_loss = 0
            i = 0
            while i < (len(x_train)):
                start = i
                end = i + batch_size
                batch_x = x_train[start:end]
                batch_y = y_train[start:end]
                whatever, vigen = sess.run([optimizer, cost], feed_dict = {x:batch_x, y:batch_y})
                epoch_loss += vigen
                i+=batch_size

            print('Epoch', epoch ,'loss:',epoch_loss/len(x_train))
            if (epoch+1) % 2 == 0:
                j = 0
                acc = []
                while j < len(x_test):
                    acc += [accuracy.eval(feed_dict = {x:x_test[j:j + 10], y:y_test[j:j+10]})]
                    j+= 10
                print ('accuracy after', epoch + 1, 'epochs on test set: ', sum(acc)/len(acc))

                j = 0
                acc = []
                while j < len(x_train):
                    acc += [accuracy.eval(feed_dict = {x:x_train[j:j + 10], y:y_train[j:j+10]})]
                    j+= 10
                print ('accuracy after', epoch, ' epochs on train set:', sum(acc)/len(acc))

Half of the code above is only there to print the test and training accuracy every 2 epochs.
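A side note on the accuracy loop: it takes an unweighted mean of per-batch accuracies, which can differ from the overall accuracy when the last batch is smaller than the others. The sketch below illustrates this in plain Python 3 with made-up predictions. The helper names are hypothetical and this does not explain the 0.5 accuracy in the question.

```python
def accuracy(pred, true):
    """Fraction of positions where the predicted class equals the true class."""
    return sum(p == t for p, t in zip(pred, true)) / len(true)

def mean_of_batch_accuracies(pred, true, batch_size):
    """Unweighted mean of per-batch accuracies, as in the posted loop."""
    accs = [accuracy(pred[j:j + batch_size], true[j:j + batch_size])
            for j in range(0, len(true), batch_size)]
    return sum(accs) / len(accs)

# 25 samples: the first 20 are classified correctly, the last 5 are not.
true = [0] * 25
pred = [0] * 20 + [1] * 5

print("mean of batch accuracies:", mean_of_batch_accuracies(pred, true, 10))
print("overall accuracy:        ", accuracy(pred, true))
```

Here the batches have sizes 10, 10 and 5, so the unweighted mean is (1 + 1 + 0) / 3 while the overall accuracy is 20 / 25. Weighting each batch accuracy by its batch size would make the two agree.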

In any case, the loss starts out very high in the first epoch:

('Epoch', 0, 'loss:', 148.87030902462453)

('Epoch', 1, 'loss:', 0.01549744715988636)

('accuracy after', 2, 'epochs on test set: ', 0.33052011888510396)

('accuracy after', 1, ' epochs on train set:', 0.49607501227222384)

('Epoch', 2, 'loss:', 0.015493246909976005)

and it continues like this, with the accuracy on the training set stuck at 0.5.

What am I missing?
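One way to read the printed loss: the training loop adds up one mean loss per batch but divides by the number of samples, so the printed value is roughly the per-example loss divided by the batch size. The sketch below undoes that scaling. It assumes every batch holds the full 50 samples, which the post does not confirm.

```python
import math

reported_epoch_loss = 0.01549744715988636   # 'Epoch 1' value from the log above
batch_size = 50                             # as in the posted train()

# Sum of per-batch means divided by the sample count is approximately
# (mean per-example loss) / batch_size, so multiply back by batch_size.
per_example_loss = reported_epoch_loss * batch_size

# Cross-entropy of a 50/50 guess between two classes.
chance_level = math.log(2)

print("approx. per-example loss: %.3f" % per_example_loss)
print("chance level ln(2):       %.3f" % chance_level)
```

The rescaled loss is about 0.775, which is close to ln(2), about 0.693. Under the stated assumption, that fits a network that outputs nearly the same prediction for every input, in line with the 0.5 training accuracy.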

EDIT: the functions weight_variable, conv2d and max_pool_nxn are:

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def avg_pool_nxn(x, n, strides):
    return tf.nn.avg_pool(x, ksize=[1,n,n,1], strides = strides,padding = 'SAME')

def max_pool_nxn(x, n, strides):
    return tf.nn.max_pool(x, ksize=[1,n,n,1], strides = strides, padding = 'SAME')

def conv2d(x, W,stride = [1,1,1,1]):
    return tf.nn.conv2d(x, W, strides = stride, padding = 'SAME')

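EDIT 2 - Problem solved

The problem was closely tied to parameter initialization. Changing the weight initialization from a normal distribution to Xavier initialization worked wonders, and the accuracy eventually reached about 86%. If anyone is interested, here is the original paper: http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf . If anyone knows and is willing to explain exactly why Xavier works so well with convolutional networks and images, feel free to post an answer.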
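To see why a fixed `stddev=0.1` can be a problem for a layer with 240000 inputs, note that for independent zero-mean weights the standard deviation of a pre-activation grows with the square root of the fan-in. The sketch below is a rough illustration in plain Python 3. It uses an untruncated Gaussian instead of `tf.truncated_normal` and assumes inputs with a root mean square of 1, which need not hold for the real features.

```python
import math
import random

def preactivation_std(fan_in, weight_std, input_rms=1.0):
    """Std of sum_i w_i * x_i for independent zero-mean weights with std weight_std."""
    return weight_std * math.sqrt(fan_in) * input_rms

def simulate_std(fan_in, weight_std, trials=2000, seed=0):
    """Monte Carlo check: redraw Gaussian weights `trials` times for one fixed input."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(fan_in)]
    input_rms = math.sqrt(sum(v * v for v in x) / fan_in)
    outs = [sum(rng.gauss(0.0, weight_std) * v for v in x) for _ in range(trials)]
    mean = sum(outs) / trials
    empirical = math.sqrt(sum((o - mean) ** 2 for o in outs) / trials)
    return empirical, preactivation_std(fan_in, weight_std, input_rms)

empirical, predicted = simulate_std(fan_in=400, weight_std=0.1)
print("fan_in=400:    empirical %.3f, predicted %.3f" % (empirical, predicted))
print("fan_in=240000: predicted %.1f" % preactivation_std(240000, 0.1))
```

With a fan-in of 240000 the predicted pre-activation scale is about 49 times the input scale. That could help explain the very large loss in the first epoch, although the post itself does not verify this.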

Best answer

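Proper weight initialisation is often crucial for training deeper neural networks.

Xavier initialisation aims to ensure that the variance of each neuron's output is expected to be 1.0 (see here). This usually relies on the additional assumption that your inputs are standardised to mean 0 and variance 1, so it is important to ensure that as well.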

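For ReLU units, I believe He initialisation is actually considered best practice. This requires initialising from a zero-mean Gaussian distribution with standard deviation:

$$\sigma = \sqrt{\frac{2}{n}}$$

where n is the number of input units. See the Lasagne docs for best practice for some other activation functions.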
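A small experiment can make the formula concrete. The sketch below pushes one random vector through a stack of dense ReLU layers in plain Python 3 and records the root-mean-square activation after each layer, once with the fixed `stddev=0.1` from the question and once with sqrt(2 / n). The width, depth and seed are arbitrary choices for illustration, and there are no biases or convolutions.

```python
import math
import random

def he_std(fan_in):
    """He initialisation: std = sqrt(2 / n), n = number of input units."""
    return math.sqrt(2.0 / fan_in)

def relu_stack_rms(width, depth, weight_std, seed=0):
    """Return the RMS activation after each of `depth` dense ReLU layers."""
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]
    history = []
    for _ in range(depth):
        # Each output unit draws its own Gaussian weight vector.
        h = [max(0.0, sum(rng.gauss(0.0, weight_std) * v for v in h))
             for _ in range(width)]
        history.append(math.sqrt(sum(v * v for v in h) / width))
    return history

width, depth = 64, 8
fixed = relu_stack_rms(width, depth, weight_std=0.1)
he = relu_stack_rms(width, depth, weight_std=he_std(width))

print("fixed 0.1:", ["%.4f" % r for r in fixed])
print("He init:  ", ["%.4f" % r for r in he])
```

With 64 inputs per layer, 0.1 is smaller than sqrt(2 / 64), about 0.177, so the activations shrink layer by layer, while the He scale keeps them at roughly the same magnitude. For a fan-in of 240000, as in `fc1`, the He standard deviation is about 0.0029, so a fixed 0.1 is far too large there.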

As an aside, batch normalisation generally reduces the dependence of model performance on weight initialisation.
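The reason is that batch normalisation rescales each feature to zero mean and unit variance over the batch, so a global change in weight scale is largely cancelled. A minimal sketch of just the normalisation step in plain Python 3 follows. The learned scale and shift parameters and the running statistics of a real batch-norm layer are omitted, and the data is random.

```python
import math
import random

def batch_norm(column, eps=1e-5):
    """Normalise one feature over the batch to zero mean and unit variance."""
    mean = sum(column) / len(column)
    var = sum((v - mean) ** 2 for v in column) / len(column)
    return [(v - mean) / math.sqrt(var + eps) for v in column]

rng = random.Random(0)
batch = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(32)]  # 32 examples, 16 inputs
w = [rng.gauss(0.0, 1.0) for _ in range(16)]                          # weights of one output unit

def unit_preactivations(scale):
    """Pre-activations of the unit when its weights are multiplied by `scale`."""
    return [sum(scale * wi * xi for wi, xi in zip(w, row)) for row in batch]

small = batch_norm(unit_preactivations(0.1))
large = batch_norm(unit_preactivations(10.0))
max_diff = max(abs(a - b) for a, b in zip(small, large))

print("largest difference after normalisation: %.6f" % max_diff)
```

Although the two weight scales differ by a factor of 100, the normalised outputs are nearly identical. The small remaining difference comes from the `eps` term.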

Regarding "tensorflow - Loss function decreases, but accuracy on the training set does not change in TensorFlow", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45521025/
