I am currently working on an audio classifier with the Python API of TensorFlow, using the UrbanSound8K dataset, collecting 176400 data points from each file and trying to distinguish between 10 mutually exclusive classes.
I have adapted this example code for a convolutional neural network: https://www.tensorflow.org/get_started/mnist/pros
Unfortunately, I am getting the following error:
Traceback (most recent call last):
...
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [7000,10] and labels shape [10]
[[Node: xent/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"](read/add, _recv_y_0/_9)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "urban-cnn.py", line 124, in <module>
sess.run(optimizer, feed_dict={x: batch_x, y: batch_y, keep_prob: .5})
...
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [7000,10] and labels shape [10]
[[Node: xent/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"](read/add, _recv_y_0/_9)]]
Caused by op 'xent/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits', defined at:
File "urban-cnn.py", line 102, in <module>
xent = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=y_conv), name="xent")
...
InvalidArgumentError (see above for traceback): logits and labels must have the same first dimension, got logits shape [7000,10] and labels shape [10]
[[Node: xent/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"](read/add, _recv_y_0/_9)]]
Here is a slightly edited version of the code:
import tensorflow as tf
import soundfile as sfx
import numpy as np
import math
import glob

batch_size = 10
n_epochs = 10

input_width = 176400
n_labels = 10

widths = [5, 5, 7]
channels = [1, 8, 64, 512, n_labels]

learning_rate = 1e-4

def load_data():
    data_x = []
    data_y = []

    for path in glob.glob("./UrbanSound8K/audio/fold1/*.wav"):
        name = path.split("/")[-1].split(".")[0]

        x, sample_rate = sfx.read(path, frames=input_width, fill_value=0.)
        y = int(name.split("-")[1])

        if x.ndim > 1:
            x = x.take(0, axis=1)

        data_x.append(x)
        data_y.append(y)

    return data_x, data_y

data_x, data_y = load_data()

data_split = int(len(data_x) * .9)

train_x = data_x[:data_split]
train_y = data_y[:data_split]

test_x = data_x[data_split:]
test_y = data_y[data_split:]

x = tf.placeholder(tf.float32, [None, input_width], name="x")
y = tf.placeholder(tf.int64, [None], name="y")

x_reshaped = tf.reshape(x, [-1, 1, input_width, channels[0]], name="x_reshaped")

def weights_x(shape, name):
    w = tf.Variable(tf.truncated_normal(shape, stddev=0.1), name=name)
    tf.summary.histogram("weights", w)
    return w

def weights(layer, name):
    return weights_x([1, widths[layer], channels[layer], channels[layer+1]], name)

def biases(layer, name):
    b = tf.Variable(tf.constant(0.1, shape=[channels[layer+1]]), name=name)
    tf.summary.histogram("biases", b)
    return b

def convolution(p, w, b, name):
    c = tf.nn.relu(tf.nn.conv2d(p, w, strides=[1, 1, 1, 1], padding="SAME") + b, name=name)
    tf.summary.histogram("convolution", c)
    return c

def pooling(c, name):
    p = tf.nn.max_pool(c, ksize=[1, 1, 6, 1], strides=[1, 1, 6, 1], padding="SAME", name=name)
    tf.summary.histogram("pooling", p)
    return p

with tf.name_scope("conv1"):
    w1 = weights(0, "w1")
    b1 = biases(0, "b1")
    c1 = convolution(x_reshaped, w1, b1, "c1")
    p1 = pooling(c1, "p1")

with tf.name_scope("conv2"):
    w2 = weights(1, "w2")
    b2 = biases(1, "b2")
    c2 = convolution(p1, w2, b2, "c2")
    p2 = pooling(c2, "p2")

with tf.name_scope("dens"):
    n_edges = widths[2] * channels[2]
    wf1 = weights_x([n_edges, channels[3]], "wf1")
    bf1 = biases(2, "bf1")
    pf1 = tf.reshape(p2, [-1, n_edges], name="pf1")
    f1 = tf.nn.relu(tf.matmul(pf1, wf1) + bf1, name="f1")

with tf.name_scope("drop"):
    keep_prob = tf.placeholder(tf.float32, name="keep_prob")
    dropout = tf.nn.dropout(f1, keep_prob)

with tf.name_scope("read"):
    wf2 = weights_x([channels[3], channels[4]], "wf2")
    bf2 = biases(3, "bf2")
    y_conv = tf.matmul(dropout, wf2) + bf2

with tf.name_scope("xent"):
    xent = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=y_conv), name="xent")
    tf.summary.scalar("xent", xent)

with tf.name_scope("optimizer"):
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(xent)

with tf.name_scope("accuracy"):
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), y, name="correct_prediction")
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")
    tf.summary.scalar("accuracy", accuracy)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print("Initialized Global Variables")

    for epoch in range(n_epochs):
        n_itr = len(train_x)//batch_size

        for itr in range(n_itr):
            left, right = itr*batch_size, (itr+1)*batch_size
            batch_x, batch_y = train_x[left:right], train_y[left:right]

            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y, keep_prob: .5})

        print("epoch: ", epoch + 1)
        print("accuracy: ", sess.run(accuracy, feed_dict={x: test_x, y: test_y, keep_prob: 1.}))
When I inspect the tensor shapes before calling sess.run(...), everything is as expected.
So why do the logits have shape [7000, n_labels] instead of [batch_size, n_labels]?
Best answer
Your network structure is incorrect; the key problem is here:
with tf.name_scope("dens"):
n_edges = widths[2] * channels[2]
wf1 = weights_x([n_edges, channels[3]], "wf1")
bf1 = biases(2, "bf1")
pf1 = tf.reshape(p2, [-1, n_edges], name="pf1")
f1 = tf.nn.relu(tf.matmul(pf1, wf1) + bf1, name="f1")
p2 has shape [10, 1, 4900, 64], so n_edges is not 4900 * 64 = 313600 but 448 (a far too small layer!). If you set n_edges = 313600, everything works; whether that is the architecture you actually want is up to you, though. It looks like you merged two incompatible ideas: you used the shape of the convolution kernel to compute how large the flattened layer should be. That is not how convolution works, however; the shape of a layer's output depends on the sizes of the input, the kernel, and the padding. Consequently it is in general much bigger, as in this example: the fully connected layer should actually have over 300k input neurons, not just 448 as in your code. The crucial difference is that this fully connected layer acts on the output of the convolution, not on its parameters.
The 7000 is simply the result of the pf1 reshape operation: batch_size * (4900 * 64) / n_edges = 10 * 313600 / 448 = 7000.
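To make the arithmetic concrete, here is a minimal sketch that reproduces the offending shapes using only values from the question; the variable names below are illustrative, not part of the original code:

pooled = 176400 // 6 // 6       # two stride-6 max-pool layers: 176400 -> 29400 -> 4900
flat = pooled * 64              # 4900 * 64 = 313600 values per example after conv2
bad_n_edges = 7 * 64            # widths[2] * channels[2] = 448, the buggy value
rows = 10 * flat // bad_n_edges # batch_size * 313600 / 448
print(pooled, flat, rows)       # prints: 4900 313600 7000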
A more general fix is:
p2s = p2.get_shape()
n_edges = int(p2s[1] * p2s[2] * p2s[3])
since all dimensions of p2 except the 0th are known at this point, they can be read and used to construct the remainder of the network.
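Putting this together, a sketch of the corrected "dens" scope could look as follows, assuming everything else in the question's graph is left unchanged:

with tf.name_scope("dens"):
    p2s = p2.get_shape()
    n_edges = int(p2s[1] * p2s[2] * p2s[3])  # 1 * 4900 * 64 = 313600
    wf1 = weights_x([n_edges, channels[3]], "wf1")
    bf1 = biases(2, "bf1")
    pf1 = tf.reshape(p2, [-1, n_edges], name="pf1")
    f1 = tf.nn.relu(tf.matmul(pf1, wf1) + bf1, name="f1")

With this change, pf1 has shape [batch_size, 313600] and the logits come out as [batch_size, n_labels], matching the labels.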
Regarding "machine-learning - What gives the logits this unexpected shape?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42329874/