My simple question is: is the long short-term memory network detailed below properly designed to generate new dance sequences, given dance-sequence training data?

Background: I'm working with a dancer who wants to use a neural network to generate new dance sequences. She sent me the 2016 chor-rnn paper, which accomplished this task using an LSTM network with a mixture density network (MDN) layer at the end. After adding an MDN layer to my LSTM network, however, my loss goes negative and the results look chaotic. This may be due to the very small amount of training data, but I'd like to validate the model fundamentals before scaling up the training data. If anyone can see whether the model below overlooks something fundamental (which is entirely possible), I would greatly appreciate their feedback.

The sample data I feed into the network (X below) has shape (626, 55, 3), corresponding to 626 time snapshots of 55 body positions, each with 3 coordinates (x, y, then z). So X[1][11][2] is the z position of the 11th body part at time 1:
import requests
import numpy as np

# download the data and write it to disk so np.load can find it
r = requests.get('https://s3.amazonaws.com/duhaime/blog/dancing-with-robots/dance.npy')
with open('dance.npy', 'wb') as out:
    out.write(r.content)

# X.shape = time_intervals, n_body_parts, 3
X = np.load('dance.npy')
To make sure the data was extracted correctly, I visualize the first few frames of X:
import mpl_toolkits.mplot3d.axes3d as p3
import matplotlib.pyplot as plt
from IPython.display import HTML
from matplotlib import animation
import matplotlib

matplotlib.rcParams['animation.embed_limit'] = 2**128

def update_points(time, points, X):
    arr = np.array([[X[time][i][0], X[time][i][1]] for i in range(int(X.shape[1]))])
    points.set_offsets(arr) # set x, y values
    points.set_3d_properties(X[time][:,2][:], zdir='z') # set z value

def get_plot(X, lim=2, frames=200, duration=45):
    fig = plt.figure()
    ax = p3.Axes3D(fig)
    ax.set_xlim(-lim, lim)
    ax.set_ylim(-lim, lim)
    ax.set_zlim(-lim, lim)
    points = ax.scatter(X[0][:,0][:], X[0][:,1][:], X[0][:,2][:], depthshade=False) # x,y,z vals
    return animation.FuncAnimation(fig,
        update_points,
        frames,
        interval=duration,
        fargs=(points, X),
        blit=False
    ).to_jshtml()

HTML(get_plot(X, frames=int(X.shape[0])))
This produces a little dance sequence (rendered as an inline animation in the original post).
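As an aside, when running outside a notebook one can write the same FuncAnimation to disk instead of embedding it as JS/HTML. A minimal sketch, assuming ffmpeg is available on the PATH (this helper and the filename are my own, not from the original post):

# hypothetical variant of get_plot that saves the animation to a file
def save_plot(X, path='dance.mp4', lim=2, frames=200, duration=45):
    fig = plt.figure()
    ax = p3.Axes3D(fig)
    ax.set_xlim(-lim, lim)
    ax.set_ylim(-lim, lim)
    ax.set_zlim(-lim, lim)
    points = ax.scatter(X[0][:,0], X[0][:,1], X[0][:,2], depthshade=False)
    anim = animation.FuncAnimation(fig, update_points, frames, interval=duration,
                                   fargs=(points, X), blit=False)
    anim.save(path, writer='ffmpeg') # requires ffmpeg installed

So far so good. Next, I normalize the features in the x, y, and z dimensions to the [0, 1] range: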
X -= np.amin(X, axis=(0, 1))
X /= np.amax(X, axis=(0, 1))
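A quick sanity check that the scaling behaved as expected: after these two lines, each coordinate axis should span [0, 1].

# each coordinate axis should now span [0, 1]
print(X.min(axis=(0, 1))) # -> [0. 0. 0.]
print(X.max(axis=(0, 1))) # -> [1. 1. 1.]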
Visualizing the resulting X with HTML(get_plot(X, frames=int(X.shape[0]))) shows that these lines normalize the data nicely. Next, I build the model itself using the Sequential API in Keras:
from keras.models import Sequential, Model
from keras.layers import Dense, LSTM, Dropout, Activation
from keras.layers.advanced_activations import LeakyReLU
from keras.losses import mean_squared_error
from keras.optimizers import Adam
import keras, os

# NOTE: MDN, get_mixture_loss_func, and sample_from_output (used below) are not
# imported in the original post; they match the API of the keras-mdn-layer
# package, so this import is an assumption:
from mdn import MDN, get_mixture_loss_func, sample_from_output

# config
look_back = 32 # number of previous time frames to use to predict the positions at time i
lstm_cells = 256 # number of cells in each LSTM "layer"
n_features = int(X.shape[1]) * int(X.shape[2]) # number of coordinate values to be predicted by each of `m` models
input_shape = (look_back, n_features) # shape of inputs
m = 32 # number of gaussian models to build

# set boolean controlling whether we use MDN or not
use_mdn = True

model = Sequential()
model.add(LSTM(lstm_cells, return_sequences=True, input_shape=input_shape))
model.add(LSTM(lstm_cells, return_sequences=True))
model.add(LSTM(lstm_cells))

if use_mdn:
    model.add(MDN(n_features, m))
    model.compile(loss=get_mixture_loss_func(n_features, m), optimizer=Adam(lr=0.000001))
else:
    model.add(Dense(n_features, activation='tanh'))
    model.compile(loss=mean_squared_error, optimizer='sgd')

model.summary()
Once the model is built, I arrange the data in X to prepare for training. Here we want to predict the x, y, z positions of the 55 body parts at some time by examining each body part's positions over the previous look_back time slices:
# get training data in right shape
train_x = []
train_y = []

n_time, n_obs, n_attrs = [int(i) for i in X.shape]

for i in range(look_back, n_time-1, 1):
    train_x.append( X[i-look_back:i].reshape(look_back, n_obs * n_attrs) )
    train_y.append( X[i+1].ravel() )

train_x = np.array(train_x)
train_y = np.array(train_y)
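As a quick shape check, assuming the (626, 55, 3) input above, the windowing yields 626 - 32 - 1 = 593 samples, each with 32 look-back frames of 55 * 3 = 165 flattened coordinates:

print(train_x.shape) # (593, 32, 165)
print(train_y.shape) # (593, 165)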
Finally, I train the model:
from livelossplot import PlotLossesKeras
# fit the model
model.fit(train_x, train_y, epochs=1024, batch_size=1, callbacks=[PlotLossesKeras()])
After training, I visualize the new time slices the model creates:
# generate `n_frames` of new output time slices
n_frames = 3000

# seed the data to plot with the first `look_back` animation frames
data = X[0:look_back]

x0, x1, x2 = [int(i) for i in train_x.shape]
d0, d1, d2 = [int(i) for i in data.shape]

for i in range(look_back, n_frames, 1):
    # get the model's prediction for the next position of points at time `i`
    result = model.predict(train_x[i].reshape(1, x1, x2))
    # if using the mixture density network, pull out vals that describe vertex positions
    if use_mdn:
        result = np.apply_along_axis(sample_from_output, 1, result, n_features, m, temp=1.0)
    # reshape the result into the form of rows in `X`
    result = result.reshape(1, d1, d2)
    # push the result into the shape of `train_x` observations
    stacked = np.vstack((data[i-look_back+1:i], result)).reshape(1, x1, x2)
    # add the result to the `train_x` observations
    train_x = np.vstack((train_x, stacked))
    # add the result to the dataset for plotting
    data = np.vstack((data[:i], result))
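The plotting call for the generated frames isn't shown in the original post, but presumably the same helper from above works on data:

# visualize the generated sequence with the helper defined earlier (my assumption)
HTML(get_plot(data, frames=int(data.shape[0])))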
If I set use_mdn above to False and instead use a simple sum of squared errors (L2 loss), the resulting visualization looks a little spooky, but it still has a generally human shape.

If I set use_mdn to True, however, and use the custom MDN loss function, the results are very strange. I recognize that the MDN layer adds a huge number of parameters to train, and that it likely requires orders of magnitude more training data to achieve output that looks as human-shaped as the L2 loss function's output.

That said, I'd like to ask those who have worked with neural network models more extensively than I have whether they see anything fundamentally wrong with the approach above. Any insight into this question would be enormously helpful.
Best Answer
Got it working, and I've posted the full code as a [gist]! Here's the MDN class:
from keras.layers.advanced_activations import LeakyReLU
from keras.models import Sequential, Model
from keras.layers import Dense, Input, concatenate, LSTM, CuDNNLSTM
from keras.engine.topology import Layer
from keras import backend as K
import tensorflow_probability as tfp
import tensorflow as tf

# check tfp version, as tfp causes cryptic error if out of date
assert float(tfp.__version__.split('.')[1]) >= 5

class MDN(Layer):
    '''Mixture Density Network with unigaussian kernel'''
    def __init__(self, n_mixes, output_dim, **kwargs):
        self.n_mixes = n_mixes
        self.output_dim = output_dim

        with tf.name_scope('MDN'):
            self.mdn_mus = Dense(self.n_mixes * self.output_dim, name='mdn_mus')
            self.mdn_sigmas = Dense(self.n_mixes, activation=K.exp, name='mdn_sigmas')
            self.mdn_alphas = Dense(self.n_mixes, activation=K.softmax, name='mdn_alphas')

        super(MDN, self).__init__(**kwargs)

    def build(self, input_shape):
        self.mdn_mus.build(input_shape)
        self.mdn_sigmas.build(input_shape)
        self.mdn_alphas.build(input_shape)
        self.trainable_weights = self.mdn_mus.trainable_weights + \
            self.mdn_sigmas.trainable_weights + \
            self.mdn_alphas.trainable_weights
        self.non_trainable_weights = self.mdn_mus.non_trainable_weights + \
            self.mdn_sigmas.non_trainable_weights + \
            self.mdn_alphas.non_trainable_weights
        self.built = True

    def call(self, x, mask=None):
        with tf.name_scope('MDN'):
            mdn_out = concatenate([
                self.mdn_mus(x),
                self.mdn_sigmas(x),
                self.mdn_alphas(x)
            ], name='mdn_outputs')
        return mdn_out

    def get_output_shape_for(self, input_shape):
        return (input_shape[0], self.output_dim)

    def get_config(self):
        config = {
            'output_dim': self.output_dim,
            'n_mixes': self.n_mixes,
        }
        base_config = super(MDN, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

    def get_loss_func(self):
        def unigaussian_loss(y_true, y_pred):
            mix = tf.range(start = 0, limit = self.n_mixes)
            out_mu, out_sigma, out_alphas = tf.split(y_pred, num_or_size_splits=[
                self.n_mixes * self.output_dim,
                self.n_mixes,
                self.n_mixes,
            ], axis=-1, name='mdn_coef_split')

            def loss_i(i):
                batch_size = tf.shape(out_sigma)[0]
                sigma_i = tf.slice(out_sigma, [0, i], [batch_size, 1], name='mdn_sigma_slice')
                alpha_i = tf.slice(out_alphas, [0, i], [batch_size, 1], name='mdn_alpha_slice')
                mu_i = tf.slice(out_mu, [0, i * self.output_dim], [batch_size, self.output_dim], name='mdn_mu_slice')
                dist = tfp.distributions.Normal(loc=mu_i, scale=sigma_i)
                loss = dist.prob(y_true) # find the pdf around each value in y_true
                loss = alpha_i * loss
                return loss

            result = tf.map_fn(lambda m: loss_i(m), mix, dtype=tf.float32, name='mix_map_fn')
            result = tf.reduce_sum(result, axis=0, keepdims=False)
            result = -tf.log(result)
            result = tf.reduce_mean(result)
            return result

        with tf.name_scope('MDNLayer'):
            return unigaussian_loss
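For reference, the layer's output concatenates m * output_dim means, m sigmas (one per mixture), and m mixture weights, and unigaussian_loss computes the negative log-likelihood of a mixture of isotropic Gaussians, averaged over the batch b and output dimensions j:

$$\mathcal{L} = -\,\operatorname{mean}_{b,\,j}\left[\log \sum_{i=1}^{m} \alpha_{i}(x_b)\,\mathcal{N}\!\left(y_{b,j} \mid \mu_{i,j}(x_b),\, \sigma_{i}(x_b)\right)\right]$$

Since the Gaussian density can exceed 1, this loss can legitimately go negative; a negative loss value by itself is not evidence of a bug.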
And the LSTM class:
class LSTM_MDN:
    def __init__(self, n_verts=15, n_dims=3, n_mixes=2, look_back=1, cells=[32,32,32,32], use_mdn=True):
        self.n_verts = n_verts
        self.n_dims = n_dims
        self.n_mixes = n_mixes
        self.look_back = look_back
        self.cells = cells
        self.use_mdn = use_mdn
        # `gpus` is not defined in the snippet as posted; one way to detect GPUs in Keras 2.x:
        gpus = K.tensorflow_backend._get_available_gpus()
        self.LSTM = CuDNNLSTM if len(gpus) > 0 else LSTM
        self.model = self.build_model()
        if use_mdn:
            self.model.compile(loss=MDN(n_mixes, n_verts*n_dims).get_loss_func(), optimizer='adam', metrics=['accuracy'])
        else:
            self.model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])

    def build_model(self):
        i = Input((self.look_back, self.n_verts*self.n_dims))
        h = self.LSTM(self.cells[0], return_sequences=True)(i) # return sequences, stateful
        h = self.LSTM(self.cells[1], return_sequences=True)(h)
        h = self.LSTM(self.cells[2])(h)
        h = Dense(self.cells[3])(h)
        if self.use_mdn:
            o = MDN(self.n_mixes, self.n_verts*self.n_dims)(h)
        else:
            o = Dense(self.n_verts*self.n_dims)(h)
        return Model(inputs=[i], outputs=[o])

    def prepare_inputs(self, X, look_back=2):
        '''
        Prepare inputs in shape expected by LSTM
        @returns:
            numpy.ndarray train_X: has shape: n_samples, look_back, verts * dims
            numpy.ndarray train_Y: has shape: n_samples, verts * dims
        '''
        # prepare data for the LSTM_MDN
        X = X.swapaxes(0, 1) # reshape to time, vert, dim
        n_time, n_verts, n_dims = X.shape

        # validate shape attributes
        if n_verts != self.n_verts: raise Exception(' ! got', n_verts, 'vertices, expected', self.n_verts)
        if n_dims != self.n_dims: raise Exception(' ! got', n_dims, 'dims, expected', self.n_dims)
        if look_back != self.look_back: raise Exception(' ! got', look_back, 'for look_back, expected', self.look_back)

        # lstm expects data in shape [samples_in_batch, timestamps, values]
        train_X = []
        train_Y = []
        for i in range(look_back, n_time, 1):
            train_X.append( X[i-look_back:i,:,:].reshape(look_back, n_verts * n_dims) ) # look_back, verts * dims
            train_Y.append( X[i,:,:].reshape(n_verts * n_dims) ) # verts * dims
        train_X = np.array(train_X) # n_samples, look_back, verts * dims
        train_Y = np.array(train_Y) # n_samples, verts * dims
        return [train_X, train_Y]

    def predict_positions(self, input_X):
        '''
        Predict the output for a series of input frames. Each prediction has shape (1, y), where y contains:
            mus = y[:n_mixes*n_verts*n_dims]
            sigs = y[n_mixes*n_verts*n_dims:-n_mixes]
            alphas = softmax(y[-n_mixes:])
        @param numpy.ndarray input_X: has shape: n_samples, look_back, n_verts * n_dims
        @returns:
            numpy.ndarray X: has shape: verts, time, dims
        '''
        predictions = []
        for i in range(input_X.shape[0]):
            # the snippet as posted referenced the module-level `train_X` and bare
            # `n_mixes`/`n_verts`/`n_dims` here; use the argument and attributes instead
            y = self.model.predict( input_X[i:i+1] ).squeeze()
            mus = y[:self.n_mixes*self.n_verts*self.n_dims]
            sigs = y[self.n_mixes*self.n_verts*self.n_dims:-self.n_mixes]
            alphas = self.softmax(y[-self.n_mixes:])

            # find the most likely distribution then pull out the mus that correspond to that selected index
            alpha_idx = np.argmax(alphas)
            predictions.append( mus[alpha_idx*self.n_verts*self.n_dims:(alpha_idx+1)*self.n_verts*self.n_dims] )
        predictions = np.array(predictions).reshape(input_X.shape[0], self.n_verts, self.n_dims).swapaxes(0, 1)
        return predictions # shape = n_verts, n_time, n_dims

    def softmax(self, x):
        '''Compute softmax values for vector `x`'''
        r = np.exp(x - np.max(x))
        return r / r.sum()
Then to set up the class:
X = data.selected.X
n_verts, n_time, n_dims = X.shape
n_mixes = 3
look_back = 2
lstm_mdn = LSTM_MDN(n_verts=n_verts, n_dims=n_dims, n_mixes=n_mixes, look_back=look_back)
train_X, train_Y = lstm_mdn.prepare_inputs(X, look_back=look_back)
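From here, a minimal sketch of a training and prediction pass might look like the following (the epoch and batch-size values are illustrative placeholders, not from the gist):

# illustrative fit/predict pass; hyperparameter values are placeholders
lstm_mdn.model.fit(train_X, train_Y, epochs=100, batch_size=32, verbose=1)
predictions = lstm_mdn.predict_positions(train_X) # shape: n_verts, n_time, n_dims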
The gist linked above contains all the gory details, in case anyone wants to reproduce it or pick it apart to better understand its mechanics...

Regarding "python - Keras: adding an MDN layer to an LSTM network", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52856155/