
python - Keras training crashes after several successful epochs

Reposted. Author: 行者123. Updated: 2023-11-30 09:17:13

I am trying to build a CuDNNGRU-based model that predicts a sequence of 7 interrelated features. Here is my Keras model summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
cu_dnngru_1 (CuDNNGRU)       (None, 49, 100)           32700
_________________________________________________________________
dropout_1 (Dropout)          (None, 49, 100)           0
_________________________________________________________________
cu_dnngru_2 (CuDNNGRU)       (None, 49, 100)           60600
_________________________________________________________________
dropout_2 (Dropout)          (None, 49, 100)           0
_________________________________________________________________
cu_dnngru_3 (CuDNNGRU)       (None, 49, 100)           60600
_________________________________________________________________
dropout_3 (Dropout)          (None, 49, 100)           0
_________________________________________________________________
cu_dnngru_4 (CuDNNGRU)       (None, 49, 100)           60600
_________________________________________________________________
dropout_4 (Dropout)          (None, 49, 100)           0
_________________________________________________________________
cu_dnngru_5 (CuDNNGRU)       (None, 49, 100)           60600
_________________________________________________________________
dropout_5 (Dropout)          (None, 49, 100)           0
_________________________________________________________________
cu_dnngru_6 (CuDNNGRU)       (None, 49, 100)           60600
_________________________________________________________________
dropout_6 (Dropout)          (None, 49, 100)           0
_________________________________________________________________
cu_dnngru_7 (CuDNNGRU)       (None, 49, 100)           60600
_________________________________________________________________
dropout_7 (Dropout)          (None, 49, 100)           0
_________________________________________________________________
flatten_1 (Flatten)          (None, 4900)              0
_________________________________________________________________
dense_1 (Dense)              (None, 7)                 34307
=================================================================
Total params: 430,607
Trainable params: 430,607
Non-trainable params: 0
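
As a sanity check on the summary, the parameter counts can be reproduced by hand. The sketch below assumes the usual CuDNNGRU layout of three gates, each with an input kernel, a recurrent kernel, and two bias vectors. The helper names are illustrative and are not Keras functions.

```python
def cudnn_gru_params(input_dim, units):
    """Parameter count of one CuDNNGRU layer (3 gates, 2 bias vectors each)."""
    per_gate = units * input_dim + units * units + 2 * units
    return 3 * per_gate


def dense_params(input_dim, units):
    """Parameter count of a Dense layer with bias."""
    return input_dim * units + units


first = cudnn_gru_params(7, 100)      # first layer sees the 7 input features
stacked = cudnn_gru_params(100, 100)  # later layers see the previous 100 units
head = dense_params(49 * 100, 7)      # Flatten turns (49, 100) into 4900 inputs
total = first + 6 * stacked + head
print('%d %d %d %d' % (first, stacked, head, total))
```

The four values agree with the summary (32700, 60600, 34307 and 430,607 in total), which is consistent with a model that expects 3-dimensional input of shape (batch, 49, 7).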

I am trying to run this model for a larger number of epochs. The first several epochs run fine, but then training fails:

[Model] Model Compiled
Time taken: 0:00:02.314468
[Model] Training Started
[Model] 100 epochs, 1000 batch size, 20.0 batches per epoch
Epoch 1/100
20/20 [==============================] - 5s 240ms/step - loss: 0.1631 - acc: 0.2905
Epoch 2/100
20/20 [==============================] - 2s 81ms/step - loss: 0.1288 - acc: 0.2455
Epoch 3/100
20/20 [==============================] - 1s 73ms/step - loss: 0.0952 - acc: 0.5058
Epoch 4/100
20/20 [==============================] - 2s 76ms/step - loss: 0.1141 - acc: 0.3288
Epoch 5/100
20/20 [==============================] - 2s 75ms/step - loss: 0.1064 - acc: 0.3425
Epoch 6/100
20/20 [==============================] - 1s 75ms/step - loss: 0.0767 - acc: 0.4213
Epoch 7/100
20/20 [==============================] - 1s 75ms/step - loss: 0.0635 - acc: 0.4764
Epoch 8/100
20/20 [==============================] - 1s 74ms/step - loss: 0.0555 - acc: 0.5274
Epoch 9/100
20/20 [==============================] - 1s 74ms/step - loss: 0.0544 - acc: 0.5141
Epoch 10/100
...
Epoch 61/100
20/20 [==============================] - 1s 74ms/step - loss: 0.0506 - acc: 0.3925
Epoch 62/100
20/20 [==============================] - 1s 72ms/step - loss: 0.0495 - acc: 0.4323
Epoch 63/100
20/20 [==============================] - 1s 73ms/step - loss: 0.0495 - acc: 0.4118
Epoch 64/100
 2/20 [==>...........................] - ETA: 1s - loss: 0.0495 - acc: 0.4885
Traceback (most recent call last):
  File "./run.py", line 118, in <module>
    main()
  File "./run.py", line 92, in main
    steps_per_epoch=steps_per_epoch)
  File "/home/sridhar/PE_CSV/alarmProj/rnn/lstm/core/model.py", line 149, in train_generator
    workers=70)
  File "/home/sridhar/PE_CSV/malenv/local/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/sridhar/PE_CSV/malenv/local/lib/python2.7/site-packages/keras/engine/training.py", line 1415, in fit_generator
    initial_epoch=initial_epoch)
  File "/home/sridhar/PE_CSV/malenv/local/lib/python2.7/site-packages/keras/engine/training_generator.py", line 213, in fit_generator
    class_weight=class_weight)
  File "/home/sridhar/PE_CSV/malenv/local/lib/python2.7/site-packages/keras/engine/training.py", line 1209, in train_on_batch
    class_weight=class_weight)
  File "/home/sridhar/PE_CSV/malenv/local/lib/python2.7/site-packages/keras/engine/training.py", line 749, in _standardize_user_data
    exception_prefix='input')
  File "/home/sridhar/PE_CSV/malenv/local/lib/python2.7/site-packages/keras/engine/training_utils.py", line 127, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking input: expected cu_dnngru_1_input to have 3 dimensions, but got array with shape (380, 1)

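If I reduce the number of epochs to below that point (epoch 64 here), I do not hit any problem, but increasing the number of epochs leads to the above error at some point. The exact epoch at which it crashes seems to vary with the configuration. The same problem occurs with plain GRU/LSTM layers.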
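
One possible reading of this behaviour, assuming the data generator can supply only a fixed number of well-formed batches before it runs past the end of the training data: fit_generator keeps drawing steps_per_epoch batches per epoch from the same generator, so the epoch in which the supply runs out depends on the data length, the batch size, and steps_per_epoch. The sketch below is illustrative arithmetic only. The function name is hypothetical, and n_windows is a made-up figure chosen so that the arithmetic lands on epoch 64, not the poster's actual data size.

```python
import math


def first_failing_epoch(n_windows, batch_size, steps_per_epoch):
    """Return the 1-based epoch in which a one-pass generator runs dry.

    n_windows is the number of training windows the generator can produce
    in a single pass over the data.
    """
    # Full and partial batches available in a single pass over the data.
    usable_batches = int(math.ceil(n_windows / float(batch_size)))
    # Epochs that can be completed entirely from those batches.
    completed_epochs = usable_batches // steps_per_epoch
    return completed_epochs + 1


# Hypothetical data size: 1,261,380 windows at batch size 1000 gives
# 1262 batches, i.e. 63 full epochs of 20 steps plus 2 more batches.
print(first_failing_epoch(1261380, 1000, 20))  # prints 64
```

Changing any of the three inputs moves the failing epoch, which would fit the observation that the crash point varies with the configuration.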

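This is keras-2.2.2, and the model is being run with 70 workers (the traceback shows workers=70 passed to fit_generator).

Is there anything I can do to avoid this problem?

Edit: here is the relevant, approximate code being used: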

import multiprocessing

import tensorflow as tf
from keras import backend as K
from keras.layers import CuDNNGRU, Dense, Dropout, Flatten
from keras.optimizers import SGD

# Let TensorFlow use one thread per CPU core.
session_conf = tf.ConfigProto(
    inter_op_parallelism_threads=multiprocessing.cpu_count(),
    intra_op_parallelism_threads=multiprocessing.cpu_count())
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)

# self.model is presumably a keras Sequential model; its creation is not shown.
# First layer: 49 time steps of 7 features each.
self.model.add(CuDNNGRU(
    100,
    input_shape=(49, 7),
    kernel_initializer='orthogonal',
    return_sequences=True))
self.model.add(Dropout(0.4))
self.model.add(CuDNNGRU(
    100,
    input_shape=(None, None),
    kernel_initializer='orthogonal',
    return_sequences=True))
self.model.add(Dropout(0.4))
self.model.add(CuDNNGRU(
    100,
    input_shape=(None, None),
    kernel_initializer='orthogonal',
    return_sequences=True))
self.model.add(Dropout(0.4))
self.model.add(CuDNNGRU(
    100,
    input_shape=(None, None),
    kernel_initializer='orthogonal',
    return_sequences=True))
self.model.add(Dropout(0.4))
self.model.add(CuDNNGRU(
    100,
    input_shape=(None, None),
    kernel_initializer='orthogonal',
    return_sequences=True))
self.model.add(Dropout(0.4))
self.model.add(CuDNNGRU(
    100,
    input_shape=(None, None),
    kernel_initializer='orthogonal',
    return_sequences=True))
self.model.add(Dropout(0.4))
self.model.add(CuDNNGRU(
    100,
    input_shape=(None, None),
    kernel_initializer='orthogonal',
    return_sequences=True))
self.model.add(Dropout(0.4))

# Flatten the (49, 100) sequence output and predict the 7 features.
self.model.add(Flatten())
self.model.add(Dense(7, activation='relu'))

sgd = SGD(lr=0.1, decay=1e-2, clipnorm=5.0)

self.model.compile(
    loss='mse',
    metrics=["accuracy"],
    optimizer=sgd)
===================

import datetime as dt
import pdb

from keras.callbacks import ModelCheckpoint

# Timer is the poster's own helper class (not shown).


def train_generator(self, data_gen, epochs, batch_size, steps_per_epoch):
    timer = Timer()
    timer.start()
    print('[Model] Training Started')
    print('[Model] %s epochs, %s batch size, %s batches per epoch' %
          (epochs, batch_size, steps_per_epoch))

    save_fname = '%s/%s-e%s.h5' % (
        self.model_dir,
        dt.datetime.now().strftime('%d%m%Y-%H%M%S'),
        str(epochs))
    # Keep only the checkpoint with the lowest training loss.
    callbacks = [
        ModelCheckpoint(
            filepath=save_fname, monitor='loss', save_best_only=True)
    ]
    try:
        self.model.fit_generator(
            data_gen,
            steps_per_epoch=steps_per_epoch,
            epochs=epochs,
            callbacks=callbacks)
    except Exception:
        # Drop into the debugger when training fails.
        pdb.set_trace()

    print('[Model] Training Completed. Model saved as %s' % save_fname)
    timer.stop()
=============
# invoked from the main function
model.train_generator(
    data_gen=data.generate_train_batch(
        seq_len=50,
        batch_size=1000,
        normalise=False),
    epochs=100,
    batch_size=1000,
    steps_per_epoch=steps_per_epoch)
=============

import numpy as np


def generate_train_batch(self, seq_len, batch_size, normalise):
    '''Yield a generator of training data from filename on given list of cols split for train/test'''
    i = 0
    while i < (self.len_train - seq_len):
        x_batch = []
        y_batch = []
        for b in range(batch_size):
            if i >= (self.len_train - seq_len):
                # stop-condition for a smaller final batch if data doesn't divide evenly
                yield np.array(x_batch), np.array(y_batch)
            x, y = self._next_window(i, seq_len, normalise)
            x_batch.append(x)
            y_batch.append(y)
            i += 1
        yield np.array(x_batch), np.array(y_batch)
=======================

Best answer

The generator is wrong. It incorrectly assumes that a generator may be finite, whereas Keras expects the generator to be infinite.
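
A minimal sketch of one way to make the generator infinite, assuming the same len_train attribute and _next_window helper as in the question. The DemoData class and its toy _next_window are hypothetical stand-ins so that the example runs on its own. The main change to the batching logic is the outer `while True` loop, which restarts from the first window once the data is used up; the inner loop is also bounded so that it never reads past the last window.

```python
import numpy as np


class DemoData(object):
    """Hypothetical stand-in for the question's data loader."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)
        self.len_train = len(self.data)

    def _next_window(self, i, seq_len, normalise):
        # Toy version: the first seq_len - 1 rows are the input and the
        # last row of the window is the target. normalise is ignored here.
        window = self.data[i:i + seq_len]
        return window[:-1], window[-1]

    def generate_train_batch(self, seq_len, batch_size, normalise):
        """Yield (x, y) batches forever, wrapping around at the end of the data."""
        n_windows = self.len_train - seq_len
        if n_windows <= 0:
            raise ValueError('not enough rows for a single window')
        while True:  # Keras keeps calling next() for every step of every epoch
            i = 0
            while i < n_windows:
                x_batch = []
                y_batch = []
                # The last batch of a pass may be smaller than batch_size.
                for _ in range(min(batch_size, n_windows - i)):
                    x, y = self._next_window(i, seq_len, normalise)
                    x_batch.append(x)
                    y_batch.append(y)
                    i += 1
                yield np.array(x_batch), np.array(y_batch)
```

With a generator like this, steps_per_epoch only decides how many batches count as one epoch, and every batch keeps the 3-dimensional shape (batch, 49, 7) that the first layer expects. Subclassing keras.utils.Sequence is another option that may be worth considering, since it gives Keras an explicit length and index-based access to the batches.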

Regarding "python - Keras training crashes after several successful epochs", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52543817/
