
machine-learning - Why is my CNN overfitting, and how do I fix it?

Reposted. Author: 行者123. Updated: 2023-11-30 09:15:39

I am fine-tuning a 3D CNN called C3D, which was originally trained to classify sports in video clips.

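I am freezing the convolutional (feature-extraction) layers and training the fully connected layers on GIFs from GIPHY, so that the model classifies GIFs for sentiment analysis (positive or negative).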

The weights of all layers except the final fully connected layer are preloaded.

I am training with Keras on 5,000 images (2,500 positive, 2,500 negative) with a 70/30 train/test split, using the Adam optimizer with a learning rate of 0.0001.
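The question does not show how the 70/30 split was produced. As a minimal sketch, assuming the label is encoded as "pos" or "neg" in each file path (the convention the question's image_generator relies on), a reproducible split that keeps the 50/50 class balance in both parts could look like this. `label_of` and `stratified_split` are hypothetical helpers, not part of the original code.

```python
import random


def label_of(path):
    # Hypothetical helper: same precedence as image_generator ("pos" checked first)
    if "pos" in path:
        return "pos"
    if "neg" in path:
        return "neg"
    return None  # unlabelled paths are left out of both splits


def stratified_split(files, train_frac=0.7, seed=0):
    """Hypothetical helper: split paths train_frac / (1 - train_frac) per class."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in ("pos", "neg"):
        group = sorted(f for f in files if label_of(f) == label)
        rng.shuffle(group)
        cut = int(round(len(group) * train_frac))
        train += group[:cut]
        test += group[cut:]
    return train, test


files = [f"pos_{i}.gif" for i in range(2500)] + [f"neg_{i}.gif" for i in range(2500)]
train, test = stratified_split(files)
print(len(train), len(test))  # 3500 1500
```

With 2,500 files per class this gives 1,750 + 1,750 training files and 750 + 750 held-out files, matching the 70/30 proportions stated in the question.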

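During training, the training accuracy increases and the training loss decreases, but very early on the validation accuracy and validation loss stop improving as the model starts to overfit.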

I believe I have enough training data, and I am using a dropout of 0.5 on both fully connected layers, so how can I combat this overfitting?

The Keras model architecture, the training code, and a visualization of the training performance are below.

train_c3d.py

```python
import pathlib

import numpy as np
from keras.callbacks import ModelCheckpoint
from keras.optimizers import Adam

from training.c3d_model import create_c3d_sentiment_model
from ImageSentiment import load_gif_data


def image_generator(files, batch_size):
    """
    Generate batches of images for training instead of loading all images into memory
    :param files: list of GIF paths; the label is read from "pos"/"neg" in the path
    :param batch_size: number of GIFs sampled (with replacement) per batch
    :return: yields (batch_x, batch_y) tuples indefinitely
    """
    while True:
        # Select files (paths/indices) for the batch
        batch_paths = np.random.choice(a=files, size=batch_size)
        batch_input = []
        batch_output = []

        # Read in each input, perform preprocessing and get labels
        for input_path in batch_paths:
            if "pos" in input_path:  # if file name contains pos
                output = np.array([1, 0])  # label
            elif "neg" in input_path:  # if file name contains neg
                output = np.array([0, 1])  # label
            else:
                continue  # unlabelled path: skip it instead of reusing a stale label

            clip = load_gif_data(input_path)  # renamed from `input`, which shadowed the builtin
            if clip is None:  # load_gif_data returns None when a GIF cannot be read
                continue

            batch_input.append(clip)
            batch_output.append(output)

        if not batch_input:  # every sampled file was skipped; draw a new batch
            continue

        # Return a tuple of (input, output) to feed the network
        yield np.array(batch_input), np.array(batch_output)


model = create_c3d_sentiment_model()
print(model.summary())
model.load_weights('models/C3D_Sport1M_weights_keras_2.2.4.h5', by_name=True)

for layer in model.layers[:14]:  # freeze conv1 ... pool5 as a feature extractor
    layer.trainable = False
for layer in model.layers[14:]:  # fine-tune flatten_1, fc6, fc7 and nfc8
    layer.trainable = True

# glob('**/*') also matches directories, so keep regular files only
train_files = [str(p.absolute()) for p in pathlib.Path('data/sample_train').glob('**/*') if p.is_file()]
val_files = [str(p.absolute()) for p in pathlib.Path('data/sample_validation').glob('**/*') if p.is_file()]

batch_size = 8
train_generator = image_generator(train_files, batch_size)
validation_generator = image_generator(val_files, batch_size)

model.compile(optimizer=Adam(lr=0.0001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# monitor/mode only take effect with save_best_only=True; otherwise the file is overwritten every epoch
mc = ModelCheckpoint('best_model.h5', monitor='val_loss', mode='min', save_best_only=True, verbose=1)

# shuffle=True was dropped: fit_generator only uses it for keras.utils.Sequence inputs
history = model.fit_generator(train_generator, validation_data=validation_generator,
                              steps_per_epoch=int(np.ceil(len(train_files) / batch_size)),
                              validation_steps=int(np.ceil(len(val_files) / batch_size)),
                              epochs=5,
                              callbacks=[mc])
```
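The image_generator samples paths with replacement through np.random.choice, so an "epoch" of steps_per_epoch = ceil(len(train_files) / batch_size) batches typically repeats some clips and never shows others. As a hedged alternative sketch (`epoch_batches` is a hypothetical helper, and it only produces path batches, not loaded clips), shuffling once per epoch and slicing guarantees every file is used exactly once per epoch:

```python
import math
import random


def epoch_batches(files, batch_size, seed=0):
    """Hypothetical helper: yield batches of paths, each file exactly once per epoch."""
    order = list(files)  # copy so the caller's list is not reordered
    if not order:
        raise ValueError("no files to batch")
    rng = random.Random(seed)  # seeded so runs are reproducible
    while True:
        rng.shuffle(order)  # new order at the start of every epoch
        for start in range(0, len(order), batch_size):
            yield order[start:start + batch_size]  # the last batch may be smaller


# With 3,500 training files and batch_size = 8, one epoch is ceil(3500 / 8) batches
steps_per_epoch = math.ceil(3500 / 8)
print(steps_per_epoch)  # 438
```

Inside a generator like image_generator, each yielded list of paths would then go through load_gif_data and the label lookup as before.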

load_gif_data()

```python
import numpy as np
from imgpy import Img  # assumption: Img appears to be imgpy's Img class; the original omits this import


def load_gif_data(file_path):
    """
    Load and process gif for input into Keras model
    :param file_path: path to a GIF file
    :return: Mean normalised image in BGR format as numpy array of shape (16, 112, 112, 3),
             or None if the GIF cannot be loaded
    for more info see -> http://cs231n.github.io/neural-networks-2/
    """
    im = Img(fp=file_path)
    try:
        im.load(limit=16,  # Keras image model only requires 16 frames
                first=True)
    except Exception:
        print("Error loading image: " + file_path)
        return None
    im.resize(size=(112, 112))
    im.convert('RGB')
    im.close()

    np_frames = []
    frame_index = 0
    for i in range(16):  # if image is less than 16 frames, repeat the frames until there are 16
        frame = im.frames[frame_index]
        rgb = np.array(frame)
        bgr = rgb[..., ::-1]  # reverse the channel axis: RGB -> BGR
        mean = np.mean(bgr, axis=0)  # averages over image rows only: one mean per column and channel
        np_frames.append(bgr - mean)  # C3D model was originally trained on BGR, mean normalised images
        # it is important that unseen images are in the same format
        if frame_index == (len(im.frames) - 1):
            frame_index = 0
        else:
            frame_index = frame_index + 1

    return np.array(np_frames)
```
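The wrap-around counter at the end of load_gif_data repeats the loaded frames cyclically when a GIF has fewer than 16 of them. A small pure-Python sketch (`frame_indices` is a hypothetical name) shows the resulting index pattern; the counter is equivalent to indexing with `i % len(im.frames)`, so a 5-frame GIF becomes frames 0 to 4 repeated.

```python
def frame_indices(num_frames, clip_length=16):
    """Hypothetical helper: which loaded frame is used at each of clip_length positions.

    load_gif_data's frame_index restarts at 0 after the last frame, which is the
    same as taking the position modulo the number of frames.
    """
    if num_frames < 1:
        raise ValueError("the GIF must have at least one frame")
    return [i % num_frames for i in range(clip_length)]


print(frame_indices(5))  # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0]
```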

Model architecture

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv1 (Conv3D)               (None, 16, 112, 112, 64)  5248
_________________________________________________________________
pool1 (MaxPooling3D)         (None, 16, 56, 56, 64)    0
_________________________________________________________________
conv2 (Conv3D)               (None, 16, 56, 56, 128)   221312
_________________________________________________________________
pool2 (MaxPooling3D)         (None, 8, 28, 28, 128)    0
_________________________________________________________________
conv3a (Conv3D)              (None, 8, 28, 28, 256)    884992
_________________________________________________________________
conv3b (Conv3D)              (None, 8, 28, 28, 256)    1769728
_________________________________________________________________
pool3 (MaxPooling3D)         (None, 4, 14, 14, 256)    0
_________________________________________________________________
conv4a (Conv3D)              (None, 4, 14, 14, 512)    3539456
_________________________________________________________________
conv4b (Conv3D)              (None, 4, 14, 14, 512)    7078400
_________________________________________________________________
pool4 (MaxPooling3D)         (None, 2, 7, 7, 512)      0
_________________________________________________________________
conv5a (Conv3D)              (None, 2, 7, 7, 512)      7078400
_________________________________________________________________
conv5b (Conv3D)              (None, 2, 7, 7, 512)      7078400
_________________________________________________________________
zeropad5 (ZeroPadding3D)     (None, 2, 8, 8, 512)      0
_________________________________________________________________
pool5 (MaxPooling3D)         (None, 1, 4, 4, 512)      0
_________________________________________________________________
flatten_1 (Flatten)          (None, 8192)              0
_________________________________________________________________
fc6 (Dense)                  (None, 4096)              33558528
_________________________________________________________________
dropout_1 (Dropout)          (None, 4096)              0
_________________________________________________________________
fc7 (Dense)                  (None, 4096)              16781312
_________________________________________________________________
dropout_2 (Dropout)          (None, 4096)              0
_________________________________________________________________
nfc8 (Dense)                 (None, 2)                 8194
=================================================================
Total params: 78,003,970
Trainable params: 78,003,970
Non-trainable params: 0
_________________________________________________________________
None
```
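Because print(model.summary()) runs before the freeze loop in train_c3d.py, the summary reports Non-trainable params: 0. A short pure-Python sketch recomputes the Param # column and splits it into the part frozen by layers[:14] and the part left trainable. It assumes 3x3x3 convolution kernels, which reproduce every Conv3D count in the table (for example conv1: 3*3*3*3*64 + 64 = 5248); the helper names are illustrative.

```python
def dense_params(inputs, units):
    # Dense layer: one weight per (input, unit) pair plus one bias per unit
    return inputs * units + units


def conv3d_params(in_channels, filters, kernel=3):
    # Conv3D: kernel^3 * in_channels weights per filter plus one bias per filter
    return kernel ** 3 * in_channels * filters + filters


# (in_channels, filters) for conv1, conv2, conv3a, conv3b, conv4a, conv4b, conv5a, conv5b
CONV_LAYERS = [(3, 64), (64, 128), (128, 256), (256, 256),
               (256, 512), (512, 512), (512, 512), (512, 512)]
# (inputs, units) for fc6, fc7, nfc8
DENSE_LAYERS = [(8192, 4096), (4096, 4096), (4096, 2)]

frozen = sum(conv3d_params(c_in, c_out) for c_in, c_out in CONV_LAYERS)
trainable = sum(dense_params(n_in, n_out) for n_in, n_out in DENSE_LAYERS)

print(f"frozen conv params:     {frozen:,}")              # 27,655,936
print(f"trainable dense params: {trainable:,}")           # 50,348,034
print(f"total:                  {frozen + trainable:,}")  # 78,003,970
```

So even with the convolutional layers frozen, about 50 million parameters in fc6, fc7 and nfc8 are still being fitted to roughly 3,500 training GIFs.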

Training visualization (image not preserved)

Best answer

I think the error is in the loss function and the last Dense layer. As the model summary shows, the last Dense layer is:

nfc8 (Dense) (None, 2)

Its output shape is (None, 2), which means the layer has 2 units. As you said, you need to classify the GIFs as positive or negative.

Classifying GIFs can be framed either as a binary classification problem or as a multiclass classification problem (with two classes).

For binary classification, the last Dense layer has only 1 unit, with a sigmoid activation. Here, however, the model's last Dense layer has 2 units.

So the model is a multiclass classifier, but you compiled it with the binary_crossentropy loss, which is meant for binary classifiers (a single unit in the last layer).

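Therefore, replacing the loss with categorical_crossentropy should work. Alternatively, change the last Dense layer to a single unit with a sigmoid activation, keep binary_crossentropy, and use single 0/1 labels instead of one-hot pairs.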
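The two options above can be illustrated with the loss formulas themselves. This is a minimal pure-Python sketch, not Keras code: it assumes nfc8 uses a softmax activation (the summary does not show activations), and the logits are made-up values. Option 1 keeps two output units with one-hot labels and categorical cross-entropy (Keras loss 'categorical_crossentropy'); option 2 uses one sigmoid unit with a scalar 0/1 label and binary cross-entropy ('binary_crossentropy').

```python
import math


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def categorical_crossentropy(y_onehot, probs):
    # -sum_k y_k * log(p_k): pairs with a 2-unit softmax output and labels like [1, 0]
    return -sum(y * math.log(p) for y, p in zip(y_onehot, probs))


def binary_crossentropy(y, p):
    # -(y * log(p) + (1 - y) * log(1 - p)): pairs with a 1-unit sigmoid output and label 0 or 1
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# Option 1: two softmax units, one-hot label [1, 0] for a positive GIF (made-up logits)
loss_2_units = categorical_crossentropy([1, 0], softmax([2.0, 0.5]))

# Option 2: one sigmoid unit, scalar label 1 for a positive GIF (made-up logit)
loss_1_unit = binary_crossentropy(1, sigmoid(0.8))

print(round(loss_2_units, 4), round(loss_1_unit, 4))
```

The point of the sketch is the pairing: the label format has to match the output layer, so switching to one sigmoid unit also means changing the generator's labels from [1, 0] / [0, 1] to 1 / 0.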

Hope this helps.

For "machine-learning - Why is my CNN overfitting, and how do I fix it?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56675919/
