
python - Keras: How to extend validation_split to generate a third set, i.e. a test set?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:34:54

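I'm using Keras with the TensorFlow backend. I use ImageDataGenerator with the validation_split argument to split my data into a training set and a validation set. To do so, I call flow_from_directory with subset set to "training" and "validation", like so: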

from keras.preprocessing.image import ImageDataGenerator

# One generator reserves 30% of the images for validation
data_generator = ImageDataGenerator(validation_split=0.3)

train_gen = data_generator.flow_from_directory(my_dir, target_size=(input_size, input_size), shuffle=False, seed=13,
                                               class_mode='categorical', batch_size=BATCH_SIZE, subset="training")

valid_gen = data_generator.flow_from_directory(my_dir, target_size=(input_size, input_size), shuffle=False, seed=13,
                                               class_mode='categorical', batch_size=32, subset="validation")

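This is very convenient because it lets me use a single directory instead of two (one for training and one for validation). Now I'm wondering whether this process can be extended to generate a third set, i.e. a test set?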

Best answer

This is not possible out of the box. You should be able to do it with a few small modifications to the source code of ImageDataGenerator:

if subset is not None:
    if subset not in {'training', 'validation'}:  # add a third subset here
        raise ValueError('Invalid subset name:', subset,
                         '; expected "training" or "validation".')  # adjust message
    split_idx = int(len(x) * image_data_generator._validation_split)
    # you'll need two split indices here
    if subset == 'validation':
        x = x[:split_idx]
        x_misc = [np.asarray(xx[:split_idx]) for xx in x_misc]
        if y is not None:
            y = y[:split_idx]
    elif subset == '...':  # add extra case here
        pass  # slice out the third subset here
    else:
        x = x[split_idx:]
        x_misc = [np.asarray(xx[split_idx:]) for xx in x_misc]  # change slicing
        if y is not None:
            y = y[split_idx:]  # change slicing

Edit: here is how you could modify the code:

if subset is not None:
    if subset not in {'training', 'validation', 'test'}:
        raise ValueError('Invalid subset name:', subset,
                         '; expected "training" or "validation" or "test".')
    # Build a tuple (not a generator) so the split indices can be indexed below
    split_idxs = tuple(int(len(x) * v) for v in image_data_generator._validation_split)
    if subset == 'validation':
        x = x[:split_idxs[0]]
        x_misc = [np.asarray(xx[:split_idxs[0]]) for xx in x_misc]
        if y is not None:
            y = y[:split_idxs[0]]
    elif subset == 'test':
        x = x[split_idxs[0]:split_idxs[1]]
        x_misc = [np.asarray(xx[split_idxs[0]:split_idxs[1]]) for xx in x_misc]
        if y is not None:
            y = y[split_idxs[0]:split_idxs[1]]
    else:
        x = x[split_idxs[1]:]
        x_misc = [np.asarray(xx[split_idxs[1]:]) for xx in x_misc]
        if y is not None:
            y = y[split_idxs[1]:]

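Basically, validation_split is now expected to be a tuple of two floats instead of a single float. The validation data is the portion of the data between 0 and validation_split[0], the test data is the portion between validation_split[0] and validation_split[1], and the training data is the portion between validation_split[1] and 1. You can use it like this: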

import keras
# keras_custom_preprocessing is what I named my directory
from keras_custom_preprocessing.image import ImageDataGenerator

generator = ImageDataGenerator(validation_split=(0.1, 0.5))
# First 10%: validation data; next 40%: test data; rest: training data
gen = generator.flow_from_directory(directory='./data/', subset='test')
# Finds 40% of the images in the directory
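
The partition described by the two-float tuple can be checked without Keras. The following is a minimal sketch in plain Python: `split_three_way` is a hypothetical helper (it is not part of Keras) that applies the same index arithmetic as the modified code to an ordinary list, and the file names are made up for illustration.

```python
def split_three_way(items, validation_split):
    """Partition items into (validation, test, training) slices.

    validation_split is a pair of cumulative fractions (a, b),
    mirroring the tuple used by the modified generator code.
    """
    first, second = (int(len(items) * v) for v in validation_split)
    return items[:first], items[first:second], items[second:]


# Hypothetical file list standing in for the images found in a directory
files = ["img_%02d.png" % i for i in range(20)]
valid, test, train = split_three_way(files, (0.1, 0.5))
print(len(valid), len(test), len(train))  # 2 8 10
```

Note that the fractions are cumulative boundaries rather than sizes: (0.1, 0.5) yields a 10% / 40% / 50% split, not 10% / 50% / 40%.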

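You will need to modify the file in two or three more places (you have to change the type checking), but that's about it, and it should work. I have the modified file; if you're interested, let me know and I can host it on my GitHub.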
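
The answer does not show the changed type check, so the following is only a sketch of what such a check might look like. `check_validation_split` is a hypothetical helper, not a Keras function, and the rule it enforces (two strictly increasing fractions between 0 and 1) is an assumption based on the tuple semantics described above.

```python
def check_validation_split(validation_split):
    """Validate a (validation_end, test_end) pair of cumulative fractions.

    Hypothetical replacement for a single-float range check.
    """
    try:
        first, second = validation_split
    except (TypeError, ValueError):
        raise ValueError(
            "`validation_split` must be a pair of floats. "
            "Received: %r" % (validation_split,)
        ) from None
    if not 0 < first < second < 1:
        raise ValueError(
            "`validation_split` must satisfy 0 < a < b < 1. "
            "Received: %r" % (validation_split,)
        )
    return float(first), float(second)


print(check_validation_split((0.1, 0.5)))
```

A plain float such as 0.3, or a reversed pair such as (0.5, 0.1), raises ValueError under this sketch.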

Regarding "python - Keras: How to extend validation_split to generate a third set, i.e. a test set?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51952231/
