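How can I randomly split my image dataset into training and validation sets? More specifically, the validation_split argument of the Keras ImageDataGenerator does not split my images randomly into training and validation sets; it slices the validation samples off the unshuffled dataset.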
Best answer
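When you specify the validation_split argument of the Keras ImageDataGenerator, the split is performed before the data is shuffled, so only the last x samples are taken. The problem is that these last samples, which end up as validation data, may not be representative of the training data, so validation can fail. This is a particularly common dead end when your image data is stored in one common directory in which each subfolder is named after its class. It has been pointed out in several posts: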
Choose random validation data set
As you mentioned, Keras simply takes the last x samples of the dataset, so if you want to keep using it, you need to shuffle your dataset in advance.
The training accuracy is very high, while the validation accuracy is very low?
please check if you have shuffled the data before training. Because the validation splitting in keras is performed before shuffle, so maybe you have chosen an unbalanced dataset as your validation set, thus you got the low accuracy.
Does 'validation split' randomly choose validation sample?
The validation data is picked as the last 10% (for instance, if validation_split=0.1) of the input. The training data (the remainder) can optionally be shuffled at every epoch (shuffle argument in fit). That doesn't affect the validation data, obviously, it has to be the same set from epoch to epoch.
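To make the failure mode described in these posts concrete, here is a minimal sketch in plain Python. It is not Keras code: `tail_split` is a hypothetical helper that only mimics "take the last x samples", and the two-class label list is made-up illustration data ordered by class, the way a directory listing would be.

```python
import random
from collections import Counter


def tail_split(samples, validation_split):
    """Hypothetical helper: take the last fraction of the data, without shuffling."""
    n_val = int(len(samples) * validation_split)
    cut = len(samples) - n_val
    return samples[:cut], samples[cut:]


# Made-up dataset: 50 samples per class, ordered by class.
labels = ["class1"] * 50 + ["class2"] * 50

# Splitting the ordered data puts only the last class into the validation set.
_, val_unshuffled = tail_split(labels, 0.2)
print(Counter(val_unshuffled))  # Counter({'class2': 20})

# Shuffling first (with a fixed seed for reproducibility) mixes the classes.
shuffled = labels[:]
random.Random(1337).shuffle(shuffled)
_, val_shuffled = tail_split(shuffled, 0.2)
print(Counter(val_shuffled))
```

The unshuffled validation set contains a single class, which is the kind of unbalanced split the quoted answers warn about.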
This answer points to sklearn's train_test_split() as a solution, but I want to propose a different one that keeps the Keras workflow consistent.
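With the split-folders package you can randomly split your main data directory into training, validation, and test (or just training and validation) directories. The class-specific subfolders are copied automatically.
The input folder should have the following format: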
input/
    class1/
        img1.jpg
        img2.jpg
        ...
    class2/
        imgWhatever.jpg
        ...
    ...
In order to give you this:
output/
    train/
        class1/
            img1.jpg
            ...
        class2/
            imga.jpg
            ...
    val/
        class1/
            img2.jpg
            ...
        class2/
            imgb.jpg
            ...
    test/ # optional
        class1/
            img3.jpg
            ...
        class2/
            imgc.jpg
            ...
From the documentation:
import split_folders
# Split with a ratio.
# To only split into training and validation set, set a tuple to `ratio`, i.e, `(.8, .2)`.
split_folders.ratio('input_folder', output="output", seed=1337, ratio=(.8, .1, .1)) # default values
# Split val/test with a fixed number of items e.g. 100 for each set.
# To only split into training and validation set, use a single number to `fixed`, i.e., `10`.
split_folders.fixed('input_folder', output="output", seed=1337, fixed=(100, 100), oversample=False) # default values
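For readers who want to see what a per-class random split into folders amounts to, here is a minimal standard-library sketch. It is not the split-folders implementation: the function name `split_by_ratio` and its parameters are hypothetical, and it only handles a train/val split.

```python
import random
import shutil
from pathlib import Path


def split_by_ratio(input_dir, output_dir, ratio=(0.8, 0.2), seed=1337):
    """Hypothetical helper: shuffle each class folder, then copy its files
    into output_dir/train/<class> and output_dir/val/<class>."""
    rng = random.Random(seed)
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    for class_dir in sorted(p for p in input_dir.iterdir() if p.is_dir()):
        # Sort first so the result depends only on the seed, not on OS listing order.
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_train = int(len(files) * ratio[0])
        subsets = (("train", files[:n_train]), ("val", files[n_train:]))
        for subset_name, subset_files in subsets:
            target = output_dir / subset_name / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for f in subset_files:
                shutil.copy2(f, target / f.name)
```

Because the shuffle happens inside each class folder, every class contributes to both subsets in roughly the requested proportion.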
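With this new folder arrangement, you can easily use Keras data generators to divide your data into training and validation sets and finally train your model.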
import os

import tensorflow as tf
import split_folders  # newer releases of split-folders are imported as `splitfolders`

main_dir = '/Volumes/WMEL/Independent Research Project/Data/test_train/Data'
output_dir = '/Volumes/WMEL/Independent Research Project/Data/test_train/output'

# Randomly split every class folder into 70% training and 30% validation images.
split_folders.ratio(main_dir, output=output_dir, seed=1337, ratio=(.7, .3))

IMG_SIZE = (224, 224)
IMG_SHAPE = IMG_SIZE + (3,)  # assumes RGB images; IMG_SHAPE was undefined in the original
BATCH_SIZE = 32

# Scale 8-bit pixel values to [0, 1] (the original used 1./224, which looks like a typo).
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    os.path.join(output_dir, 'train'),
    class_mode='categorical',
    batch_size=BATCH_SIZE,
    target_size=IMG_SIZE,
    shuffle=True)

validation_generator = train_datagen.flow_from_directory(
    os.path.join(output_dir, 'val'),
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='categorical',
    shuffle=True)  # set as validation data

# ResNet50V2 backbone trained from scratch, followed by pooling and a 4-class softmax head.
base_model = tf.keras.applications.ResNet50V2(
    input_shape=IMG_SHAPE,
    include_top=False,
    weights=None)
maxpool_layer = tf.keras.layers.GlobalMaxPooling2D()
prediction_layer = tf.keras.layers.Dense(4, activation='softmax')
model = tf.keras.Sequential([
    base_model,
    maxpool_layer,
    prediction_layer
])

opt = tf.keras.optimizers.Adam(learning_rate=0.004)  # `lr` is the deprecated spelling
model.compile(optimizer=opt,
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])

model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // BATCH_SIZE,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // BATCH_SIZE,
    epochs=20)
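Before training, it can be worth checking that the train and val folders really have similar class proportions. The sketch below uses only the standard library; `count_per_class` and `class_fractions` are hypothetical helper names, and it assumes the train/val layout with one subfolder per class shown earlier.

```python
from collections import Counter
from pathlib import Path


def count_per_class(split_dir):
    """Hypothetical helper: return {class_name: file_count} for a split
    directory such as output/train."""
    counts = Counter()
    for class_dir in Path(split_dir).iterdir():
        if class_dir.is_dir():
            counts[class_dir.name] = sum(1 for f in class_dir.iterdir() if f.is_file())
    return counts


def class_fractions(counts):
    """Convert absolute counts into each class's share of the split."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}
```

Comparing `class_fractions(count_per_class(train_dir))` with the same call on the val directory shows whether one class dominates either subset.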
Regarding python - Keras ImageDataGenerator validation split not selected from shuffled dataset, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62662194/