
python - keras model.predict_generator() does not return the correct number of instances

Reposted. Author: 行者123. Updated: 2023-12-01 09:04:02

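I followed this tutorial to learn how to use a generator with a Keras model's fit_generator: https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly

The problem I ran into is that when I call model.predict_generator() on a test data generator, the length of the returned array differs from the number of samples I feed into the generator. My test data has 229431 samples and I use a batch_size of 256. The generator class, including its __len__ method, is defined as follows: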

class DataGenerator(keras.utils.Sequence):
    """A simple generator"""

    def __init__(self, list_IDs, labels, dim, dim_label, batch_size=512, shuffle=True, is_training=True):
        """Initialization"""
        self.list_IDs = list_IDs
        self.labels = labels
        self.dim = dim
        self.dim_label = dim_label
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.is_training = is_training
        self.on_epoch_end()

    def __len__(self):
        """Denotes the number of batches per epoch"""
        return int(np.ceil(len(self.list_IDs) / self.batch_size))

    def __getitem__(self, index):
        """Generate one batch of data"""
        # Generate indexes of the batch
        indexes = self.indexes[index * self.batch_size: (index + 1) * self.batch_size]

        # Find list of IDs
        list_IDs_temp = [self.list_IDs[k] for k in indexes]
        list_labels_temp = [self.labels[k] for k in indexes]

        # Generate data
        result = self.__data_generation(list_IDs_temp, list_labels_temp, self.is_training)
        if self.is_training:
            X, y = result
            return X, y
        else:
            # only return X when test
            X = result
            return X

    def on_epoch_end(self):
        """Updates indexes after each epoch"""
        self.indexes = np.arange(len(self.list_IDs))
        if self.shuffle:
            np.random.shuffle(self.indexes)

    def __data_generation(self, list_IDs_temp, list_labels_temp, is_training):
        """Generates data containing batch_size samples"""
        # Initialization
        # X is a list of np.array
        X = np.empty((self.batch_size, *self.dim))
        if is_training:
            # y could have multiple columns
            y = np.empty((self.batch_size, *self.dim_label), dtype=int)

        # Generate data
        for i, (ID, label) in enumerate(zip(list_IDs_temp, list_labels_temp)):
            # Store sample
            X[i,] = np.load(ID)
            if is_training:
                # Store class
                y[i,] = np.load(label)
        if is_training:
            return X, y
        else:
            return X

The returned predictions have length 229632. This is the prediction code:

test_generator = DataGenerator(partition, labels, is_training=False, **self.params)
predict_raw = self.model.predict_generator(generator=test_generator, workers=12, verbose=2)

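When I change the DataGenerator __len__ method to return int(np.floor(len(self.list_IDs) / self.batch_size)), I get 229376 predictions, and 229376 / 256 = 896, which matches the number of batches reported by __len__. But I passed 229431 samples to the generator.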

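I assumed that in __getitem__, when the last batch is reached, it would automatically take fewer than 256 samples for testing. That does not seem to be the case, so how can I make sure the model predicts the correct number of samples?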
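The two lengths reported in the question follow directly from the batch arithmetic. The short sketch below reproduces them from the sample count and batch size given above; the variable names are chosen here for illustration only.

```python
import math

n_samples = 229431   # number of test samples stated in the question
batch_size = 256     # batch size stated in the question

# Number of batches when __len__ rounds up vs. rounds down
batches_ceil = math.ceil(n_samples / batch_size)
batches_floor = n_samples // batch_size

# If every batch is padded to batch_size rows, the totals are:
padded_total = batches_ceil * batch_size
truncated_total = batches_floor * batch_size

# Samples that actually belong in the final, shorter batch
last_batch_size = n_samples - batches_floor * batch_size

print(batches_ceil, padded_total)      # 897 229632
print(batches_floor, truncated_total)  # 896 229376
print(last_batch_size)                 # 55
```

So rounding up yields 201 extra rows (a full 256-row last batch instead of 55), while rounding down silently drops the last 55 samples.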

Best answer

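For the last batch, the indexes computed in __getitem__ do not have the correct size. To predict the correct number of samples, the indexes should be defined as follows (see post):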

def __getitem__(self, index):
    """Generate one batch of data"""
    # Clamp the upper bound so the last batch holds only the remaining samples
    idx_min = index * self.batch_size
    idx_max = min(idx_min + self.batch_size, len(self.list_IDs))
    indexes = self.indexes[idx_min:idx_max]

    ...
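A minimal, self-contained sketch of the idea follows. It is not the asker's class: ArrayBatcher is a hypothetical stand-in that reads from an in-memory array instead of calling np.load, and it does not subclass keras.utils.Sequence so that it runs with NumPy alone. One point goes beyond the answer and is an assumption about what the full fix needs: the question's __data_generation allocates X with self.batch_size rows, so the sketch also sizes the output by the number of indexes actually selected. Otherwise the last batch would still come back with 256 rows.

```python
import numpy as np


class ArrayBatcher:
    """Hypothetical stand-in for DataGenerator (not a keras.utils.Sequence)."""

    def __init__(self, data, batch_size=256):
        self.data = data
        self.batch_size = batch_size
        self.indexes = np.arange(len(data))

    def __len__(self):
        # Round up so the final, shorter batch is included
        return int(np.ceil(len(self.data) / self.batch_size))

    def __getitem__(self, index):
        # Clamp the upper bound to the number of samples
        idx_min = index * self.batch_size
        idx_max = min(idx_min + self.batch_size, len(self.data))
        indexes = self.indexes[idx_min:idx_max]

        # Allocate by the actual batch length, not by self.batch_size
        X = np.empty((len(indexes), *self.data.shape[1:]))
        for i, k in enumerate(indexes):
            X[i] = self.data[k]
        return X


# 1000 samples with 3 features each, batched in groups of 256
data = np.arange(1000 * 3, dtype=float).reshape(1000, 3)
batcher = ArrayBatcher(data, batch_size=256)

batches = [batcher[i] for i in range(len(batcher))]
total_rows = sum(len(b) for b in batches)

print(len(batcher), total_rows, len(batches[-1]))  # 4 1000 232
```

With both changes in place, the row counts of all batches add up to the number of samples, which is what predict_generator concatenates into its result.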

Regarding "python - keras model.predict_generator() does not return the correct number of instances", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52219925/
