Disagreement in confusion matrix and accuracy when using data generator

Reposted by: bug小助手 · Updated: 2023-10-25 23:11:47

I was working on a model based on the following code:

epoch=100
model_history = model.fit(train_generator,
epochs=epoch,
validation_data=test_generator,
callbacks=[model_es, model_rlr, model_mcp])

After training, when I evaluated the model using the following code, I got an accuracy of 98.9%:

model.evaluate(test_generator)

41/41 [==============================] - 3s 68ms/step - loss: 0.0396 - accuracy: 0.9893
[0.039571091532707214, 0.9893211126327515]

To analyse the result, I tried to obtain a confusion matrix for test_generator using the following code:

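import numpy as np
from sklearn.metrics import confusion_matrix

y_pred = model.predict(test_generator)   # class probabilities, one row per image
y_pred = np.argmax(y_pred, axis=1)       # predicted class index per image
print(confusion_matrix(test_generator.classes, y_pred))  # rows: true class, columns: predicted class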

However, the output is:

[[ 68  66  93  73]
 [ 64  65  93  84]
 [ 91 102 126  86]
 [ 69  75  96  60]]

which strongly disagrees with the result of model.evaluate.
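A quick check on the matrix itself makes the disagreement concrete: the accuracy implied by a confusion matrix is its diagonal (correct predictions) divided by its total. The sketch below simply recomputes that for the matrix posted above; it comes out at roughly 24%, which is about what random guessing among four classes would score, rather than the ~99% reported by model.evaluate.

```python
import numpy as np

# Confusion matrix copied from the question (rows = true class, columns = predicted class)
cm = np.array([
    [68, 66, 93, 73],
    [64, 65, 93, 84],
    [91, 102, 126, 86],
    [69, 75, 96, 60],
])

total = cm.sum()        # number of test images that were predicted
correct = np.trace(cm)  # diagonal entries = correctly classified images
accuracy = correct / total

print(total, correct, round(accuracy, 4))  # 1311 319 0.2433
```

The total of 1311 images is also consistent with the 41 batches of 32 shown in the evaluate output, so the predictions cover the whole test set; only their pairing with the labels is off.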

Can anyone help me obtain the actual confusion matrix for the model?

[Figure: plot of the model's accuracy history]

Entire code: https://colab.research.google.com/drive/1wpoPjnSoCqVaA--N04dcUG6A5NEVcufk?usp=sharing

Comments

Can you update your question with how you defined your dataset? A common cause of predictions not matching training/validation results is a shuffled test set: the flow* functions shuffle by default.

@Djinn here is the entire code colab.research.google.com/drive/…

As I suspected, you're shuffling your test data, so the predictions won't line up with their labels. If you're going to use ImageDataGenerator.flow_from_directory() with your test data, you need to pass shuffle=False when you define your test_generator.
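Why does model.evaluate still report ~99%? It takes each batch's images and labels together, so the shuffle order never matters to it. model.predict, however, returns predictions in the shuffled order in which the generator served the images, while test_generator.classes stays in directory order. The sketch below simulates this with plain NumPy and a hypothetical "perfect" model; the per-class counts are the row sums of the question's matrix, and the `confusion` helper is illustrative, not part of any library.

```python
import numpy as np

def confusion(y_true, y_pred, n_classes):
    # Count (true, predicted) pairs into an n_classes x n_classes matrix
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

rng = np.random.default_rng(0)
n_classes = 4

# Labels in directory order, like test_generator.classes
classes = np.repeat(np.arange(n_classes), [300, 306, 405, 300])

# A shuffled generator serves the images in a random order ...
order = rng.permutation(len(classes))
# ... and a perfect model predicts the true label of each image it is served
y_pred = classes[order]

# Comparing against labels in directory order misaligns the two arrays
cm_wrong = confusion(classes, y_pred, n_classes)
# Comparing against labels in the order the images were served lines them up
cm_right = confusion(classes[order], y_pred, n_classes)

print(np.trace(cm_wrong) / len(classes))  # roughly chance level
print(np.trace(cm_right) / len(classes))  # 1.0
```

Even a perfect model produces a near-random confusion matrix when its predictions are compared against labels in a different order, which matches the symptom in the question.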

Recommended answer

From your code, change:

test_generator=train_datagen.flow_from_directory(
locat_testing,
class_mode='binary',
color_mode='grayscale',
batch_size=32,
target_size=(img_size,img_size)
)

To include the shuffle parameter:

test_generator=train_datagen.flow_from_directory(
locat_testing,
class_mode='binary',
color_mode='grayscale',
batch_size=32,
target_size=(img_size,img_size),
shuffle=False
)

Your confusion matrix should then be consistent with model.evaluate instead of looking like random guessing.
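If you'd rather keep shuffling on, or just want the pairing to be explicit, an alternative is to collect the true labels from the same batches you predict on. As far as I know, a Keras DirectoryIterator can be indexed like a sequence: `test_generator[i]` returns an `(images, labels)` batch and `len(test_generator)` is the number of batches. The `collect_predictions` helper below is hypothetical and written against any indexable sequence of `(x, y)` batches plus any `predict_fn`; here it is exercised with a toy stand-in rather than a real model.

```python
import numpy as np

def collect_predictions(predict_fn, batches):
    """Predict batch by batch, keeping each batch's true labels next to its predictions.

    `batches`: indexable sequence of (x, y) pairs with len() (assumed, e.g. a Keras iterator).
    `predict_fn`: maps a batch of inputs to an array of class scores, shape (n, n_classes).
    """
    y_true, y_pred = [], []
    for i in range(len(batches)):
        x, y = batches[i][:2]  # ignore sample weights if the batch carries them
        y_pred.append(np.argmax(predict_fn(x), axis=1))
        y = np.asarray(y)
        # one-hot labels (class_mode='categorical') -> class indices; otherwise cast to int
        y_true.append(np.argmax(y, axis=1) if y.ndim > 1 else y.astype(int))
    return np.concatenate(y_true), np.concatenate(y_pred)

# Toy check: batches in a shuffled order, and a "perfect model" that one-hot encodes x
rng = np.random.default_rng(1)
labels = rng.permutation(np.repeat(np.arange(4), 5))  # 20 samples in shuffled order
batches = [(labels[i:i + 8], labels[i:i + 8]) for i in range(0, len(labels), 8)]

def toy_predict(x):
    return np.eye(4)[x]

y_true, y_pred = collect_predictions(toy_predict, batches)
print((y_true == y_pred).mean())  # 1.0
```

In the question's setting this would be called as something like `collect_predictions(lambda x: model.predict(x, verbose=0), test_generator)`. Collect everything in a single pass: a shuffling iterator reorders its samples at the end of each epoch.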

Another answer:

Here is code to compute the accuracy, the confusion matrix and a classification report:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report, f1_score

def predictor(model, test_gen):
    # test_gen must be created with shuffle=False so that the order of the
    # predictions matches test_gen.labels and test_gen.filenames
    y_pred = []
    error_list = []
    error_pred_list = []
    y_true = test_gen.labels  # integer class index of every test image
    classes = list(test_gen.class_indices.keys())
    class_count = len(classes)
    errors = 0
    preds = model.predict(test_gen, verbose=1)
    tests = len(preds)
    for i, p in enumerate(preds):
        pred_index = np.argmax(p)
        true_index = y_true[i]  # labels are integer values
        if pred_index != true_index:  # a misclassification has occurred
            errors += 1
            error_list.append(test_gen.filenames[i])
            error_pred_list.append(classes[pred_index])
        y_pred.append(pred_index)

    acc = (1 - errors / tests) * 100
    print(f'there were {errors} errors in {tests} tests for an accuracy of {acc:6.2f}')
    ypred = np.array(y_pred)
    ytrue = np.array(y_true)
    f1score = f1_score(ytrue, ypred, average='weighted') * 100
    if class_count <= 30:
        cm = confusion_matrix(ytrue, ypred)
        # plot the confusion matrix as an annotated heatmap
        plt.figure(figsize=(12, 8))
        sns.heatmap(cm, annot=True, vmin=0, fmt='g', cmap='Blues', cbar=False)
        plt.xticks(np.arange(class_count) + .5, classes, rotation=90)
        plt.yticks(np.arange(class_count) + .5, classes, rotation=0)
        plt.xlabel("Predicted")
        plt.ylabel("Actual")
        plt.title("Confusion Matrix")
        plt.show()
    clr = classification_report(ytrue, ypred, target_names=classes, digits=4)  # create classification report
    print("Classification Report:\n----------------------\n", clr)
    return errors, tests, error_list, error_pred_list, f1score

errors, tests, error_list, error_pred_list, f1score = predictor(model, test_generator)

# print out the list of misclassified test files if there are fewer than 50 errors
import os

if 0 < len(error_list) < 50:
    print('Below is a list of test files that were misclassified\n')
    print('{0:^30s}{1:^30s}'.format('Test File', ' Predicted as'))
    # sort file names and predictions together so each file keeps its own prediction
    for fpath, pred in sorted(zip(error_list, error_pred_list)):
        # filenames are relative paths such as 'class_name/image.png'; keep the last two parts
        parts = os.path.normpath(fpath).split(os.sep)
        f = parts[-2] + '-' + parts[-1]
        print(f'{f:^30s}{pred:^30s}')


Comments

OP's problem isn't creating a confusion matrix; it's passing correctly ordered (unshuffled) values to it. Their confusion matrix code works fine; the values they pass in don't.

/help/formatting

Hello, please include some explanation as to why you think this is the optimal solution.
