python - Perfect precision, recall and f1-score, but bad prediction

Reposted by: 行者123. Updated: 2023-12-01 01:26:35

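I am using scikit-learn to classify a binary problem and get a perfect classification_report (all 1.00). Yet the prediction score on unseen data is 0.36. How is that possible?

I am familiar with imbalanced labels, but I don't think that is the issue here, because the f1 and other score columns, as well as the confusion matrix, all indicate perfect scores.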

import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# X, y: feature matrix and binary labels (not shown in the question).
# Set aside the last 19 rows for prediction.
X1, X_Pred, y1, y_Pred = train_test_split(X, y, test_size=19,
                                          shuffle=False, random_state=None)

X_train, X_test, y_train, y_test = train_test_split(X1, y1,
                                                    test_size=0.4, stratify=y1, random_state=11)

clcv = DecisionTreeClassifier()
scorecv = cross_val_score(clcv, X1, y1, cv=StratifiedKFold(n_splits=4),
                          scoring='f1')  # to balance precision/recall
clcv.fit(X1, y1)
y_predict = clcv.predict(X1)
cm = confusion_matrix(y1, y_predict)
cm_df = pd.DataFrame(cm, index=['0', '1'], columns=['0', '1'])
print(cm_df)
print(classification_report(y1, y_predict))
print('Prediction score:', clcv.score(X_Pred, y_Pred))  # unseen data

Output:

confusion:
      0   1
0  3011   0
1     0  44

              precision    recall  f1-score   support
       False       1.00      1.00      1.00      3011
        True       1.00      1.00      1.00        44

   micro avg       1.00      1.00      1.00      3055
   macro avg       1.00      1.00      1.00      3055
weighted avg       1.00      1.00      1.00      3055

Prediction score: 0.36

Best answer

The problem is that you are overfitting.

A lot of the code is never used, so let's trim it down:

# Set aside the last 19 rows for prediction.
X1, X_Pred, y1, y_Pred = train_test_split(X, y, test_size= 19, 
                shuffle = False, random_state=None)

clcv = DecisionTreeClassifier()
clcv.fit(X1, y1)
y_predict = clcv.predict(X1)
cm = confusion_matrix(y1, y_predict)  # compare training labels with training predictions
cm_df = pd.DataFrame(cm, index = ['0','1'], columns = ['0','1'] )
print(cm_df)
print(classification_report( y1, y_predict ))
print('Prediction score:', clcv.score(X_Pred, y_Pred)) # unseen data

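Clearly no cross-validation is happening here: the confusion matrix and the classification report are computed on X1, the very rows the tree was fit on, so perfect scores there say nothing about unseen data. The obvious reason for the low prediction score is that the decision tree classifier is overfitting.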
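To make the effect concrete, here is a minimal sketch on synthetic data. The dataset from `make_classification`, its parameters, and the names `X_demo`, `y_demo`, `X_fit`, `X_held` are assumptions for illustration only; they are not the asker's data. It compares the F1 score of an unrestricted tree on the rows it was fit on with its F1 score on rows it has never seen.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced, noisy stand-in for the asker's data (an assumption).
X_demo, y_demo = make_classification(
    n_samples=3000, n_features=20, n_informative=5,
    weights=[0.95, 0.05], flip_y=0.05, random_state=0)

# Hold out 30% of the rows; the tree never sees them during fitting.
X_fit, X_held, y_fit, y_held = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0)

# No depth limit: the tree keeps splitting until every leaf is pure.
tree = DecisionTreeClassifier(random_state=0).fit(X_fit, y_fit)

train_f1 = f1_score(y_fit, tree.predict(X_fit))
held_f1 = f1_score(y_held, tree.predict(X_held))
print("F1 on the rows used for fitting:", train_f1)
print("F1 on held-out rows:", held_f1)
```

Because a fully grown tree can memorize its training rows, the first number says little about generalization; only the second one is comparable to the "Prediction score" in the question.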

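Look at the cross-validation scores (scorecv in your code, which is computed but never printed) and you should see the problem directly.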
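A minimal sketch of what inspecting those scores could look like, again on synthetic data (the `make_classification` call and the names `X_demo`, `y_demo`, `cv_f1` are illustrative assumptions, not part of the original post). Each fold is scored on rows the tree did not see while fitting, so these numbers are a fairer estimate than a report computed on the training rows.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced, noisy stand-in for the asker's data (an assumption).
X_demo, y_demo = make_classification(
    n_samples=3000, n_features=20, n_informative=5,
    weights=[0.95, 0.05], flip_y=0.05, random_state=0)

# Same setup as in the question: 4 stratified folds, scored with F1.
cv = StratifiedKFold(n_splits=4)
cv_f1 = cross_val_score(DecisionTreeClassifier(random_state=0),
                        X_demo, y_demo, cv=cv, scoring="f1")

print("F1 per fold:", cv_f1)
print("Mean F1: %.2f" % cv_f1.mean())
```

If the per-fold scores sit well below 1.00 while the report on the training rows is perfect, that gap is the overfitting the answer describes.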

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/53278489/