
python - Create (efficiently) fake truth/predicted values from a confusion matrix

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:45:25

For testing purposes, I need to create fake truth/predicted values from a confusion matrix.

My confusion matrix is stored in a Pandas DataFrame as follows:

import numpy as np
import pandas as pd

labels = ['N', 'L', 'R', 'A', 'P', 'V']
df = pd.DataFrame([
    [1971, 19, 1, 8, 0, 1],
    [16, 1940, 2, 23, 9, 10],
    [8, 3, 181, 87, 0, 11],
    [2, 25, 159, 1786, 16, 12],
    [0, 24, 4, 8, 1958, 6],
    [11, 12, 29, 11, 11, 1926]], columns=labels, index=labels)
df.index.name = 'Actual'
df.columns.name = 'Predicted'

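I'm assuming that the index holds the actual values and the columns hold the predicted values.

This confusion matrix looks like: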

Predicted     N     L    R     A     P     V
Actual
N          1971    19    1     8     0     1
L            16  1940    2    23     9    10
R             8     3  181    87     0    11
A             2    25  159  1786    16    12
P             0    24    4     8  1958     6
V            11    12   29    11    11  1926

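I'm looking for an efficient way to create two NumPy arrays, y_true and y_predict, that would produce such a confusion matrix.

My first idea was to start by creating NumPy arrays of the right size.

So I did: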

N_all = df.sum().sum()

y_true = np.empty(N_all)
y_pred = np.empty(N_all)

But I don't know how to fill these two NumPy arrays efficiently.

The same code should also work for a binary confusion matrix such as:

labels = [False, True]
df = pd.DataFrame([
    [5, 3],
    [2, 7]], columns=labels, index=labels)
df.index.name = 'Actual'
df.columns.name = 'Predicted'

This binary confusion matrix looks like:

Predicted  False  True
Actual
False          5     3
True           2     7

Best answer

If you want to recreate the arrays exactly, you can use the following function:

def create_arrays(df):
    # Unstack to make rows of (predicted, actual, count)
    df = df.unstack().reset_index()

    # Pull the value labels and counts
    actual = df['Actual'].values
    predicted = df['Predicted'].values
    totals = df.iloc[:, 2].values

    # Use list comprehensions to repeat each label `count` times
    y_true = [[curr_val] * n for (curr_val, n) in zip(actual, totals)]
    y_predicted = [[curr_val] * n for (curr_val, n) in zip(predicted, totals)]

    # They come nested, so flatten them
    y_true = [item for sublist in y_true for item in sublist]
    y_predicted = [item for sublist in y_predicted for item in sublist]

    return y_true, y_predicted
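
Since the question asks for an efficient way to fill the arrays, the same expansion can probably be done without Python-level loops by using np.repeat. The following is only a sketch under that assumption: create_arrays_np is a hypothetical name that does not appear in the original answer, and it assumes the rows of the DataFrame are the actual labels and the columns are the predicted labels.

```python
import numpy as np
import pandas as pd


def create_arrays_np(df):
    """Expand a confusion matrix (rows = actual, columns = predicted)
    into two label arrays. Hypothetical helper, not from the original answer."""
    counts = df.to_numpy().ravel()  # cell counts in row-major order
    n_rows, n_cols = df.shape
    # Label pair for every cell, in the same row-major order as `counts`.
    actual = np.repeat(df.index.to_numpy(), n_cols)
    predicted = np.tile(df.columns.to_numpy(), n_rows)
    # Repeat each pair as many times as its cell count.
    return np.repeat(actual, counts), np.repeat(predicted, counts)


# Usage with the binary matrix from the question (string labels).
labels = ['False', 'True']
df = pd.DataFrame([[5, 3], [2, 7]], columns=labels, index=labels)
y_true, y_pred = create_arrays_np(df)
print(len(y_true), len(y_pred))
```

This variant returns NumPy arrays rather than lists, and the pairs come out grouped by actual label first, so the ordering differs from the list-based function while the resulting confusion matrix should be the same.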

We can check that this produces the desired result:

import pandas as pd
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ['N', 'L', 'R', 'A', 'P', 'V']
df = pd.DataFrame([
    [1971, 19, 1, 8, 0, 1],
    [16, 1940, 2, 23, 9, 10],
    [8, 3, 181, 87, 0, 11],
    [2, 25, 159, 1786, 16, 12],
    [0, 24, 4, 8, 1958, 6],
    [11, 12, 29, 11, 11, 1926]], columns=labels, index=labels)
df.index.name = 'Actual'
df.columns.name = 'Predicted'

# Recreate the original confusion matrix and check for equality
y_t, y_p = create_arrays(df)
conf_mat = confusion_matrix(y_t,y_p)
check_labels = np.unique(y_t)

df_new = pd.DataFrame(conf_mat, columns=check_labels, index=check_labels).loc[labels, labels]
df_new.index.name = 'Actual'
df_new.columns.name = 'Predicted'

df == df_new

Output:

Predicted     N     L     R     A     P     V
Actual
N          True  True  True  True  True  True
L          True  True  True  True  True  True
R          True  True  True  True  True  True
A          True  True  True  True  True  True
P          True  True  True  True  True  True
V          True  True  True  True  True  True

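For the binary case (using the string labels 'False' and 'True', because .loc would interpret a list of booleans as a mask):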

# And for the binary
labels = ['False', 'True']
df = pd.DataFrame([
    [5, 3],
    [2, 7]], columns=labels, index=labels)
df.index.name = 'Actual'
df.columns.name = 'Predicted'

# Recreate the original confusion matrix and check for equality
y_t, y_p = create_arrays(df)
conf_mat = confusion_matrix(y_t,y_p)
check_labels = np.unique(y_t)

df_new = pd.DataFrame(conf_mat, columns=check_labels, index=check_labels).loc[labels, labels]
df_new.index.name = 'Actual'
df_new.columns.name = 'Predicted'

df == df_new

Predicted  False  True
Actual
False       True  True
True        True  True
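
If scikit-learn is not available, the round trip can likely be checked with pandas alone. The sketch below is an assumption-laden illustration rather than part of the original answer: it rebuilds the label arrays with np.repeat, tabulates them with pd.crosstab, and reindexes to the original label order, because crosstab sorts its labels. The names rebuilt and matches are hypothetical.

```python
import numpy as np
import pandas as pd

labels = ['N', 'L', 'R', 'A', 'P', 'V']
df = pd.DataFrame([
    [1971, 19, 1, 8, 0, 1],
    [16, 1940, 2, 23, 9, 10],
    [8, 3, 181, 87, 0, 11],
    [2, 25, 159, 1786, 16, 12],
    [0, 24, 4, 8, 1958, 6],
    [11, 12, 29, 11, 11, 1926]], columns=labels, index=labels)

# Expand each (actual, predicted) cell into `count` repeated label pairs.
counts = df.to_numpy().ravel()
y_true = np.repeat(np.repeat(labels, len(labels)), counts)
y_pred = np.repeat(np.tile(labels, len(labels)), counts)

# Tabulate the pairs again; reindex restores the original label order
# and fills in any label that never occurs with zeros.
rebuilt = pd.crosstab(
    pd.Series(y_true, name='Actual'),
    pd.Series(y_pred, name='Predicted'),
).reindex(index=labels, columns=labels, fill_value=0)

matches = bool((rebuilt.to_numpy() == df.to_numpy()).all())
print(matches)
```

Comparing the underlying arrays avoids any dependence on index or column dtypes when the two frames are compared.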

A similar question about "python - Create (efficiently) fake truth/predicted values from a confusion matrix" can be found on Stack Overflow: https://stackoverflow.com/questions/29882747/
