
python - log_loss in sklearn: Multioutput target data is not supported with label binarization

Reposted. Author: 行者123. Updated: 2023-11-30 22:21:52

The following code

from sklearn import metrics
import numpy as np
y_true = np.array([[0.2,0.8,0],[0.9,0.05,0.05]])
y_predict = np.array([[0.5,0.5,0.0],[0.5,0.4,0.1]])
metrics.log_loss(y_true, y_predict)

produces the following error:

   ---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-32-24beeb19448b> in <module>()
----> 1 metrics.log_loss(y_true, y_predict)

~\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\sklearn\metrics\classification.py in log_loss(y_true, y_pred, eps, normalize, sample_weight, labels)
1646 lb.fit(labels)
1647 else:
-> 1648 lb.fit(y_true)
1649
1650 if len(lb.classes_) == 1:

~\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\sklearn\preprocessing\label.py in fit(self, y)
276 self.y_type_ = type_of_target(y)
277 if 'multioutput' in self.y_type_:
--> 278 raise ValueError("Multioutput target data is not supported with "
279 "label binarization")
280 if _num_samples(y) == 0:

ValueError: Multioutput target data is not supported with label binarization

I'm curious why. I tried re-reading the definition of log loss, but couldn't find anything that would make the calculation invalid.

Best answer

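The source code shows that metrics.log_loss does not support probabilities in y_true. It only supports binary indicators of shape (n_samples, n_classes), e.g. [[0,0,1],[1,0,0]], or class labels of shape (n_samples,), e.g. [2, 0]. In the latter case, the class labels are one-hot encoded to look like an indicator matrix before the log loss is computed.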
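As a rough illustration of that one-hot step, here is a minimal NumPy-only sketch. It is not scikit-learn's actual implementation (which goes through LabelBinarizer); the variable names and the assumption that the classes are exactly 0, 1, 2 are mine.

```python
import numpy as np

# Class labels of shape (n_samples,), as in the [2, 0] example above.
labels = np.array([2, 0])
n_classes = 3  # assumption: the classes are exactly 0, 1 and 2

# One-hot encode: row i gets a 1 in column labels[i] and 0 elsewhere.
indicator = np.eye(n_classes, dtype=int)[labels]
print(indicator)
```

This should print the indicator matrix [[0 0 1], [1 0 0]], i.e. the same form as the first supported input.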

In this block:

lb = LabelBinarizer()

if labels is not None:
    lb.fit(labels)
else:
    lb.fit(y_true)

you are reaching lb.fit(y_true), which fails if y_true is not all 1s and/or 0s. For example:

>>> import numpy as np
>>> from sklearn import preprocessing

>>> lb = preprocessing.LabelBinarizer()

>>> lb.fit(np.array([[0,1,0],[1,0,0]]))

LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)

>>> lb.fit(np.array([[0.2,0.8,0],[0.9,0.05,0.05]]))

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/imran/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/sklearn/preprocessing/label.py", line 278, in fit
    raise ValueError("Multioutput target data is not supported with "
ValueError: Multioutput target data is not supported with label binarization
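The traceback in the question shows that fit calls type_of_target(y) and raises when the result contains 'multioutput'. The sketch below calls that helper directly on the two arrays used above. The variable names are mine, and the exact strings returned are what I would expect from the scikit-learn versions I know, so treat them as an assumption rather than a guarantee.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# A 0/1 indicator matrix versus a matrix of class probabilities.
indicator = np.array([[0, 1, 0], [1, 0, 0]])
soft = np.array([[0.2, 0.8, 0], [0.9, 0.05, 0.05]])

indicator_type = type_of_target(indicator)
soft_type = type_of_target(soft)

print(indicator_type)  # expected: multilabel-indicator
print(soft_type)       # expected: continuous-multioutput
```

Only the second result contains 'multioutput', which is the condition that triggers the ValueError.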

I would define your own custom log loss function:

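def logloss(y_true, y_pred, eps=1e-15):
    # Clip predictions away from 0 and 1 so np.log never returns -inf.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    # Cross-entropy per sample (sum over classes), averaged over samples.
    return -(y_true * np.log(y_pred)).sum(axis=1).mean()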

Here is the output on your data:

>>> logloss(y_true, y_predict)
0.738961717153653
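As a sanity check, the custom function should agree with metrics.log_loss whenever y_true is an ordinary hard labelling. The sketch below compares the two on made-up data: the labels and predictions are hypothetical and were chosen so that every class appears and every row of y_pred sums to 1.

```python
import numpy as np
from sklearn.metrics import log_loss

def logloss(y_true, y_pred, eps=1e-15):
    # Same custom function as above, repeated so this block is self-contained.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -(y_true * np.log(y_pred)).sum(axis=1).mean()

# Hypothetical hard labels and predicted probabilities (rows sum to 1).
labels = np.array([2, 0, 1])
y_pred = np.array([[0.1, 0.2, 0.7],
                   [0.6, 0.3, 0.1],
                   [0.2, 0.5, 0.3]])

# One-hot encode the labels so they can be passed to the custom function.
one_hot = np.eye(3)[labels]

custom = logloss(one_hot, y_pred)
reference = log_loss(labels, y_pred)
print(np.isclose(custom, reference))
```

If the two values match, the custom function behaves as a generalisation of the standard log loss: with probabilities in y_true it computes the cross-entropy between the two distributions, averaged over samples.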

Regarding python - log_loss in sklearn: Multioutput target data is not supported with label binarization, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48504914/
