python - predict_proba or decision_function as estimator "confidence"


Reposted by: 太空狗. Updated: 2023-10-29 17:49:02


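I'm training an estimator in scikit-learn using LogisticRegression as the model. The features I use are (mostly) categorical, and so are the labels. I therefore use a DictVectorizer and a LabelEncoder, respectively, to encode the values properly.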

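The training part is fairly straightforward, but I'm having problems with the test part. The simple thing to do is to call the trained model's "predict" method and get the predicted label. However, for the processing I need to do afterwards, I need the probability of each possible label (class) for each particular instance. I decided to use the "predict_proba" method. However, I get different results for the same test instance depending on whether I call this method on the instance by itself or together with other instances.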
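For reference, here is a minimal sketch of reading per-label probabilities out of "predict_proba". It assumes a recent scikit-learn release and uses a made-up toy dataset (the names X, y and label_to_prob are illustrative and not part of the question's code). The columns of the returned array follow the order of model.classes_, so the encoded classes are mapped back to the original labels before pairing them with the probabilities.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data, only for illustration.
X = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]]
y = ['A0', 'A1', 'A2', 'A0', 'A0', 'A1']

label_encoder = LabelEncoder()
model = LogisticRegression()
model.fit(X, label_encoder.fit_transform(y))

# One row of probabilities for a single instance.
probs = model.predict_proba([[1, 0]])[0]

# Column i of predict_proba corresponds to model.classes_[i];
# decode those integer classes back to the original string labels.
labels = label_encoder.inverse_transform(model.classes_).tolist()
label_to_prob = dict(zip(labels, probs))

for label, prob in label_to_prob.items():
    print(label, round(float(prob), 4))
```

In a correctly working version the values of label_to_prob should add up to 1 for every instance.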

The code below reproduces the problem.

from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import LabelEncoder


X_real = [{'head': u'n\xe3o', 'dep_rel': u'ADVL'}, 
          {'head': u'v\xe3o', 'dep_rel': u'ACC'}, 
          {'head': u'empresa', 'dep_rel': u'SUBJ'}, 
          {'head': u'era', 'dep_rel': u'ACC'}, 
          {'head': u't\xeam', 'dep_rel': u'ACC'}, 
          {'head': u'import\xe2ncia', 'dep_rel': u'PIV'}, 
          {'head': u'balan\xe7o', 'dep_rel': u'SUBJ'}, 
          {'head': u'ocupam', 'dep_rel': u'ACC'}, 
          {'head': u'acesso', 'dep_rel': u'PRED'}, 
          {'head': u'elas', 'dep_rel': u'SUBJ'}, 
          {'head': u'assinaram', 'dep_rel': u'ACC'}, 
          {'head': u'agredido', 'dep_rel': u'SUBJ'}, 
          {'head': u'pol\xedcia', 'dep_rel': u'ADVL'}, 
          {'head': u'se', 'dep_rel': u'ACC'}] 
y_real = [u'AM-NEG', u'A1', u'A0', u'A1', u'A1', u'A1', u'A0', u'A1', u'AM-ADV', u'A0', u'A1', u'A0', u'A2', u'A1']

feat_encoder =  DictVectorizer()
feat_encoder.fit(X_real)

label_encoder = LabelEncoder()
label_encoder.fit(y_real)

model = LogisticRegression()
model.fit(feat_encoder.transform(X_real), label_encoder.transform(y_real))

print("Test 1...")
X_test1 = [{'head': u'governo', 'dep_rel': u'SUBJ'}]
X_test1_encoded = feat_encoder.transform(X_test1)
print("Features Encoded")
print(X_test1_encoded)
print("Shape")
print(X_test1_encoded.shape)
print("decision_function:")
print(model.decision_function(X_test1_encoded))
print("predict_proba:")
print(model.predict_proba(X_test1_encoded))

print("Test 2...")
X_test2 = [{'head': u'governo', 'dep_rel': u'SUBJ'},
           {'head': u'atrav\xe9s', 'dep_rel': u'ADVL'},
           {'head': u'configuram', 'dep_rel': u'ACC'}]

X_test2_encoded = feat_encoder.transform(X_test2)
print("Features Encoded")
print(X_test2_encoded)
print("Shape")
print(X_test2_encoded.shape)
print("decision_function:")
print(model.decision_function(X_test2_encoded))
print("predict_proba:")
print(model.predict_proba(X_test2_encoded))


print("Test 3...")
X_test3 = [{'head': u'governo', 'dep_rel': u'SUBJ'},
           {'head': u'atrav\xe9s', 'dep_rel': u'ADVL'},
           {'head': u'configuram', 'dep_rel': u'ACC'},
           {'head': u'configuram', 'dep_rel': u'ACC'}]

X_test3_encoded = feat_encoder.transform(X_test3)
print("Features Encoded")
print(X_test3_encoded)
print("Shape")
print(X_test3_encoded.shape)
print("decision_function:")
print(model.decision_function(X_test3_encoded))
print("predict_proba:")
print(model.predict_proba(X_test3_encoded))

This is the output I get:

Test 1...
Features Encoded
  (0, 4)    1.0
Shape
(1, 19)
decision_function:
[[ 0.55372615 -1.02949707 -1.75474347 -1.73324726 -1.75474347]]
predict_proba:
[[ 1.  1.  1.  1.  1.]]
Test 2...
Features Encoded
  (0, 4)    1.0
  (1, 1)    1.0
  (2, 0)    1.0
Shape
(3, 19)
decision_function:
[[ 0.55372615 -1.02949707 -1.75474347 -1.73324726 -1.75474347]
 [-1.07370197 -0.69103629 -0.89306092 -1.51402163 -0.89306092]
 [-1.55921001  1.11775556 -1.92080112 -1.90133404 -1.92080112]]
predict_proba:
[[ 0.59710757  0.19486904  0.26065002  0.32612646  0.26065002]
 [ 0.23950111  0.24715931  0.51348452  0.3916478   0.51348452]
 [ 0.16339132  0.55797165  0.22586546  0.28222574  0.22586546]]
Test 3...
Features Encoded
  (0, 4)    1.0
  (1, 1)    1.0
  (2, 0)    1.0
  (3, 0)    1.0
Shape
(4, 19)
decision_function:
[[ 0.55372615 -1.02949707 -1.75474347 -1.73324726 -1.75474347]
 [-1.07370197 -0.69103629 -0.89306092 -1.51402163 -0.89306092]
 [-1.55921001  1.11775556 -1.92080112 -1.90133404 -1.92080112]
 [-1.55921001  1.11775556 -1.92080112 -1.90133404 -1.92080112]]
predict_proba:
[[ 0.5132474   0.12507868  0.21262531  0.25434403  0.21262531]
 [ 0.20586462  0.15864173  0.4188751   0.30544372  0.4188751 ]
 [ 0.14044399  0.3581398   0.1842498   0.22010613  0.1842498 ]
 [ 0.14044399  0.3581398   0.1842498   0.22010613  0.1842498 ]]

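As can be seen, the values obtained with "predict_proba" for the instance in "X_test1" change when that same instance is scored together with the other instances in "X_test2". Also, "X_test3" just reproduces "X_test2" and adds one more instance (identical to the last one in "X_test2"), yet the probability values change for all of the instances. Why does this happen? I also find it really strange that ALL the probabilities for "X_test1" are 1; shouldn't they sum to 1?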

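Now, if I use "decision_function" instead of "predict_proba", I get the consistency in the values that I need. The problem is that I get negative scores, and some of the positive ones are even greater than 1.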
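The "decision_function" values are unnormalized scores, not probabilities, so negative values and values above 1 are expected. As a rough sketch, the snippet below takes the first row of scores from the output above and converts it to probabilities in two common ways: a softmax, and a one-vs-rest style normalization (a sigmoid per class, then rescaling so the row sums to 1). The helper names softmax and ovr_normalize are hypothetical, and neither is claimed to be exactly what the scikit-learn version in the question computed.

```python
import math

# First row of decision_function output from "Test 1" above.
scores = [0.55372615, -1.02949707, -1.75474347, -1.73324726, -1.75474347]


def softmax(values):
    """Exponentiate (shifted by the max for stability) and normalize."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def ovr_normalize(values):
    """Apply a sigmoid to each score, then rescale so the row sums to 1."""
    sigmoids = [1.0 / (1.0 + math.exp(-v)) for v in values]
    total = sum(sigmoids)
    return [s / total for s in sigmoids]


softmax_probs = softmax(scores)
ovr_probs = ovr_normalize(scores)

print([round(p, 4) for p in softmax_probs], sum(softmax_probs))
print([round(p, 4) for p in ovr_probs], sum(ovr_probs))
```

Either way, each row is normalized on its own, so the result for one instance cannot depend on which other instances are in the batch, and the highest score still gets the highest probability.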

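So, what should I use? Why do the values of "predict_proba" change that way? Am I not understanding correctly what those values mean?

Thanks in advance for any help you can give me.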

Update

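As suggested, I changed the code so that it also prints the encoded "X_test1", "X_test2" and "X_test3", as well as their shapes. This does not appear to be the problem, since the encoding of the same instances is consistent across the test sets.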

Best answer

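As indicated in the comments on the question, the error was caused by a bug in the implementation in the version of scikit-learn I was using. The problem was solved by updating to the most recent stable version, 0.12.1.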
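After upgrading, a quick sanity check along the following lines can confirm that the behaviour is as expected. This is only a sketch: it assumes a recent scikit-learn release and uses a small subset of training data in the spirit of the question (the variable names and the reduced dataset are illustrative). It checks that every "predict_proba" row sums to 1 and that an instance gets the same probabilities whether it is scored alone or inside a batch.

```python
import numpy as np
import sklearn
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

print("scikit-learn version:", sklearn.__version__)

# Reduced toy training set, assumed for illustration only.
X_train = [
    {'head': 'empresa', 'dep_rel': 'SUBJ'},
    {'head': 'era', 'dep_rel': 'ACC'},
    {'head': 'acesso', 'dep_rel': 'PRED'},
    {'head': 'elas', 'dep_rel': 'SUBJ'},
    {'head': 'se', 'dep_rel': 'ACC'},
    {'head': 'agredido', 'dep_rel': 'SUBJ'},
]
y_train = ['A0', 'A1', 'AM-ADV', 'A0', 'A1', 'A0']

vec = DictVectorizer()
clf = LogisticRegression()
clf.fit(vec.fit_transform(X_train), y_train)

# The same instance scored on its own and as part of a larger batch.
single = vec.transform([{'head': 'governo', 'dep_rel': 'SUBJ'}])
batch = vec.transform([
    {'head': 'governo', 'dep_rel': 'SUBJ'},
    {'head': 'configuram', 'dep_rel': 'ACC'},
])

p_single = clf.predict_proba(single)
p_batch = clf.predict_proba(batch)

# Each row should be a probability distribution over the classes.
rows_sum_to_one = bool(np.allclose(p_batch.sum(axis=1), 1.0))
# The first row should not depend on what else is in the batch.
batch_independent = bool(np.allclose(p_single[0], p_batch[0]))

print("rows sum to 1:", rows_sum_to_one)
print("independent of batch:", batch_independent)
```

If either check fails, that would point to a problem in the installed version rather than in the encoding.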

Regarding python - predict_proba or decision_function as estimator "confidence", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/13301986/
