
python - How to output Shap values in probability and make force_plot from binary classifier


I need to plot how each feature affects the predicted probability for each sample from my LightGBM binary classifier. So I need to output SHAP values in terms of probability, not the regular SHAP values. There doesn't seem to be any option for probability output.

The example code below is what I use to generate a dataframe of SHAP values and to make a force_plot for the first data sample. Does anyone know how I should modify the code to change the output? I'm new to SHAP values and the shap package. Thanks a lot in advance.

import pandas as pd
import numpy as np
import shap
import lightgbm as lgbm
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = lgbm.LGBMClassifier()
model.fit(X_train, y_train)


explainer = shap.TreeExplainer(model)
shap_values = explainer(X_train)

# force plot of first row for class 1
class_idx = 1
row_idx = 0
expected_value = explainer.expected_value[class_idx]
shap_value = shap_values[:,:,class_idx].values[row_idx]

shap.force_plot(base_value=expected_value, shap_values=shap_value, features=X_train.iloc[row_idx, :], matplotlib=True)

# dataframe of shap values for class 1
shap_df = pd.DataFrame(shap_values[:, :, 1].values, columns=shap_values.feature_names)

Best Answer

TL;DR:

You can use link="logit" in the force_plot method to plot the result in probability space:

import pandas as pd
import numpy as np
import shap
import lightgbm as lgbm
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from scipy.special import expit

shap.initjs()

data = load_breast_cancer()

X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = lgbm.LGBMClassifier()
model.fit(X_train, y_train)

explainer_raw = shap.TreeExplainer(model)
shap_values = explainer_raw(X_train)

# force plot of first row for class 1
class_idx = 1
row_idx = 0
expected_value = explainer_raw.expected_value[class_idx]
shap_value = shap_values[:, :, class_idx].values[row_idx]

shap.force_plot(
    base_value=expected_value,
    shap_values=shap_value,
    features=X_train.iloc[row_idx, :],
    link="logit",
)

Expected output:

[screenshot: force plot of the first row for class 1, shown in probability space]

Alternatively, you can achieve the same by explicitly specifying model_output="probability" as the output you're interested in explaining:

explainer = shap.TreeExplainer(
    model,
    data=X_train,
    feature_perturbation="interventional",
    model_output="probability",
)
shap_values = explainer(X_train)

# force plot of first row for class 1
class_idx = 1
row_idx = 0

shap_value = shap_values.values[row_idx]

shap.force_plot(
    base_value=explainer.expected_value,  # this explainer's base value is already in probability space
    shap_values=shap_value,
    features=X_train.iloc[row_idx, :],
)

Expected output:

[screenshot: the same force plot of the first row for class 1, again in probability space]

However, it may be more interesting to understand where these numbers come from:

  1. Target probability for the point of interest:

model_proba = model.predict_proba(X_train.iloc[[row_idx]])
model_proba
# array([[0.00275887, 0.99724113]])

  2. Raw base case of the model with X_train as background (note that LightGBM outputs raw scores for class 1):

model.predict(X_train, raw_score=True).mean()
# 2.4839751932445577

  3. Raw base values from SHAP (note that they are symmetric):

bv = explainer_raw(X_train).base_values[0]
bv
# array([-2.48397519, 2.48397519])

  4. Raw SHAP values for the point of interest:

sv_0 = explainer_raw(X_train).values[row_idx].sum(0)
sv_0
# array([-3.40619584, 3.40619584])

  5. Proba inferred from the SHAP values (via the sigmoid):

shap_proba = expit(bv + sv_0)
shap_proba
# array([0.00275887, 0.99724113])

  6. Check (a similar check for the probability-space explainer is sketched right below):

assert np.allclose(model_proba, shap_proba)
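The same kind of local-accuracy check can be done for the probability-space explainer from the second snippet. A minimal sketch, assuming explainer and shap_values from that snippet are still in scope (the names prob_base and prob_contribs are only for illustration):

# base value and per-feature contributions, both already in probability units
prob_base = shap_values.base_values[row_idx]
prob_contribs = shap_values.values[row_idx]

# base value + sum of contributions should reproduce predict_proba for class 1
assert np.isclose(prob_base + prob_contribs.sum(),
                  model.predict_proba(X_train.iloc[[row_idx]])[0, 1])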

Please ask if anything is unclear.

Side notes

Probabilities might be misleading if you're analyzing the raw effect sizes of different features, because the sigmoid is non-linear and saturates after reaching a certain threshold.

Some people expect to see SHAP values in probability space as well, but this is not feasible, because:

  • SHAP values are additive by construction (to be precise, SHapley Additive exPlanations are average marginal contributions over all possible feature coalitions), and
  • exp(a + b) != exp(a) + exp(b) (see the sketch below)
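A minimal sketch of that last point, reusing bv and sv_0 from the walkthrough above (the names a and b are only for illustration):

a = bv[1]    # raw base value for class 1, in log-odds
b = sv_0[1]  # summed raw SHAP values for class 1, in log-odds

expit(a + b)         # ~0.9972: additivity holds in log-odds space, and the sigmoid then recovers the proba
expit(a) + expit(b)  # ~1.89: not a probability at all, because the sigmoid does not distribute over a sum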

You may find these useful:

  1. Feature importance in binary classification and extracting SHAP values for only one of the classes (answer)

  2. How to interpret the base_value of a GBT classifier when using SHAP? (answer)

Regarding "python - How to output Shap values in probability and make force_plot from binary classifier", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/71446065/
