
python - What is the expected_value field of TreeExplainer for a random forest?

Reposted. Author: 行者123. Updated: 2023-12-05 01:37:58

I am using SHAP to explain my random forest (RF):

import shap
from sklearn.ensemble import RandomForestRegressor

RF_best_parameters = RandomForestRegressor(random_state=24, n_estimators=100)
RF_best_parameters.fit(X_train, y_train.values.ravel())
shap_explainer_model = shap.TreeExplainer(RF_best_parameters)

The TreeExplainer class has an attribute expected_value. My first guess was that this field is the mean of the predicted y over X_train (I also read this here).

But that is not the case.
The output of the command:

shap_explainer_model.expected_value

is 0.2381.

The output of the command:

RF_best_parameters.predict(X_train).mean()

is 0.2389.

As we can see, the two values are not the same. So what does expected_value mean here?

Best answer

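This comes from a peculiarity of the method when it is used with the random forest algorithm. Quoting the response in the relevant GitHub thread, "shap explainer expected_value is different from model expected value":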

It is because of how sklearn records the training samples in the tree models it builds. Random forests use a random subsample of the data to train each tree, and it is that random subsample that is used in sklearn to record the leaf sample weights in the model. Since TreeExplainer uses the recorded leaf sample weights to represent the training dataset, it will depend on the random sampling used during training. This will cause small variations like the ones you are seeing.
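The mechanism described in the quote can be illustrated without SHAP. The following is a minimal sketch using only scikit-learn. It assumes that the base value is approximately the average of each tree's leaf values, weighted by the sample weights recorded in the leaves; that is an interpretation of the quoted explanation, not SHAP's actual code. The helper leaf_weighted_mean is a hypothetical name introduced for this illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def leaf_weighted_mean(forest):
    """Average over trees of the leaf values, weighted by the
    training-sample weight that sklearn recorded in each leaf.
    (Hypothetical helper; an approximation of the idea in the quote.)"""
    per_tree = []
    for est in forest.estimators_:
        t = est.tree_
        is_leaf = t.children_left == -1          # sklearn marks leaves with -1
        values = t.value[is_leaf, 0, 0]          # predicted value in each leaf
        weights = t.weighted_n_node_samples[is_leaf]
        per_tree.append(np.average(values, weights=weights))
    return float(np.mean(per_tree))


X, y = make_regression(n_samples=500, n_features=10, random_state=0)

gaps = {}
for bootstrap in (True, False):
    rf = RandomForestRegressor(n_estimators=50, bootstrap=bootstrap, random_state=0)
    rf.fit(X, y)
    # Gap between the leaf-weighted mean and the mean prediction on the training data
    gaps[bootstrap] = abs(leaf_weighted_mean(rf) - rf.predict(X).mean())
    print(f"bootstrap={bootstrap}: gap = {gaps[bootstrap]:.6f}")
```

With bootstrap=False every tree sees the full training set, so the recorded leaf weights describe exactly the data passed to predict and the gap should vanish up to floating-point error. With the default bootstrap=True the leaf weights describe each tree's random resample, so a small gap remains.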

We can verify that this behavior does not occur with other algorithms, such as gradient boosted trees:

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
import numpy as np

import shap
shap.__version__
# 0.37.0

X, y = make_regression(n_samples=1000, n_features=10, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

gbt = GradientBoostingRegressor(random_state=0)
gbt.fit(X_train, y_train)

mean_pred_gbt = np.mean(gbt.predict(X_train))
mean_pred_gbt
# -11.534353657511172

gbt_explainer = shap.TreeExplainer(gbt)
gbt_explainer.expected_value
# array([-11.53435366])

np.isclose(mean_pred_gbt, gbt_explainer.expected_value)
# array([ True])

For RF, however, we do get the "small variations" mentioned by the main SHAP developer in the thread above:

rf = RandomForestRegressor(random_state=0)
rf.fit(X_train, y_train)

rf_explainer = shap.TreeExplainer(rf)
rf_explainer.expected_value
# array([-11.59166808])

mean_pred_rf = np.mean(rf.predict(X_train))
mean_pred_rf
# -11.280125877556388

np.isclose(mean_pred_rf, rf_explainer.expected_value)
# array([False])

For "python - What is the expected_value field of TreeExplainer for a random forest?", the original question can be found on Stack Overflow: https://stackoverflow.com/questions/60311847/
