python - Understanding partial dependence for gradient boosted regression trees

Reposted by: 行者123 · Updated: 2023-12-04 23:37:29

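I am reading a tutorial on partial dependence plots in Python. Neither the tutorial nor the documentation gives an equation. The documentation of the R function gives the formula I expected: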
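For reference, partial dependence is usually defined as the model's prediction averaged over the observed values of the other features. The notation below (\hat{f} for the fitted model, x_S for the feature of interest, x_{iC} for the remaining features of the i-th observation, n for the number of observations) is an assumption made here, not a quote from either documentation:

```latex
\bar{f}(x_S) = \frac{1}{n} \sum_{i=1}^{n} \hat{f}(x_S, x_{iC})
```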

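This does not seem to make sense for the results shown in the Python tutorial. If it is an average of house price predictions, how can it be negative and small? I would expect values in the millions. Am I missing something?

Update:

For regression, it seems that the mean is subtracted from the formula above. How do I add it back? For my trained model I can get the partial dependence with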

from sklearn.ensemble.partial_dependence import partial_dependence
# use distinct result names so the imported function is not overwritten
pdp_values, grid_values = partial_dependence(model, features.index(independent_feature), X=df2[features])

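I want to add the mean (?) back. Can I get it by calling model.predict() on the df2 values?

Best answer

How the R formula works
The formula quoted in the question is for randomForest. Each tree in a random forest tries to predict the target variable directly. Thus, the prediction of each tree lies in the expected interval (in your case, all house prices are positive), and the ensemble prediction is just the average of all the individual predictions.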

ensemble_prediction = mean(tree_predictions)

This is what the formula tells you: just take the predictions of all trees at x and average them.
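The averaging can be checked directly on a fitted forest. This is a minimal sketch, assuming scikit-learn's `RandomForestRegressor` (rather than R's randomForest) and synthetic data invented for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (an assumption for illustration): target centered around 50.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))
y = 10 * X[:, 0] + 50 + rng.normal(scale=5, size=200)

forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# Each fitted tree predicts the target directly ...
tree_predictions = np.array([tree.predict(X) for tree in forest.estimators_])
# ... and the forest prediction is their plain average.
ensemble_prediction = tree_predictions.mean(axis=0)

print(np.allclose(ensemble_prediction, forest.predict(X)))
# Every single tree already predicts on the scale of y.
print(tree_predictions.min() >= y.min(), tree_predictions.max() <= y.max())
```

Because each leaf value is an average of training targets, no tree can predict outside the observed range of y, which is why a random forest PDP comes out on the original scale.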

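Why the Python PDP values are small

In sklearn, however, the partial dependence is computed for a GradientBoostingRegressor. In gradient boosting, each tree predicts the negative gradient of the loss function at the current prediction (for squared error, the current residual), which is only indirectly related to the target variable. For GB regression, the prediction is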

ensemble_prediction = initial_prediction + sum(tree_predictions * learning_rate)

For GB classification, the predicted probability is

ensemble_prediction = softmax(initial_prediction + sum(tree_predictions * learning_rate))

In both cases, the partial dependence is reported as just

sum(tree_predictions * learning_rate)

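So initial_prediction (for GradientBoostingRegressor(loss='ls') it simply equals the mean of the training y) is not included in the PDP, which is what makes the reported values negative.

As for the small range of the values: y_train in your example is small. The mean house value is roughly 2, so house prices are probably expressed in millions.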
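The decomposition above can be verified on a fitted model. This is a minimal sketch on synthetic data (an assumption for illustration), relying on the default squared-error loss and on `model.estimators_` holding one regression tree per boosting stage:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (an assumption for illustration): target centered around 50.
rng = np.random.RandomState(1)
X = rng.normal(size=(1000, 2))
y = 10 * X[:, 0] + 50 + rng.normal(scale=5, size=1000)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Sum of the individual trees scaled by the learning rate, without the initial prediction.
trees_only = model.learning_rate * sum(stage[0].predict(X) for stage in model.estimators_)

# Whatever remains of the full prediction is the initial prediction.
offset = model.predict(X) - trees_only

print(offset.min(), offset.max(), y.mean())  # the offset is constant and equals the training mean
print(trees_only.mean())                     # the tree-only part is centered around zero
```

The tree-only part is what the PDP reports, so on the training data it averages to roughly zero and takes both signs even when every target value is positive.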

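How the sklearn formula actually works

I have already said that in sklearn the value of the partial dependence function is aggregated over all trees (the learning-rate-scaled sum above). There is one more tweak: all the irrelevant features are averaged out. To describe how the averaging is actually done, I will just quote the sklearn documentation: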

For each value of the ‘target’ features in the grid the partial dependence function need to marginalize the predictions of a tree over all possible values of the ‘complement’ features. In decision trees this function can be evaluated efficiently without reference to the training data. For each grid point a weighted tree traversal is performed: if a split node involves a ‘target’ feature, the corresponding left or right branch is followed, otherwise both branches are followed, each branch is weighted by the fraction of training samples that entered that branch. Finally, the partial dependence is given by a weighted average of all visited leaves. For tree ensembles the results of each individual tree are again averaged.
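The weighted traversal described in the quote can be sketched in plain Python on a small hand-built tree. The dictionary layout (`feature`, `threshold`, `left`, `right`, `n`, `value`) and the tree itself are hypothetical and chosen for illustration; this is not scikit-learn's internal representation:

```python
def weighted_traversal(node, fixed):
    """Partial dependence of one tree at the point given by `fixed`.

    `fixed` maps the index of each 'target' feature to its grid value;
    every other feature is marginalized out by following both branches.
    """
    if "value" in node:  # leaf: return its prediction
        return node["value"]
    left, right = node["left"], node["right"]
    if node["feature"] in fixed:
        # Split on a target feature: follow only the matching branch.
        go_left = fixed[node["feature"]] <= node["threshold"]
        return weighted_traversal(left if go_left else right, fixed)
    # Split on a complement feature: follow both branches, weighted by
    # the fraction of training samples that entered each of them.
    total = left["n"] + right["n"]
    return (left["n"] / total * weighted_traversal(left, fixed)
            + right["n"] / total * weighted_traversal(right, fixed))


# Hypothetical tree; "n" is the number of training samples that reached the node.
toy_tree = {
    "feature": 0, "threshold": 0.0, "n": 100,
    "left": {
        "feature": 1, "threshold": 5.0, "n": 50,
        "left": {"value": -3.0, "n": 25},
        "right": {"value": -1.0, "n": 25},
    },
    "right": {"value": 2.0, "n": 50},
}

pd_feature0 = [weighted_traversal(toy_tree, {0: x}) for x in (-1.0, 1.0)]
pd_feature1 = [weighted_traversal(toy_tree, {1: x}) for x in (4.0, 6.0)]
print(pd_feature0)  # [-2.0, 2.0]
print(pd_feature1)  # [-0.5, 0.5]
```

No training data is touched during the traversal: the per-node sample counts stored in the tree are enough. For an ensemble, the same traversal would be repeated for every tree and the results combined.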

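If you are still not satisfied, see the source code.

An example

To see that the predictions are already on the scale of the dependent variable (they are just centered), you can look at a very simple example: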

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble.partial_dependence import plot_partial_dependence

np.random.seed(1)
X = np.random.normal(size=[1000, 2])
# yes, I will try to fit a linear function!
y = X[:, 0] * 10 + 50 + np.random.normal(size=1000, scale=5)
# mean target is 50, range is roughly from 20 to 80, that is +/- 3 standard deviations
model = GradientBoostingRegressor().fit(X, y)

fig, subplots = plot_partial_dependence(model, X, [0, 1], percentiles=(0.0, 1.0), n_cols=2)
subplots[0].scatter(X[:, 0], y - y.mean(), s=0.3)
subplots[1].scatter(X[:, 1], y - y.mean(), s=0.3)
plt.suptitle('Partial dependence plots and scatters of centered target')
plt.show()

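You can see that the partial dependence plots reflect the true distribution of the centered target variable quite well.

If you want not only the units but also the mean to match your y, you have to add the "lost" mean to the result of the partial_dependence function and plot the result manually: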

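from sklearn.ensemble.partial_dependence import partial_dependence
pdp_y, [pdp_x] = partial_dependence(model, X=X, target_variables=[0], percentiles=(0.0, 1.0))
plt.scatter(X[:, 0], y, s=0.3)
# model.init_.mean is the initial prediction (the training mean of y) that the PDP leaves out
plt.plot(pdp_x, pdp_y.ravel() + model.init_.mean)
# set the title before showing the figure, otherwise it is not drawn
plt.title('Partial dependence plot in the original coordinates')
plt.show()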
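The snippets above use the `sklearn.ensemble.partial_dependence` module from the scikit-learn version this answer was written for. A minimal sketch of the same idea with `sklearn.inspection.partial_dependence` follows. It assumes a release where that function accepts `method="recursion"` and `kind="average"` and returns a dictionary-like result. The fallback between the `grid_values` and `values` keys is likewise an assumption about version differences. Plotting is left out to keep the sketch short:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

# Same synthetic setup as in the example above.
rng = np.random.RandomState(1)
X = rng.normal(size=(1000, 2))
y = 10 * X[:, 0] + 50 + rng.normal(scale=5, size=1000)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# The recursion method walks the trees and leaves out the initial prediction.
result = partial_dependence(
    model, X, features=[0], percentiles=(0.0, 1.0),
    method="recursion", kind="average",
)
pdp_centered = result["average"][0]

# Assumption: the grid is stored under "grid_values" in newer releases
# and under "values" in older ones.
grid_key = "grid_values" if "grid_values" in result else "values"
pdp_x = result[grid_key][0]

# With the default squared-error loss the initial prediction is the training mean of y.
pdp_y = pdp_centered + y.mean()
print(pdp_centered.min(), pdp_centered.max())
print(pdp_y.min(), pdp_y.max())
```

Using `y.mean()` as the offset relies on the default constant initial estimator; with a custom `init` estimator the offset would depend on the sample and could not be added back this way.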

Regarding python - Understanding partial dependence for gradient boosted regression trees, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49247796/
