
python - Why do the `sklearn` and `statsmodels` implementations of OLS regression give different R^2?


By accident I noticed that the OLS models implemented by sklearn and statsmodels give different R^2 values when the intercept is not fitted. Otherwise they seem to work fine. The following code:

import numpy as np
import sklearn
import statsmodels
import sklearn.linear_model as sl
import statsmodels.api as sm

np.random.seed(42)

N=1000
X = np.random.normal(loc=1, size=(N, 1))
Y = 2 * X.flatten() + 4 + np.random.normal(size=N)

sklernIntercept = sl.LinearRegression(fit_intercept=True).fit(X, Y)    # sklearn, with intercept
sklernNoIntercept = sl.LinearRegression(fit_intercept=False).fit(X, Y) # sklearn, no intercept
statsmodelsIntercept = sm.OLS(Y, sm.add_constant(X))   # statsmodels, constant column added to X
statsmodelsNoIntercept = sm.OLS(Y, X)                  # statsmodels, no constant

print(sklernIntercept.score(X, Y), statsmodelsIntercept.fit().rsquared)
print(sklernNoIntercept.score(X, Y), statsmodelsNoIntercept.fit().rsquared)

print(sklearn.__version__, statsmodels.__version__)

prints:

0.78741906105 0.78741906105
-0.950825182861 0.783154483028
0.19.1 0.8.0

Where does the difference come from?

This question differs from Different Linear Regression Coefficients with statsmodels and sklearn because sklearn.linear_model.LinearRegression (with intercept) is applied to the same X that was prepared for statsmodels.api.OLS.

This question differs from Statsmodels: Calculate fitted values and R squared because it addresses a discrepancy between two Python packages (statsmodels and scikit-learn), whereas the linked question concerns statsmodels and the conventional definition of R^2. Both happen to have the same answer, but that situation has already been discussed here: Does the same answer imply that the questions should be closed as duplicate?

Best Answer

As @user333700 pointed out in a comment, the definition of R^2 used by the statsmodels OLS implementation differs from the one used in scikit-learn.

From the documentation of the RegressionResults class (emphasis mine):

rsquared

R-squared of a model with an intercept. This is defined here as 1 - ssr/centered_tss if the constant is included in the model and 1 - ssr/uncentered_tss if the constant is omitted.
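In other words, when no constant is included, statsmodels switches the denominator to the uncentered total sum of squares. A minimal sketch of that arithmetic, reusing the variables from the snippet above (the names introduced below, such as res_no_const, are only illustrative):

# Reproduce statsmodels' no-constant R^2 by hand:
# R^2 = 1 - ssr / uncentered_tss, where uncentered_tss = sum(Y**2).
res_no_const = statsmodelsNoIntercept.fit()

ssr = ((Y - res_no_const.fittedvalues) ** 2).sum()  # residual sum of squares
tss_uncentered = (Y ** 2).sum()                     # no mean is subtracted

print(1 - ssr / tss_uncentered)   # ~0.7832, matches res_no_const.rsquared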

From the documentation of LinearRegression.score():

score(X, y, sample_weight=None)

Returns the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.
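scikit-learn, by contrast, always centers the total sum of squares around the mean of y, even when fit_intercept=False, which is why the score can go negative. A rough check against the snippet above (variable names here are only for illustration):

# Reproduce LinearRegression.score() for the no-intercept model by hand:
y_pred = sklernNoIntercept.predict(X)
u = ((Y - y_pred) ** 2).sum()        # residual sum of squares
v = ((Y - Y.mean()) ** 2).sum()      # total sum of squares, always centered

print(1 - u / v)   # ~-0.95, matches sklernNoIntercept.score(X, Y)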

Regarding python - Why do the `sklearn` and `statsmodels` implementations of OLS regression give different R^2?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48832925/
