
python - What is the difference between statsmodels.api.sm.OLS and statsmodels.formula.api.ols

Reposted. Author: 行者123. Updated: 2023-12-03 19:25:45

I'm fitting a linear regression model in Python. The JSON data is as follows:

{"Y":[1,2,3,4,5],"X":[[1,43,23],[2,3,43],[3,23,334],[4,43,23],[232,234,24]]}

I'm fitting it with both `sm.OLS(...).fit()` (from statsmodels.api) and `ols(...).fit()` (from statsmodels.formula.api). I expected the two to be the same model, but the results differ.

This is the first function:

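```python
import json

import numpy as np
import statsmodels.api as sm

def analyze1():
    print('using sm.OLS().fit')
    # FNAME_DATA: path to the JSON file above (defined elsewhere in the script)
    with open(FNAME_DATA) as f:
        data = json.load(f)
    X = np.asarray(data['X'])
    Y = np.log(np.asarray(data['Y']) + 1)  # the response is log(Y + 1)
    X2 = sm.add_constant(X)                # prepend an intercept column
    results = sm.OLS(Y, X2).fit()
    print(results.summary())
```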

This is the second function:

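```python
import json

from statsmodels.formula.api import ols

def analyze2():
    print('using ols().fit')
    # FNAME_DATA: path to the JSON file above (defined elsewhere in the script)
    with open(FNAME_DATA) as f:
        data = json.load(f)
    results = ols('Y ~ X + 1', data=data).fit()  # response is the raw Y
    print(results.summary())
```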

Output of the first function:
```
using sm.OLS().fit
/home/aaron/anaconda2/lib/python2.7/site-packages/statsmodels/stats/stattools.py:72: ValueWarning: omni_normtest is not valid with less than 8 observations; 5 samples were given.
  "samples were given." % int(n), ValueWarning)
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.449
Model:                            OLS   Adj. R-squared:                 -1.204
Method:                 Least Squares   F-statistic:                    0.2717
Date:                Wed, 07 Aug 2019   Prob (F-statistic):              0.849
Time:                        07:17:00   Log-Likelihood:               -0.87006
No. Observations:                   5   AIC:                             9.740
Df Residuals:                       1   BIC:                             8.178
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.0859      0.720      1.509      0.373      -8.057      10.228
x1             0.0024      0.018      0.134      0.915      -0.229       0.234
x2             0.0005      0.020      0.027      0.983      -0.256       0.257
x3             0.0008      0.003      0.332      0.796      -0.031       0.033
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   1.485
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.077
Skew:                           0.175   Prob(JB):                        0.962
Kurtosis:                       2.503   Cond. No.                         402.
==============================================================================
```

Output of the second function:
```
using ols().fit
                            OLS Regression Results
==============================================================================
Dep. Variable:                      Y   R-squared:                       0.551
Model:                            OLS   Adj. R-squared:                 -0.796
Method:                 Least Squares   F-statistic:                    0.4092
Date:                Wed, 07 Aug 2019   Prob (F-statistic):              0.784
Time:                        07:17:00   Log-Likelihood:                -6.8251
No. Observations:                   5   AIC:                             21.65
Df Residuals:                       1   BIC:                             20.09
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      1.9591      2.368      0.827      0.560     -28.124      32.042
X[0]           0.0030      0.060      0.051      0.968      -0.757       0.764
X[1]           0.0098      0.066      0.148      0.906      -0.834       0.854
X[2]           0.0024      0.008      0.289      0.821      -0.103       0.108
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   1.485
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.077
Skew:                           0.175   Prob(JB):                        0.962
Kurtosis:                       2.503   Cond. No.                         402.
==============================================================================
```

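I thought these were equivalent models, but on the same data the coefficients and the log-likelihood differ. I don't know whether there is some difference between the two models.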

Best answer

The former (`OLS`) is a class. The latter (`ols`) is a method of that class, `OLS.from_formula`, which `OLS` inherits from `statsmodels.base.model.Model`.

```
In [11]: from statsmodels.api import OLS

In [12]: from statsmodels.formula.api import ols

In [13]: OLS
Out[13]: statsmodels.regression.linear_model.OLS

In [14]: ols
Out[14]: <bound method Model.from_formula of <class 'statsmodels.regression.linear_model.OLS'>>
```

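Based on my own tests, the two should produce the same results. In your example, however, you apply log to y in the first model but not in the second. Fields that depend only on X, such as Cond. No., are identical because X is the same in both models. The fields that differ, such as the coefficients, R-squared and the log-likelihood, depend on y. The response is log(Y + 1) in the first model and the raw Y in the second. To get matching results, apply the same transform in the formula, for example `np.log(Y + 1) ~ X`.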

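Since I don't have access to your data, feel free to use this self-contained example as a sanity check. The two models (admittedly poor ones) produced identical summaries when I fit them.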

Example:
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.datasets import load_diabetes
from statsmodels.formula.api import ols

diabetes = load_diabetes()
X = pd.DataFrame(data=diabetes['data'], columns=diabetes['feature_names'])
# keep only the 'sex' and 'bmi' predictors
X.drop(['age', 'bp', 's1', 's2', 's3', 's4', 's5', 's6'], axis=1, inplace=True)
X = sm.add_constant(X)  # adds a 'const' column
y = pd.DataFrame(data=diabetes['target'], columns=['y'])

# array interface: log applied to y by hand, intercept added by add_constant
mod1 = sm.OLS(np.log(y), X)
results1 = mod1.fit()
print(results1.summary())

# formula interface: same log transform in the formula, intercept added automatically
mod2 = ols('np.log(y) ~ sex + bmi', data=pd.concat([X, y], axis=1))
results2 = mod2.fit()
print(results2.summary())
```

Output (OLS):
```
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.297
Model:                            OLS   Adj. R-squared:                  0.294
Method:                 Least Squares   F-statistic:                     92.90
Date:                Tue, 06 Aug 2019   Prob (F-statistic):           2.27e-34
Time:                        21:06:21   Log-Likelihood:                -291.29
No. Observations:                 442   AIC:                             588.6
Df Residuals:                     439   BIC:                             600.9
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          4.8813      0.022    218.671      0.000       4.837       4.925
sex           -0.0868      0.471     -0.184      0.854      -1.013       0.839
bmi            6.4042      0.471     13.593      0.000       5.478       7.330
==============================================================================
Omnibus:                       14.733   Durbin-Watson:                   1.892
Prob(Omnibus):                  0.001   Jarque-Bera (JB):               15.547
Skew:                          -0.446   Prob(JB):                     0.000421
Kurtosis:                       2.776   Cond. No.                         22.0
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
```

Output (ols):
```
                            OLS Regression Results
==============================================================================
Dep. Variable:              np.log(y)   R-squared:                       0.297
Model:                            OLS   Adj. R-squared:                  0.294
Method:                 Least Squares   F-statistic:                     92.90
Date:                Wed, 27 May 2020   Prob (F-statistic):           2.27e-34
Time:                        01:42:40   Log-Likelihood:                -291.29
No. Observations:                 442   AIC:                             588.6
Df Residuals:                     439   BIC:                             600.9
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      4.8813      0.022    218.671      0.000       4.837       4.925
sex           -0.0868      0.471     -0.184      0.854      -1.013       0.839
bmi            6.4042      0.471     13.593      0.000       5.478       7.330
==============================================================================
Omnibus:                       14.733   Durbin-Watson:                   1.892
Prob(Omnibus):                  0.001   Jarque-Bera (JB):               15.547
Skew:                          -0.446   Prob(JB):                     0.000421
Kurtosis:                       2.776   Cond. No.                         22.0
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
```

A similar question, "python - What is the difference between statsmodels.api.sm.OLS and statsmodels.formula.api.ols", can be found on Stack Overflow: https://stackoverflow.com/questions/57385279/
