
python - Multiple linear regression in pandas statsmodels: ValueError

Reposted. Author: 太空狗. Updated: 2023-10-29 22:15:44

Data: https://courses.edx.org/c4x/MITx/15.071x_2/asset/NBA_train.csv

I know how to fit this data to a multiple linear regression model using statsmodels.formula.api:

import pandas as pd
NBA = pd.read_csv("NBA_train.csv")
import statsmodels.formula.api as smf
model = smf.ols(formula="W ~ PTS + oppPTS", data=NBA).fit()
model.summary()

However, I find this R-like formula notation awkward, and I would like to use the usual pandas syntax:

import pandas as pd
NBA = pd.read_csv("NBA_train.csv")
import statsmodels.api as sm
X = NBA['W']
y = NBA[['PTS', 'oppPTS']]
X = sm.add_constant(X)
model11 = sm.OLS(y, X).fit()
model11.summary()

With the second approach I get the following error:

ValueError: shapes (835,2) and (835,2) not aligned: 2 (dim 1) != 835 (dim 0)

Why does this happen, and how can I fix it?

Best answer

When using sm.OLS(y, X), y is the dependent variable and X holds the independent variables.

In the formula W ~ PTS + oppPTS, W is the dependent variable, and PTS and oppPTS are the independent variables.

Therefore, use

y = NBA['W']
X = NBA[['PTS', 'oppPTS']]

instead of

X = NBA['W']
y = NBA[['PTS', 'oppPTS']]
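The wording of the error in the question is the message NumPy produces when np.dot is given two (835, 2) arrays. The sketch below reproduces only that message. It assumes that statsmodels runs into such a product internally once y has two columns. The exact internal call is an assumption, and the zero/one arrays are placeholders rather than the NBA data.

```python
import numpy as np

n = 835

# Placeholder for the mistaken y = NBA[['PTS', 'oppPTS']]: two columns.
swapped_y = np.zeros((n, 2))
# Placeholder for sm.add_constant(NBA['W']): a constant column plus W.
swapped_X = np.ones((n, 2))

# A matrix product needs the inner dimensions to match: (n, 2) x (n, 2) does not.
try:
    np.dot(swapped_y, swapped_X)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)

# With the roles corrected, y is one-dimensional and X has one row per observation.
correct_y = np.zeros(n)         # like NBA['W']
correct_X = np.ones((n, 3))     # like add_constant(NBA[['PTS', 'oppPTS']])
print(correct_y.shape, correct_X.shape)
```

Checking y.shape and X.shape before calling sm.OLS is a quick way to catch swapped arguments: y should have a single column and X one column per predictor plus the constant.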

import pandas as pd
import statsmodels.api as sm

NBA = pd.read_csv("NBA_train.csv")
y = NBA['W']
X = NBA[['PTS', 'oppPTS']]
X = sm.add_constant(X)
model11 = sm.OLS(y, X).fit()
model11.summary()

yields

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      W   R-squared:                       0.942
Model:                            OLS   Adj. R-squared:                  0.942
Method:                 Least Squares   F-statistic:                     6799.
Date:                Sat, 21 Mar 2015   Prob (F-statistic):               0.00
Time:                        14:58:05   Log-Likelihood:                -2118.0
No. Observations:                 835   AIC:                             4242.
Df Residuals:                     832   BIC:                             4256.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         41.3048      1.610     25.652      0.000        38.144    44.465
PTS            0.0326      0.000    109.600      0.000         0.032     0.033
oppPTS        -0.0326      0.000   -110.951      0.000        -0.033    -0.032
==============================================================================
Omnibus:                        1.026   Durbin-Watson:                   2.238
Prob(Omnibus):                  0.599   Jarque-Bera (JB):                0.984
Skew:                           0.084   Prob(JB):                        0.612
Kurtosis:                       3.009   Cond. No.                     1.80e+05
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.8e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

Regarding python - Multiple linear regression in pandas statsmodels: ValueError, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29186436/
