python - Formatting data for a Statsmodels linear regression

Reposted by: 行者123. Updated: 2023-12-01 02:27:26

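I'm trying to run a multiple linear regression with Statsmodels in Python, but I've hit a mental block when it comes to organizing my data.

The default Boston dataset looks like this:

The output of the linear regression model looks like this:

My raw data is whitespace-delimited and looks like this:

I've been able to arrange it into an array, shown here:

Does anyone with more Python experience know how to format my data the same way as the Boston dataset, so that I can easily run my regression model? For example, by setting feature_names to match the columns of my data.

Here are the first few rows of my raw data for reference: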

cycles         instructions   cache-references  cache-misses  branches     branch-misses  page-faults  Power
62,206,703     32,245,343     611,044           95,558        5,641,681    222,594        421          6.6
77,401,927     61,320,289     822,194           98,898        10,910,837   595,585        1,392        6.1
344,672,658    271,884,884    5,371,884         1,253,294     49,628,843   2,782,476      5,392        7.6
231,536,106    173,069,386    3,239,546         325,881       31,584,329   1,777,599      4,372        7.0
212,658,828    152,965,489    3,100,104         251,128       28,182,710   1,588,984      4,285        6.8
1,222,008,914  1,254,822,100  21,562,804        647,512       228,200,750  8,455,056      5,044        15.6
932,484,581    1,132,190,670  8,591,598         507,549       196,773,155  7,610,639      7,147        12.5
241,069,403    148,143,290    3,745,890         320,577       27,384,544   1,614,852      4,325        7.4
253,961,868    195,947,891    3,399,113         331,988       36,069,348   1,980,045      4,322        7.7
142,030,480    91,300,650     2,026,211         242,980       17,269,376   1,010,190      3,651        6.5
90,317,329     51,421,629     1,309,714         146,585       9,332,184    492,279        1,511        6.2
293,537,472    224,121,684    3,964,357         379,418       41,137,776   1,981,583      3,386        7.9

Thanks.

Best answer

I would use pandas to read the data into memory; beyond that, just follow the example you found for the Boston house prices:

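import pandas as pd
import statsmodels.api as sm

# Columns are separated by runs of whitespace; commas are thousands separators.
df = pd.read_csv('data.txt', sep=r'\s+', thousands=',')
X = df.loc[:, 'cycles':'page-faults']  # predictors: every column except Power
y = df['Power']                        # response
# Note: sm.OLS fits no intercept unless X contains a constant column.
model = sm.OLS(y, X).fit()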
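
If you specifically want Boston-style pieces (a data array, a target array, and a feature_names list), they can be pulled out of the same DataFrame. The following is only a minimal sketch: it inlines the first three rows of the question's data through io.StringIO so that it runs on its own, and the variable names data, target and feature_names are chosen here to mirror the Boston dataset, not taken from the original answer.

```python
import io

import pandas as pd

# First three rows of the question's data, inlined so the sketch is self-contained.
raw = """cycles instructions cache-references cache-misses branches branch-misses page-faults Power
62,206,703 32,245,343 611,044 95,558 5,641,681 222,594 421 6.6
77,401,927 61,320,289 822,194 98,898 10,910,837 595,585 1,392 6.1
344,672,658 271,884,884 5,371,884 1,253,294 49,628,843 2,782,476 5,392 7.6
"""
df = pd.read_csv(io.StringIO(raw), sep=r'\s+', thousands=',')

# Boston-style pieces: column names, predictor matrix, response vector.
feature_names = [c for c in df.columns if c != 'Power']
data = df[feature_names].to_numpy(dtype=float)
target = df['Power'].to_numpy()

print(feature_names)
print(data.shape, target.shape)
```

Passing the DataFrame itself to sm.OLS is usually more convenient than passing bare arrays, because the column names then label the coefficients in the summary.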

In this case, model.summary() gives:

OLS Regression Results                            
==============================================================================
Dep. Variable:                  Power   R-squared:                       0.972
Model:                            OLS   Adj. R-squared:                  0.932
Method:                 Least Squares   F-statistic:                     24.56
Date:                Fri, 10 Nov 2017   Prob (F-statistic):            0.00139
Time:                        22:09:47   Log-Likelihood:                -21.470
No. Observations:                  12   AIC:                             56.94
Df Residuals:                       5   BIC:                             60.33
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
====================================================================================
                       coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------
cycles            1.287e-07   5.11e-08      2.518      0.053   -2.66e-09     2.6e-07
instructions     -7.083e-09   4.21e-07     -0.017      0.987   -1.09e-06    1.07e-06
cache-references -1.625e-06   2.48e-06     -0.656      0.541   -7.99e-06    4.74e-06
cache-misses      3.222e-06   5.24e-06      0.615      0.566   -1.03e-05    1.67e-05
branches          1.281e-07    2.6e-06      0.049      0.963   -6.55e-06    6.81e-06
branch-misses    -1.625e-05    1.2e-05     -1.357      0.233    -4.7e-05    1.45e-05
page-faults          0.0016      0.002      0.924      0.398      -0.003       0.006
==============================================================================
Omnibus:                        2.485   Durbin-Watson:                   1.641
Prob(Omnibus):                  0.289   Jarque-Bera (JB):                0.787
Skew:                           0.606   Prob(JB):                        0.675
Kurtosis:                       3.326   Cond. No.                     1.92e+06
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.92e+06. This might indicate that there are
strong multicollinearity or other numerical problems.

For more on python - Formatting data for a Statsmodels linear regression, see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/47230345/


