python - How to plot confidence intervals for a statsmodels fit? - 6ren

Reposted by: 行者123  Updated: 2023-12-04 03:48:05

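I want to show confidence intervals on a plot of a cubic spline that I fitted to my data, but I don't know how this should be done. In theory, I know the CI should diverge from the fitted line as we approach the edges, but the only solution I have come up with is this janky addition, which does not show the correct CI.
Here is the code: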

import pandas as pd
from patsy import dmatrix
import statsmodels.api as sm
import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize=(7,5))
df = pd.read_csv('http://web.stanford.edu/~oleg2/hse/wage/wage.csv').sort_values(by=['age'])
ind_df = df[['wage', 'age']].copy()

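def get_bse(bse, k, m, ma, x):
  # Repeat the i-th coefficient standard error for every x that falls in the i-th knot interval.
  bse = np.asarray(bse)  # positional indexing regardless of the Series labels
  edges = list(k) + [ma + 1]  # copy, so the caller's knot list is not mutated
  prev, ans = m, []
  for i, k_ in enumerate(edges):
    ans += [bse[i]] * int(np.sum((x >= prev) & (x < k_)))
    prev = k_
  return np.array(ans)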


plt.scatter(df.age, df.wage, color='none', edgecolor='silver', s=10)
plt.xlabel('Age', fontsize=15)
plt.ylabel('Wage', fontsize=15)
plt.ylim((0,333))

d = 4
knots = [df.age.quantile(0.25), df.age.quantile(0.5), df.age.quantile(0.75)]

my_spline_transformation = f"bs(train, knots={knots}, degree={d}, include_intercept=True)"

transformed = dmatrix( my_spline_transformation, {"train": df.age}, return_type='dataframe' )

ft = sm.GLS(df.wage, transformed).fit()

lft = sm.Logit( (df.age > 250), transformed )
y_grid1 = lft.predict(transformed.transpose())
y_grid = ft.predict(transformed)
plt.plot(df.age, y_grid, color='crimson', linewidth=2)
plt.plot(df.age, y_grid + get_bse(ft.bse, knots, df.age.min(), df.age.max(), df.age), color='crimson', linewidth=2, linestyle='--')
plt.plot(df.age, y_grid - get_bse(ft.bse, knots, df.age.min(), df.age.max(), df.age), color='crimson', linewidth=2, linestyle='--')

plt.show()

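Note that the plot is supposed to show a natural cubic spline with four degrees of freedom, but I am not sure my solution is correct. What is the right way to do this?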
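The intuition in the question, that the band should widen toward the edges, matches the standard least-squares result: the variance of the fitted mean at a point x is sigma^2 * x'(X'X)^-1 x, and that quantity (the leverage) is largest where the data constrain the fit least. Below is a minimal NumPy sketch of that formula. It is not the asker's code: it assumes a plain cubic polynomial basis on synthetic data, and it uses the normal quantile 1.96 as a stand-in for the t quantile. The name mean_ci_halfwidth is hypothetical.

```python
import numpy as np

def mean_ci_halfwidth(X, y, z=1.96):
    """Approximate 95% half-width of the band for the fitted mean at each row of X."""
    # Ordinary least squares fit.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = resid @ resid / dof  # residual variance estimate
    # Leverage h_i = x_i' (X'X)^-1 x_i for every row x_i of X.
    xtx_inv = np.linalg.pinv(X.T @ X)
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    return z * np.sqrt(sigma2 * leverage)

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
X = np.vander(x, 4, increasing=True)  # columns 1, x, x^2, x^3
y = 1.0 + 2.0 * x - x**3 + rng.normal(scale=0.3, size=x.size)

half = mean_ci_halfwidth(X, y)
# Half-width at the left edge, the middle, and the right edge.
print(half[0], half[len(half) // 2], half[-1])
```

Adding a single coefficient standard error to the fitted curve, as get_bse does, ignores the covariance between coefficients and the value of the basis functions at x, which is why it does not reproduce this shape.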

Best answer

If you use:

predictions = ft.get_prediction()
df_predictions = predictions.summary_frame()
df_predictions.index = df.age.values

you get a DataFrame that holds the fitted values together with their confidence intervals. Then change the plotting code to:

y_grid = ft.predict(transformed)
plt.plot(df.age, y_grid, color='crimson', linewidth=2)
# The manual y_grid +/- get_bse(...) lines from the question are no longer needed.
# get_prediction() returns the fitted mean together with its interval estimates.
predictions = ft.get_prediction()
df_predictions = predictions.summary_frame()
df_predictions.index = df.age.values
plt.plot(df_predictions['mean'], color='crimson')
# Confidence interval for the mean (narrow band).
plt.fill_between(df_predictions.index, df_predictions.mean_ci_lower, df_predictions.mean_ci_upper, alpha=.1, color='crimson')
# Interval for individual observations (wide band).
plt.fill_between(df_predictions.index, df_predictions.obs_ci_lower, df_predictions.obs_ci_upper, alpha=.1, color='crimson')
plt.show()

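This produces a plot in which the darker red band is the confidence interval for the mean and the lighter red band is the interval for individual observations in the data set (the prediction interval).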

Regarding "python - How to plot confidence intervals for a statsmodels fit?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64833185/
