
matplotlib - How do I calculate confidence intervals for regression predictions, and how do I plot them in Python?

Reposted. Author: 行者123. Updated: 2023-12-04 12:37:17

[Image: Figure 7.1, An Introduction to Statistical Learning]


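I am currently working through the book An Introduction to Statistical Learning with Applications in R and converting its solutions to Python.
I cannot figure out how to compute the confidence intervals and plot them as shown in the figure above (the dashed lines). I have already plotted the fitted line. Here is my code (I am using polynomial regression of degree 4, with 'age' as the predictor and 'wage' as the response):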

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.preprocessing import PolynomialFeatures

# Degree-4 polynomial features of age (the first column is the intercept)
poly = PolynomialFeatures(4)
X = poly.fit_transform(data['age'].to_frame())
y = data['wage']
# X.shape

model = sm.OLS(y, X).fit()
print(model.summary())

# What we want here is not only the fitted line, but also the standard error associated with it.
# To find that, we first need to calculate the predictions for a grid of age values.
test_ages = np.linspace(data['age'].min(), data['age'].max(), 100)

X_test = poly.transform(test_ages.reshape(-1, 1))
pred = model.predict(X_test)

plt.figure(figsize=(12, 8))
plt.scatter(data['age'], data['wage'], facecolors='none', edgecolors='darkgray')
plt.plot(test_ages, pred)

The data here is the Wage dataset available in R. This is the plot I get:

[Image: the plot I was able to produce]

Best answer

I used the bootstrap to compute the confidence intervals, with a custom module:

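import numpy as np
import pandas as pd
from tqdm import tqdm


class Bootstrap_ci:

    def boot(self, X_data, y_data, R, test_data, model):
        # Refit the model on R bootstrap resamples and collect the predictions
        predictions = []
        for _ in tqdm(range(R)):
            # Each resample has as many rows as the original data
            index = self.get_indices(X_data, len(X_data))
            predictions.append(self.alpha(X_data, y_data, index, test_data, model))

        # The 2.5th and 97.5th percentiles bound a 95% percentile interval
        return np.percentile(predictions, 2.5, axis=0), np.percentile(predictions, 97.5, axis=0)

    def alpha(self, X_data, y_data, index, test_data, model):
        # Select the resampled rows by position, so X and y need not share index labels
        X = X_data.iloc[index]
        y = y_data.iloc[index]

        lr = model
        lr.fit(pd.DataFrame(X), y)

        return lr.predict(pd.DataFrame(test_data))

    def get_indices(self, data, num_samples):
        # Row positions sampled with replacement
        return np.random.choice(len(data), num_samples, replace=True)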
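For readers who want to try the resampling idea without the Wage data, here is a minimal NumPy-only sketch of a percentile bootstrap band. It is an illustration under stated assumptions, not the answer's module: the function name `bootstrap_band`, the synthetic age/wage numbers, and the use of `np.polyfit` in place of `PolynomialFeatures` plus `LinearRegression` are all choices made for this example.

```python
import numpy as np


def bootstrap_band(x, y, x_grid, degree=4, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap band for a polynomial least-squares fit (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    preds = np.empty((n_boot, len(x_grid)))
    for b in range(n_boot):
        # Resample (x, y) pairs with replacement, same size as the data
        idx = rng.integers(0, n, size=n)
        coefs = np.polyfit(x[idx], y[idx], degree)
        preds[b] = np.polyval(coefs, x_grid)
    # Pointwise percentiles across the bootstrap fits
    tail = 100 * (1 - level) / 2
    lower = np.percentile(preds, tail, axis=0)
    upper = np.percentile(preds, 100 - tail, axis=0)
    return lower, upper


# Synthetic stand-in for the age/wage columns (made-up numbers, not the Wage data)
rng = np.random.default_rng(42)
age = rng.uniform(18, 80, size=300)
wage = 60 + 2.0 * (age - 18) - 0.03 * (age - 18) ** 2 + rng.normal(0, 10, size=300)

grid = np.linspace(age.min(), age.max(), 50)
lower, upper = bootstrap_band(age, wage, grid)
fit = np.polyval(np.polyfit(age, wage, 4), grid)
```

The arrays `lower` and `upper` can be plotted against `grid` as dashed lines around `fit`. Fixing `seed` makes the band reproducible; more resamples (`n_boot`) give smoother percentile estimates at the cost of run time.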

The Bootstrap_ci module can be used as follows (assuming it is saved as bootstrap.py):

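import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

from bootstrap import Bootstrap_ci

poly = PolynomialFeatures(4)
X = poly.fit_transform(data['age'].to_frame())
y = data['wage']

# Grid of ages on which the fitted curve and its interval are evaluated
X_test = np.linspace(min(data['age']), max(data['age']), 100)
X_test_poly = poly.transform(X_test.reshape(-1, 1))

bootstrap = Bootstrap_ci()

# 1000 bootstrap resamples; li and ui are the lower and upper bounds
li, ui = bootstrap.boot(pd.DataFrame(X), y, 1000, X_test_poly, LinearRegression())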

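This gives us the lower and upper bounds of the confidence interval. To plot them (pred is the fitted curve computed in the question's code):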

plt.scatter(data['age'], data['wage'], facecolors='none', edgecolors='darkgray')
plt.plot(X_test, pred, label='Fitted Line')
plt.plot(X_test, ui, linestyle='dashed', color='r', label='Confidence Intervals')
plt.plot(X_test, li, linestyle='dashed', color='r')
plt.legend()
plt.show()

The resulting plot is:

[Image: resulting plot]
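Since the question's code comment asks for the standard error of the fitted line, a closed-form alternative to the bootstrap may also be useful. The sketch below computes the pointwise standard error of the fitted mean from the usual OLS formula, Var(x0' beta_hat) = x0' Cov(beta_hat) x0, and draws the band at plus or minus two standard errors, which is roughly a 95% interval under the standard normal-error assumptions. The function name `poly_fit_with_se` and the synthetic data are assumptions made for this example; age is centred and scaled only to keep the polynomial columns well conditioned, which does not change the fitted curve.

```python
import numpy as np


def poly_fit_with_se(x, y, x_grid, degree=4):
    """OLS polynomial fit with pointwise standard errors of the fitted mean (hypothetical helper)."""
    # Centre and scale x so the polynomial columns stay well conditioned
    mu, sd = x.mean(), x.std()
    X = np.vander((x - mu) / sd, degree + 1, increasing=True)
    X0 = np.vander((x_grid - mu) / sd, degree + 1, increasing=True)

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # residual variance estimate
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)      # covariance of the coefficients

    fit = X0 @ beta
    # Quadratic form x0' cov_beta x0, evaluated for every grid row x0
    se = np.sqrt(np.einsum("ij,jk,ik->i", X0, cov_beta, X0))
    return fit, se


# Synthetic stand-in for the age/wage columns (made-up numbers, not the Wage data)
rng = np.random.default_rng(42)
age = rng.uniform(18, 80, size=300)
wage = 60 + 2.0 * (age - 18) - 0.03 * (age - 18) ** 2 + rng.normal(0, 10, size=300)

grid = np.linspace(age.min(), age.max(), 50)
fit, se = poly_fit_with_se(age, wage, grid)
lower, upper = fit - 2 * se, fit + 2 * se
```

With the statsmodels fit from the question, `model.get_prediction(X_test).summary_frame(alpha=0.05)` should give comparable bounds in its `mean_ci_lower` and `mean_ci_upper` columns, so the manual formula is mainly useful for seeing where the numbers come from.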

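A similar question on this topic (matplotlib - How do I calculate confidence intervals for regression predictions, and how do I plot them in Python?) can be found on Stack Overflow: https://stackoverflow.com/questions/63165775/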
