gpt4 book ai didi

python - 向 Pandas 添加趋势线

转载 作者:太空狗 更新时间:2023-10-29 22:05:29 24 4
gpt4 key购买 nike

我有时间序列数据,如下:

                  emplvl
date
2003-01-01 10955.000000
2003-04-01 11090.333333
2003-07-01 11157.000000
2003-10-01 11335.666667
2004-01-01 11045.000000
2004-04-01 11175.666667
2004-07-01 11135.666667
2004-10-01 11480.333333
2005-01-01 11441.000000
2005-04-01 11531.000000
2005-07-01 11320.000000
2005-10-01 11516.666667
2006-01-01 11291.000000
2006-04-01 11223.000000
2006-07-01 11230.000000
2006-10-01 11293.000000
2007-01-01 11126.666667
2007-04-01 11383.666667
2007-07-01 11535.666667
2007-10-01 11567.333333
2008-01-01 11226.666667
2008-04-01 11342.000000
2008-07-01 11201.666667
2008-10-01 11321.000000
2009-01-01 11082.333333
2009-04-01 11099.000000
2009-07-01 10905.666667

time series graph

我想以最简单的方式在这张图上添加一个线性趋势(带截距)。此外,我想仅以 2006 年之前的数据为条件来计算此趋势。

我在这里找到了一些答案,但它们都包含 statsmodels。首先,这些答案可能不是最新的:pandas 已改进,现在本身包含一个 OLS 组件。其次,statsmodels 似乎估计每个时间段的单个固定效应,而不是线性趋势。我想我可以重新计算一个运行季度变量,但大多数情况下有更舒适的方法吗?

                            OLS Regression Results                            
==============================================================================
Dep. Variable: emplvl R-squared: 1.000
Model: OLS Adj. R-squared: nan
Method: Least Squares F-statistic: 0.000
Date: tor, 14 apr 2016 Prob (F-statistic): nan
Time: 17:17:43 Log-Likelihood: 929.85
No. Observations: 40 AIC: -1780.
Df Residuals: 0 BIC: -1712.
Df Model: 39
Covariance Type: nonrobust
============================================================================================================
coef std err t P>|t| [95.0% Conf. Int.]
------------------------------------------------------------------------------------------------------------
Intercept 1.095e+04 inf 0 nan nan nan
date[T.Timestamp('2003-04-01 00:00:00')] 135.3333 inf 0 nan nan nan
date[T.Timestamp('2003-07-01 00:00:00')] 202.0000 inf 0 nan nan nan
date[T.Timestamp('2003-10-01 00:00:00')] 380.6667 inf 0 nan nan nan
date[T.Timestamp('2004-01-01 00:00:00')] 90.0000 inf 0 nan nan nan
date[T.Timestamp('2004-04-01 00:00:00')] 220.6667 inf 0 nan nan nan

如何以最简单的方式估计此趋势并将预测值作为一列添加到我的数据框中?

最佳答案

这是一个关于如何使用 pandas.ols 执行此操作的简单示例:

import matplotlib.pyplot as plt
import pandas as pd

x = pd.Series(np.arange(50))
y = pd.Series(10 + (2 * x + np.random.randint(-5, + 5, 50)))
regression = pd.ols(y=y, x=x)
regression.summary

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations: 50
Number of Degrees of Freedom: 2

R-squared: 0.9913
Adj R-squared: 0.9911

Rmse: 2.7625

F-stat (1, 48): 5465.1446, p-value: 0.0000

Degrees of Freedom: model 1, resid 48

-----------------------Summary of Estimated Coefficients------------------------
Variable Coef Std Err t-stat p-value CI 2.5% CI 97.5%
--------------------------------------------------------------------------------
x 2.0013 0.0271 73.93 0.0000 1.9483 2.0544
intercept 9.5271 0.7698 12.38 0.0000 8.0183 11.0358
---------------------------------End of Summary---------------------------------

trend = regression.predict(beta=regression.beta, x=x[20:]) # slicing to only use last 30 points
data = pd.DataFrame(index=x, data={'y': y, 'trend': trend})
data.plot() # add kwargs for title and other layout/design aspects
plt.show() # or plt.gcf().savefig(path)

enter image description here

关于python - 向 Pandas 添加趋势线,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/36627442/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com