python - Statsmodels: Trouble overlaying ARIMA forecasts with confidence bounds on the original data

Reposted. Author: 行者123. Updated: 2023-12-01 09:12:10

I have some stock data, downloaded through Quandl and turned into a pandas Series:

import pandas as pd
import quandl as qd

api = '1uRGReHyAEgwYbzkPyG3'
qd.ApiConfig.api_key = api

# Download AMZN prices from the WIKI/PRICES table
data = qd.get_table('WIKI/PRICES',
                    qopts={'columns': ['ticker', 'date', 'high', 'low', 'open', 'close']},
                    ticker=['AMZN'], date={'gte': '2000-01-01', 'lte': '2014-03-10'})

data.reset_index(inplace=True, drop=True)

# Series of the 'high' column (position 2), indexed by the parsed 'date' column (position 1)
price = pd.Series(data.iloc[:,2].values,index=pd.to_datetime(data.iloc[:,1]))

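Using statsmodels, I want to plot an ARIMA model that shows the following:

  1. the original data,
  2. fitted values overlapping some of the original data, and
  3. future forecasts plus confidence intervals out to a specified horizon.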

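[Image: example forecast plot from the statsmodels documentation]

The plot above comes from the statsmodels documentation, but following their code raises a strange error.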

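import matplotlib.pyplot as plt

# model_fit is the fitted ARIMA results object (the fitting code is not shown)
fig, ax = plt.subplots()
ax = price.loc['2012-01-03':].plot(ax=ax, label='observed')

fig = model_fit.plot_predict('2014-01-03','2015-01-03', dynamic=False, ax=ax, plot_insample=False)

plt.show()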

This produces the following error:

KeyError                                  Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 1420243200000000000

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2524 try:
-> 2525 return self._engine.get_loc(key)
2526 except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

KeyError: Timestamp('2015-01-03 00:00:00')

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 1420243200000000000

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/base/tsa_model.py in _get_predict_end(self, end)
172 try:
--> 173 end = self._get_dates_loc(dates, dtend)
174 except KeyError as err: # end is greater than dates[-1]...probably

~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/base/tsa_model.py in _get_dates_loc(self, dates, date)
94 def _get_dates_loc(self, dates, date):
---> 95 date = dates.get_loc(date)
96 return date

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in get_loc(self, key, method, tolerance)
1425 key = Timestamp(key, tz=self.tz)
-> 1426 return Index.get_loc(self, key, method, tolerance)
1427

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2526 except KeyError:
-> 2527 return self._engine.get_loc(self._maybe_cast_indexer(key))
2528

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

KeyError: Timestamp('2015-01-03 00:00:00')

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
<ipython-input-206-505c74789333> in <module>()
3 ax = price.loc['2012-01-03':].plot(ax=ax, label='observed')
4
----> 5 fig = model_fit.plot_predict('2014-01-03','2015-01-03', dynamic=False, ax=ax, plot_insample=False)
6
7 plt.show()

~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/arima_model.py in plot_predict(self, start, end, exog, dynamic, alpha, plot_insample, ax)
1885
1886 # use predict so you set dates
-> 1887 forecast = self.predict(start, end, exog, 'levels', dynamic)
1888 # doing this twice. just add a plot keyword to predict?
1889 start = self.model._get_predict_start(start, dynamic=dynamic)

~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/arima_model.py in predict(self, start, end, exog, typ, dynamic)
1808 def predict(self, start=None, end=None, exog=None, typ='linear',
1809 dynamic=False):
-> 1810 return self.model.predict(self.params, start, end, exog, typ, dynamic)
1811 predict.__doc__ = _arima_results_predict
1812

~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/arima_model.py in predict(self, params, start, end, exog, typ, dynamic)
1184 if not dynamic:
1185 predict = super(ARIMA, self).predict(params, start, end, exog,
-> 1186 dynamic)
1187
1188 start = self._get_predict_start(start, dynamic)

~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/arima_model.py in predict(self, params, start, end, exog, dynamic)
732 # will return an index of a date
733 start = self._get_predict_start(start, dynamic)
--> 734 end, out_of_sample = self._get_predict_end(end, dynamic)
735 if out_of_sample and (exog is None and self.k_exog > 0):
736 raise ValueError("You must provide exog for ARMAX")

~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/arima_model.py in _get_predict_end(self, end, dynamic)
1062 Handling of inclusiveness should be done in the predict function.
1063 """
-> 1064 end, out_of_sample = super(ARIMA, self)._get_predict_end(end, dynamic)
1065 if 'mle' not in self.method and not dynamic:
1066 end -= self.k_ar

~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/arima_model.py in _get_predict_end(self, end, dynamic)
673 def _get_predict_end(self, end, dynamic=False):
674 # pass through so predict works for ARIMA and ARMA
--> 675 return super(ARMA, self)._get_predict_end(end)
676
677 def geterrors(self, params):

~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/base/tsa_model.py in _get_predict_end(self, end)
177 freq = self.data.freq
178 out_of_sample = datetools._idx_from_dates(dates[-1], dtend,
--> 179 freq)
180 else:
181 if freq is None:

~/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/base/datetools.py in _idx_from_dates(d1, d2, freq)
100 return len(DatetimeIndex(start=_maybe_convert_period(d1),
101 end=_maybe_convert_period(d2),
--> 102 freq=_freq_to_pandas[freq])) - 1
103
104

~/anaconda3/lib/python3.6/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
116 else:
117 kwargs[new_arg_name] = new_arg_value
--> 118 return func(*args, **kwargs)
119 return wrapper
120 return _deprecate_kwarg

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, dtype, **kwargs)
303
304 if data is None and freq is None:
--> 305 raise ValueError("Must provide freq argument if no data is "
306 "supplied")
307

ValueError: Must provide freq argument if no data is supplied

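What am I doing wrong?

Update

Following Chad Fulton's suggestions, I tried A) downloading the data with a pre-specified frequency, B) manually changing the frequency of the original data after downloading it, and C) updating statsmodels to 0.9 and retrying all of the above.

A gives me the error "Inferred frequency None from passed dates does not conform to passed frequency D", B produces NaNs in the data so that the model itself cannot run, and C only changes the type of error raised in B.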
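To make the failure modes in A and B concrete, here is a minimal sketch on synthetic data. The weekday-only dates are an assumption standing in for the Quandl trading days, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

# Synthetic stand-in for trading dates: weekdays only, so weekends are missing.
dates = pd.bdate_range("2014-01-01", periods=10)
price = pd.Series(range(10), index=dates, dtype="float64")

# (A) Forcing a daily frequency onto dates with gaps raises ValueError.
try:
    pd.DatetimeIndex(dates.values, freq="D")
    error_a = None
except ValueError as err:
    error_a = str(err)

# (B) Reindexing to calendar days inserts NaN rows for the missing days.
daily = price.asfreq("D")
n_missing = int(daily.isna().sum())

print(error_a)
print(n_missing)
```

The ten weekdays span fourteen calendar days, so the reindexed series gains one NaN row per weekend day, which is the kind of gap that stops the model from fitting in B.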

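I think what is happening is that, since no frequency can be applied to the data, the forecast cannot be blamed for not knowing how to generate future dates. That being the case, does anyone have practical advice on how to use as much of a financial time series as possible when making basic forecasts, at least for non-state-space models that handle missing data automatically?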

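Best answer

My first answer, which may be less satisfying but is probably better in the long run, is to suggest that you upgrade to Statsmodels 0.9, which overhauled date/time handling. That will very likely solve your problem.

My second answer is that you can work around the problem in Statsmodels < 0.9 by making sure your date index has a frequency. It looks like your dates are probably daily (if not, you will have to change the following to use the correct freq argument), so I suggest you replace: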


price = pd.Series(data.iloc[:,2].values,index=pd.to_datetime(data.iloc[:,1]))

with:


price = pd.Series(data.iloc[:,2].values, index=pd.DatetimeIndex(data.iloc[:,1], freq='D'))
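If the dates turn out not to be strictly daily, as is typical for trading data, one possible workaround is sketched below. It is an illustration under assumptions, not part of the answer: the data is synthetic, one weekday is dropped as a hypothetical exchange holiday, and forward filling the gap is a modeling choice that may not suit every analysis.

```python
import pandas as pd

# Synthetic weekday prices with one weekday removed (hypothetical holiday).
dates = pd.bdate_range("2014-01-01", periods=10).delete(4)
price = pd.Series(range(9), index=dates.values, dtype="float64")

# An index built from raw values carries no frequency.
print(price.index.freq)

# Reindex to business days ('B') and forward-fill the holiday gap,
# so the index has a frequency and the values contain no NaN.
regular = price.asfreq("B").ffill()
print(regular.index.freq)

# With a frequency set, future dates can be generated past the last observation.
future = pd.date_range(regular.index[-1], periods=6, freq=regular.index.freq)[1:]
print(future)
```

The resulting series has a business-day frequency attached to its index, which is the property the forecast step needs in order to extend the dates beyond the sample.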

Regarding "python - Statsmodels: Trouble overlaying ARIMA forecasts with confidence bounds on the original data", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51567645/
