python - Using .asof with a MultiIndex in Pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:05:06

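I have seen this question asked a few times, but never answered. Short version:

I have a pandas DataFrame with a two-level MultiIndex; both levels are integers. How can I use .asof() on this DataFrame?

Long version:

I have a DataFrame with some time series data: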

>>> df
                            A
2016-01-01 00:00:00  1.560878
2016-01-01 01:00:00 -1.029380
...                       ...
2016-01-30 20:00:00  0.429422
2016-01-30 21:00:00 -0.182349
2016-01-30 22:00:00 -0.939461
2016-01-30 23:00:00  0.009930
2016-01-31 00:00:00 -0.854283

[721 rows x 1 columns]

I then build a weekly model of that data:

>>> df['weekday'] = df.index.weekday
>>> df['hour_of_day'] = df.index.hour
>>> weekly_model = df.groupby(['weekday', 'hour_of_day']).mean()
>>> weekly_model
                            A
weekday hour_of_day
0       0            0.260597
        1            0.333094
...                       ...
        20           0.388932
        21          -0.082020
        22          -0.346888
        23           1.525928

[168 rows x 1 columns]

This gives me a DataFrame with the two-level index described above.
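For readers who want to reproduce the setup, here is a minimal sketch under stated assumptions. The values in column A are random stand-ins (the original data is not available), and the seed and the name `index` are purely illustrative. The step size is passed as a `pd.Timedelta` so the sketch does not depend on which hourly frequency alias a given pandas version accepts.

```python
import numpy as np
import pandas as pd

# Hourly timestamps from 2016-01-01 00:00 through 2016-01-31 00:00 inclusive.
index = pd.date_range("2016-01-01", "2016-01-31", freq=pd.Timedelta(hours=1))

# Assumption: random values stand in for the real measurements.
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.standard_normal(len(index))}, index=index)

# Same steps as in the question: derive the two keys, then average per key pair.
df["weekday"] = df.index.weekday
df["hour_of_day"] = df.index.hour
weekly_model = df.groupby(["weekday", "hour_of_day"]).mean()

print(len(df))                     # number of hourly rows
print(weekly_model.shape)          # one row per (weekday, hour_of_day) pair
print(weekly_model.index.nlevels)  # number of index levels
```

Thirty days of hourly data cover every weekday and hour combination, which is why the grouped result has 7 x 24 rows.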

I am now trying to extrapolate that model to a year-long time series:

>>> dates = pd.date_range('2015/1/1', '2015/12/31 23:59', freq='H')
>>> annual_series = weekly_model.A.asof((dates.weekday, dates.hour))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tkcook/azimuth-web/lib/python3.5/site-packages/pandas/core/series.py", line 2657, in asof
    locs = self.index.asof_locs(where, notnull(values))
  File "/home/tkcook/azimuth-web/lib/python3.5/site-packages/pandas/indexes/base.py", line 1553, in asof_locs
    locs = self.values[mask].searchsorted(where.values, side='right')
ValueError: operands could not be broadcast together with shapes (8760,) (2,)

What does this error mean, and what is the best way to do this?

The best I have come up with so far is:

>>> annual_series = weekly_model.A.loc[list(zip(dates.weekday, dates.hour))]

This works, but it means first converting the zip iterator to a list, which is not exactly memory-friendly. Is there a way to avoid that?
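One possible way to avoid the list of tuples, offered as a sketch rather than a definitive answer: build the lookup keys with `pd.MultiIndex.from_arrays` directly from the two integer arrays, then call `reindex`. The data is synthetic and the name `keys` is illustrative; neither appears in the original post.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (assumption: random values).
index = pd.date_range("2016-01-01", "2016-01-31", freq=pd.Timedelta(hours=1))
df = pd.DataFrame(
    {"A": np.random.default_rng(0).standard_normal(len(index))}, index=index
)
df["weekday"] = df.index.weekday
df["hour_of_day"] = df.index.hour
weekly_model = df.groupby(["weekday", "hour_of_day"]).mean()

# Hourly timestamps for all of 2015.
dates = pd.date_range("2015-01-01", "2015-12-31 23:59", freq=pd.Timedelta(hours=1))

# Build the (weekday, hour) keys from the two arrays, with no Python list of tuples.
keys = pd.MultiIndex.from_arrays(
    [dates.weekday, dates.hour], names=["weekday", "hour_of_day"]
)

# Look up the model value for every key, then label the result with the dates.
annual_series = weekly_model["A"].reindex(keys)
annual_series.index = dates

print(annual_series.head())
```

Because every (weekday, hour) pair exists in the weekly model, the lookup is exact and no as-of style matching is needed here.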

Best answer

I read your post several times, and I think I finally understand what you are trying to achieve.

Try this:

df['weekday'] = df.index.weekday
df['hour_of_day'] = df.index.hour
weekly_model = df.groupby(['weekday', 'hour_of_day']).mean()
dates = pd.date_range('2015/1/1', '2015/12/31 23:59', freq='H')

Then use a merge like this:

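# Note: set_index('date') assumes the index of df is named 'date'. Both frames
# contain a column 'A', so the merged columns come out as 'A_x' and 'A_y'.
annual_series = pd.merge(df.reset_index(), weekly_model.reset_index(), on=['weekday', 'hour_of_day']).set_index('date')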
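The following is a self-contained variation on the merge idea, not the answerer's exact code. It assumes the goal is one model value per 2015 timestamp, so it merges a frame of target dates (the hypothetical name `target`) with the weekly model instead of merging the original `df`. The data is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (assumption: random values).
index = pd.date_range("2016-01-01", "2016-01-31", freq=pd.Timedelta(hours=1))
df = pd.DataFrame(
    {"A": np.random.default_rng(0).standard_normal(len(index))}, index=index
)
df["weekday"] = df.index.weekday
df["hour_of_day"] = df.index.hour
weekly_model = df.groupby(["weekday", "hour_of_day"]).mean()

# One row per target timestamp, carrying the two merge keys.
dates = pd.date_range("2015-01-01", "2015-12-31 23:59", freq=pd.Timedelta(hours=1))
target = pd.DataFrame(
    {"date": dates, "weekday": dates.weekday, "hour_of_day": dates.hour}
)

# A left merge keeps every target row and attaches the matching model value.
merged = pd.merge(
    target, weekly_model.reset_index(), on=["weekday", "hour_of_day"], how="left"
)
annual_series = merged.set_index("date")["A"].sort_index()

print(annual_series.head())
```

Since only `weekly_model` contributes a column named A in this variation, no `_x` or `_y` suffixes appear in the result.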

Now you can use asof, because the dates are the index:

annual_series.asof(dates)
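As a small illustration of what `asof` does on a sorted date index: it returns the last non-NaN value at or before the requested label. The three-row Series `s` below is made up for this example.

```python
import pandas as pd

# A tiny hourly series with a sorted DatetimeIndex (illustrative values).
s = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(
        ["2016-01-01 00:00", "2016-01-01 01:00", "2016-01-01 02:00"]
    ),
)

# Between 01:00 and 02:00, asof falls back to the value stored at 01:00.
between = s.asof(pd.Timestamp("2016-01-01 01:30"))

# Before the first index entry there is nothing to fall back to, so the result is NaN.
before = s.asof(pd.Timestamp("2015-12-31 23:00"))

print(between)
print(before)
```

The second lookup is worth noting here: asking for dates that precede all of the indexed data yields NaN, so the range of `dates` passed to `asof` has to overlap or follow the index for the lookup to return values.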

Is this what you were looking for?

Regarding python - Using .asof with a MultiIndex in Pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39922050/
