
python - How do you clean and forward-fill a multi-day 1-minute time series with Pandas?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:26:48

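I have a CSV file containing 1-minute stock data that spans multiple days. Each day runs from 9:30 to 16:00.

Some minutes are missing from the time series (here, 2013-09-16 09:32:00 and 2013-09-17 09:31:00 are missing):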

2013-09-16 09:30:00,461.01,461.49,461,461,183507
2013-09-16 09:31:00,460.82,461.6099,460.39,461.07,212774
2013-09-16 09:33:00,460.0799,460.88,458.97,459.2401,207880
2013-09-16 09:34:00,458.97,460.08,458.8,460.04,148121
...
2013-09-16 15:59:00,449.72,450.0774,449.59,449.95,146399
2013-09-16 16:00:00,450.12,450.12,449.65,449.65,444594
2013-09-17 09:30:00,448,448,447.5,447.96,173624
2013-09-17 09:32:00,450.6177,450.9,449.05,449.2701,268715
2013-09-17 09:33:00,451.39,451.96,450.58,450.7061,197019
...
...

With pandas, how can I forward-fill the series so that every minute is present? The result should look like this:

2013-09-16 09:30:00,461.01,461.49,461,461,183507
2013-09-16 09:31:00,460.82,461.6099,460.39,461.07,212774
2013-09-16 09:32:00,460.82,461.6099,460.39,461.07,212774 <-- forward filled
2013-09-16 09:33:00,460.0799,460.88,458.97,459.2401,207880
2013-09-16 09:34:00,458.97,460.08,458.8,460.04,148121
...
2013-09-16 15:59:00,449.72,450.0774,449.59,449.95,146399
2013-09-16 16:00:00,450.12,450.12,449.65,449.65,444594
2013-09-17 09:30:00,448,448,447.5,447.96,173624
2013-09-17 09:31:00,448,448,447.5,447.96,173624 <-- forward filled
2013-09-17 09:32:00,450.6177,450.9,449.05,449.2701,268715
2013-09-17 09:33:00,451.39,451.96,450.58,450.7061,197019
...

It also needs to handle the case where several consecutive minutes are missing...

Best answer

I copied your first 4 rows into a dataframe:

Out[49]:
                     0         1         2       3         4       5
0  2013-09-16 09:30:00  461.0100  461.4900  461.00  461.0000  183507
1  2013-09-16 09:31:00  460.8200  461.6099  460.39  461.0700  212774
2  2013-09-16 09:33:00  460.0799  460.8800  458.97  459.2401  207880
3  2013-09-16 09:34:00  458.9700  460.0800  458.80  460.0400  148121

Then:

# Current pandas no longer accepts fill_method in resample(); chain .ffill() instead.
df1 = df.set_index(keys=[0]).resample('1min').ffill()
df1

Out[52]:
                            1         2       3         4       5
0
2013-09-16 09:30:00  461.0100  461.4900  461.00  461.0000  183507
2013-09-16 09:31:00  460.8200  461.6099  460.39  461.0700  212774
2013-09-16 09:32:00  460.8200  461.6099  460.39  461.0700  212774
2013-09-16 09:33:00  460.0799  460.8800  458.97  459.2401  207880
2013-09-16 09:34:00  458.9700  460.0800  458.80  460.0400  148121

This also handles several consecutive missing values and forward-fills all of them.
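For reference, the step above can be reproduced end to end. The following is a minimal sketch, assuming a recent pandas release and that the CSV has no header row (so the columns are labelled 0 to 5, as in the output above). The names `raw` and `filled` are illustrative only.

```python
import io

import pandas as pd

# The first four rows from the question; 09:32 is missing.
raw = """2013-09-16 09:30:00,461.01,461.49,461,461,183507
2013-09-16 09:31:00,460.82,461.6099,460.39,461.07,212774
2013-09-16 09:33:00,460.0799,460.88,458.97,459.2401,207880
2013-09-16 09:34:00,458.97,460.08,458.8,460.04,148121
"""

# header=None labels the columns 0..5; parse_dates=[0] turns column 0 into datetimes.
df = pd.read_csv(io.StringIO(raw), header=None, parse_dates=[0])

# Use the timestamps as the index, build a 1-minute grid, and forward-fill the gaps.
filled = df.set_index(0).resample("1min").ffill()
print(filled)
```

For a real file, `io.StringIO(raw)` would be replaced by the file path.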

So if I have data like this:

2013-09-17 09:30:00,448,448,447.5,447.96,173624
2013-09-17 09:33:00,451.39,451.96,450.58,450.7061,197019

and then do the same thing as before:

Out[55]:
                          1       2       3         4       5
0
2013-09-17 09:30:00  448.00  448.00  447.50  447.9600  173624
2013-09-17 09:31:00  448.00  448.00  447.50  447.9600  173624
2013-09-17 09:32:00  448.00  448.00  447.50  447.9600  173624
2013-09-17 09:33:00  451.39  451.96  450.58  450.7061  197019
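One thing to watch with multi-day data like the question's: a plain 1-minute resample also creates rows for every minute between 16:00 and 09:30 the next morning. The sketch below shows one possible way to drop those rows, assuming the session is 09:30 to 16:00 inclusive and that only dates already present in the input count as trading days. The helper `fill_sessions` is hypothetical, not part of pandas or of the original answer.

```python
import io

import pandas as pd

# Only the rows printed in the question; the rows elided with "..." are simply
# absent here, so this sketch forward-fills them as well.
raw = """2013-09-16 09:30:00,461.01,461.49,461,461,183507
2013-09-16 09:31:00,460.82,461.6099,460.39,461.07,212774
2013-09-16 09:33:00,460.0799,460.88,458.97,459.2401,207880
2013-09-16 09:34:00,458.97,460.08,458.8,460.04,148121
2013-09-16 15:59:00,449.72,450.0774,449.59,449.95,146399
2013-09-16 16:00:00,450.12,450.12,449.65,449.65,444594
2013-09-17 09:30:00,448,448,447.5,447.96,173624
2013-09-17 09:32:00,450.6177,450.9,449.05,449.2701,268715
2013-09-17 09:33:00,451.39,451.96,450.58,450.7061,197019
"""

df = pd.read_csv(io.StringIO(raw), header=None, parse_dates=[0]).set_index(0)


def fill_sessions(frame, start="09:30", end="16:00"):
    """Forward-fill to a 1-minute grid, keeping only in-session rows (hypothetical helper)."""
    # Build the full 1-minute grid and forward-fill every gap, including overnight.
    filled = frame.resample("1min").ffill()
    # Keep only rows whose time of day lies inside the session (both ends inclusive).
    filled = filled.between_time(start, end)
    # Drop calendar dates that never appeared in the input (weekends, holidays).
    traded = frame.index.normalize().unique()
    return filled[filled.index.normalize().isin(traded)]


result = fill_sessions(df)
print(result.loc["2013-09-17"])
```

If a day's opening 09:30 row is itself missing, the forward fill carries the previous session's last row across the overnight gap, so that case needs separate handling.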

The key point is that you need a datetime index. If you also want to keep the timestamps as a column, pass drop=False to set_index.
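A small sketch of the `drop=False` variant follows. The column names `ts` and `close` are made up for illustration, and the values are taken from the two-row example above.

```python
import pandas as pd

# Two rows with a three-minute gap; "ts" and "close" are illustrative column names.
df = pd.DataFrame(
    {
        "ts": ["2013-09-17 09:30:00", "2013-09-17 09:33:00"],
        "close": [447.96, 450.7061],
    }
)

# resample() needs a datetime-like index, so convert the strings first.
df["ts"] = pd.to_datetime(df["ts"])

# drop=False keeps "ts" as a regular column in addition to using it as the index.
indexed = df.set_index("ts", drop=False)

filled = indexed.resample("1min").ffill()
print(filled)
```

Because the retained `ts` column is forward-filled along with everything else, each filled row carries the timestamp of the observation it was copied from, which makes the filled rows easy to spot.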

Regarding "python - How do you clean and forward-fill a multi-day 1-minute time series with Pandas?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/19268003/
