
Python pandas: how to truncate a DatetimeIndex and fill missing data only within certain intervals

Reposted. Author: 行者123. Updated: 2023-12-01 05:49:59

2012-10-08 07:12:22 0.0 0 0 2315.6 0 0.0 0
2012-10-08 09:14:00 2306.4 20 326586240 2306.4 472 2306.8 4
2012-10-08 09:15:00 2306.8 34 249805440 2306.8 361 2308.0 26
2012-10-08 09:15:01 2308.0 1 53309040 2307.4 77 2308.6 9
2012-10-08 09:15:01.500000 2308.2 1 124630140 2307.0 180 2308.4 1
2012-10-08 09:15:02 2307.0 5 85846260 2308.2 124 2308.0 9
2012-10-08 09:15:02.500000 2307.0 3 128073540 2307.0 185 2307.6 11
......
2012-10-09 07:19:30 0.0 0 0 2276.6 0 0.0 0
2012-10-09 09:14:00 2283.2 80 98634240 2283.2 144 2283.4 1
2012-10-09 09:15:00 2285.2 18 126814260 2285.2 185 2285.6 3
2012-10-09 09:15:01 2285.8 6 98719560 2286.8 144 2287.0 25
2012-10-09 09:15:01.500000 2287.0 36 144759420 2288.8 211 2289.0 4
2012-10-09 09:15:02 2287.4 6 109829280 2287.4 160 2288.6 5
......

I have a DataFrame containing several days of exchange tick data, as shown above. The data I want falls within 9:00:00AM - 11:30:00AM and 13:00:00 - 15:15:00, so I would like to do two things:

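  1. for each date in the DataFrame, truncate to only have data in the range of 9:00:00AM - 11:30:00AM and 13:00:00 - 15:15:00
  2. with the range in 1., fill missing data with a frequency of 500 milliseconds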

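The pandas truncate function only lets me truncate by date, but here I want to truncate by datetime.time. Also, how do I fill missing data only within the intervals I am interested in?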

Thanks very much.

Best answer

  1. for each date in the DataFrame truncate to only have data in the range of 9:00:00AM - 11:30:00AM and 13:00:00 - 15:15:00

Use index slicing for this, for example:

df = df[start_timestamp:end_timestamp]

  2. with the range in 1., fill missing data with a frequency of 500 milliseconds

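Generate a new DataFrame whose index has a 500 millisecond frequency. Merge that DataFrame with the original one using an outer join. This gives you a DataFrame with regularly spaced rows. Rows for missing observations will contain NaN values. Then fill the missing NaN values with fillna.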
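
As a minimal sketch of the steps just described, assuming a short 2-second window and a made-up "price" column: build a 500 ms grid, outer-join the observations onto it, then fill the gaps. Forward-fill is used here as one possible fill policy; fillna with a constant would work the same way.

```python
import pandas as pd

# Hypothetical observations with gaps; the column name is illustrative.
obs = pd.DataFrame(
    {"price": [2306.8, 2308.0, 2307.0]},
    index=pd.to_datetime([
        "2012-10-08 09:15:00",
        "2012-10-08 09:15:01",
        "2012-10-08 09:15:02",
    ]),
)

# Empty frame whose index is a regular 500 ms grid over the interval.
grid = pd.DataFrame(index=pd.date_range(
    "2012-10-08 09:15:00", "2012-10-08 09:15:02", freq="500ms"))

# The outer join keeps every grid timestamp and every observation;
# grid timestamps without an observation get NaN.
merged = grid.join(obs, how="outer")

# Carry the last seen value forward into the NaN rows.
filled = merged.ffill()
print(filled)
```
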

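Example (this session uses a left join from the regular frame and np.where rather than an outer join and fillna, but the idea is the same):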

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: data = pd.DataFrame({"value": np.arange(5)}, index=pd.date_range("2013/02/03", periods=5, freq="3Min"))

In [4]: data
Out[4]:
value
2013-02-03 00:00:00 0
2013-02-03 00:03:00 1
2013-02-03 00:06:00 2
2013-02-03 00:09:00 3
2013-02-03 00:12:00 4

In [5]: filler = pd.DataFrame({"value": [100] * 15}, index=pd.date_range("2013/02/03", periods=15, freq="1Min"))

In [6]: filler
Out[6]:
value
2013-02-03 00:00:00 100
2013-02-03 00:01:00 100
2013-02-03 00:02:00 100
2013-02-03 00:03:00 100
2013-02-03 00:04:00 100
2013-02-03 00:05:00 100
2013-02-03 00:06:00 100
2013-02-03 00:07:00 100
2013-02-03 00:08:00 100
2013-02-03 00:09:00 100
2013-02-03 00:10:00 100
2013-02-03 00:11:00 100
2013-02-03 00:12:00 100
2013-02-03 00:13:00 100
2013-02-03 00:14:00 100

In [7]: merged = filler.merge(data, how='left', left_index=True, right_index=True)

In [8]: merged["value"] = np.where(np.isfinite(merged.value_y), merged.value_y, merged.value_x)

In [9]: merged
Out[9]:
value_x value_y value
2013-02-03 00:00:00 100 0 0
2013-02-03 00:01:00 100 NaN 100
2013-02-03 00:02:00 100 NaN 100
2013-02-03 00:03:00 100 1 1
2013-02-03 00:04:00 100 NaN 100
2013-02-03 00:05:00 100 NaN 100
2013-02-03 00:06:00 100 2 2
2013-02-03 00:07:00 100 NaN 100
2013-02-03 00:08:00 100 NaN 100
2013-02-03 00:09:00 100 3 3
2013-02-03 00:10:00 100 NaN 100
2013-02-03 00:11:00 100 NaN 100
2013-02-03 00:12:00 100 4 4
2013-02-03 00:13:00 100 NaN 100
2013-02-03 00:14:00 100 NaN 100

In [10]: merged['2013-02-03 00:01:00':'2013-02-03 00:10:00']
Out[10]:
value_x value_y value
2013-02-03 00:01:00 100 NaN 100
2013-02-03 00:02:00 100 NaN 100
2013-02-03 00:03:00 100 1 1
2013-02-03 00:04:00 100 NaN 100
2013-02-03 00:05:00 100 NaN 100
2013-02-03 00:06:00 100 2 2
2013-02-03 00:07:00 100 NaN 100
2013-02-03 00:08:00 100 NaN 100
2013-02-03 00:09:00 100 3 3
2013-02-03 00:10:00 100 NaN 100

Regarding "Python pandas: how to truncate a DatetimeIndex and fill missing data only within certain intervals", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/14671345/
