
python - Comparing Pandas timestamps of different resolutions

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:30:02

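I have two time-series DataFrames (about 45k rows versus 5 rows). One has timestamps with millisecond resolution, the other with one-second resolution. I want to create a new column in the larger DataFrame such that: a) a value is attached to the row in the larger DataFrame whose timestamp is nearest (in seconds) to a timestamp in the smaller DataFrame; b) every other row gets NaN.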

larger df = 
timestamp price
0 2018-04-24 06:01:02.600 1
1 2018-04-24 06:01:02.600 1
2 2018-04-24 06:01:02.600 2
3 2018-04-24 06:01:02.600 4
4 2018-04-24 06:01:02.775 2
5 2018-04-24 06:01:02.825 3
6 2018-04-24 06:01:03.050 5
7 2018-04-24 06:01:03.125 6
8 2018-04-24 06:01:03.275 7
9 2018-04-24 06:01:03.300 4
10 2018-04-24 06:01:03.300 3
11 2018-04-24 06:01:03.950 5
12 2018-04-24 06:01:04.050 5


smaller df =
timestamp price
0 24/04/2018 06:01:02 2
1 24/04/2018 12:33:37 4
2 24/04/2018 14:29:34 5
3 24/04/2018 15:02:50 6
4 24/04/2018 15:20:04 7

desired df =

timestamp price newCol
0 2018-04-24 06:01:02.600 1 aValue
1 2018-04-24 06:01:02.600 1 NaN
2 2018-04-24 06:01:02.600 2 NaN
3 2018-04-24 06:01:02.600 4 NaN
4 2018-04-24 06:01:02.775 2 NaN
5 2018-04-24 06:01:02.825 3 NaN
6 2018-04-24 06:01:03.050 5 NaN
7 2018-04-24 06:01:03.125 6 NaN
8 2018-04-24 06:01:03.275 7 NaN
9 2018-04-24 06:01:03.300 4 NaN
10 2018-04-24 06:01:03.300 3 NaN
11 2018-04-24 06:01:03.950 5 NaN
12 2018-04-24 06:01:04.050 5 NaN
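
For anyone reproducing this, here is a minimal sketch that rebuilds a few rows of the two sample frames. It assumes the smaller frame's timestamps arrive as day-first strings (as printed above) and therefore need an explicit format when parsed; the names df_large and df_small are the ones the answer below uses.

```python
import pandas as pd

# First few rows of the larger frame (millisecond resolution).
df_large = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2018-04-24 06:01:02.600', '2018-04-24 06:01:02.600',
        '2018-04-24 06:01:02.775', '2018-04-24 06:01:03.050',
    ]),
    'price': [1, 1, 2, 5],
})

# Smaller frame: day-first strings at one-second resolution.
df_small = pd.DataFrame({
    'timestamp': ['24/04/2018 06:01:02', '24/04/2018 12:33:37',
                  '24/04/2018 14:29:34'],
    'price': [2, 4, 5],
})

# An explicit format avoids any day/month ambiguity when parsing.
df_small['timestamp'] = pd.to_datetime(df_small['timestamp'],
                                       format='%d/%m/%Y %H:%M:%S')
```

Both timestamp columns are then real datetimes, so they can be compared directly even though one carries milliseconds and the other does not.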

Thanks a lot for any help. I am still too new to programming in general to work this out easily.

Many thanks

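Best answer

reindex

To use each value only once, I have to keep track of the timestamps from the smaller DataFrame, so I keep them as a column when I reindex with method='nearest'. Then I use duplicated inside a mask.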

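# Keep 'timestamp' as a column too, so each match remembers its small-frame timestamp
df_small_new = df_small.set_index('timestamp', drop=False)
# For every timestamp in the large frame, pick the nearest row of the small frame
df_small_new = df_small_new.reindex(df_large.timestamp, method='nearest')

# Blank out repeated matches so that each small-frame value appears only once
df_large.assign(
    newcol=df_small_new.price.mask(df_small_new.timestamp.duplicated()).values)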

timestamp price newcol
0 2018-04-24 06:01:02.600 1 2.0
1 2018-04-24 06:01:02.600 1 NaN
2 2018-04-24 06:01:02.600 2 NaN
3 2018-04-24 06:01:02.600 4 NaN
4 2018-04-24 06:01:02.775 2 NaN
5 2018-04-24 06:01:02.825 3 NaN
6 2018-04-24 06:01:03.050 5 NaN
7 2018-04-24 06:01:03.125 6 NaN
8 2018-04-24 06:01:03.275 7 NaN
9 2018-04-24 06:01:03.300 4 NaN
10 2018-04-24 06:01:03.300 3 NaN
11 2018-04-24 06:01:03.950 5 NaN
12 2018-04-24 06:01:04.050 5 NaN
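
One thing to be aware of: the mask keeps the first large-frame row that maps to each small-frame timestamp, which is not necessarily the large-frame row closest to it. If the closest row is what you need, the lookup can be run in the other direction. The following is only a sketch of that alternative, not the method above. It assumes df_large is sorted by timestamp and has at least two rows, and the sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: two large-frame rows sit just before 12:33:37.
df_large = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2018-04-24 06:01:02.600', '2018-04-24 06:01:02.600',
        '2018-04-24 06:01:02.775', '2018-04-24 12:33:36.500',
        '2018-04-24 12:33:36.900',
    ]),
    'price': [1, 1, 2, 3, 5],
})
df_small = pd.DataFrame({
    'timestamp': pd.to_datetime(['2018-04-24 06:01:02',
                                 '2018-04-24 12:33:37']),
    'price': [2, 4],
})

large_ts = df_large['timestamp'].to_numpy()
small_ts = df_small['timestamp'].to_numpy()

# Insertion point of each small timestamp in the sorted large timestamps,
# clipped so that both neighbours (left and right) are valid positions.
right = np.searchsorted(large_ts, small_ts).clip(1, len(large_ts) - 1)
left = right - 1

# Pick whichever neighbour is closer in time (ties go to the earlier row).
nearest = np.where(small_ts - large_ts[left] <= large_ts[right] - small_ts,
                   left, right)

# Write each small-frame price onto its nearest large-frame row only.
df_large['newcol'] = np.nan
df_large.loc[df_large.index[nearest], 'newcol'] = df_small['price'].to_numpy()
```

Here the value 4 lands on the 12:33:36.900 row, whereas keeping the first match would put it on the 12:33:36.500 row. If two small-frame timestamps share the same nearest row, only one of their values is kept.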

pandas.merge_asof

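  • Rename 'price' in the smaller DataFrame
  • Make sure to set direction to 'nearest'
  • This almost answers the question: every row receives the matched value, not just the first one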

pd.merge_asof(
    df_large,
    df_small.rename(columns={'price': 'newcol'}),
    on='timestamp', direction='nearest'
)

timestamp price newcol
0 2018-04-24 06:01:02.600 1 2
1 2018-04-24 06:01:02.600 1 2
2 2018-04-24 06:01:02.600 2 2
3 2018-04-24 06:01:02.600 4 2
4 2018-04-24 06:01:02.775 2 2
5 2018-04-24 06:01:02.825 3 2
6 2018-04-24 06:01:03.050 5 2
7 2018-04-24 06:01:03.125 6 2
8 2018-04-24 06:01:03.275 7 2
9 2018-04-24 06:01:03.300 4 2
10 2018-04-24 06:01:03.300 3 2
11 2018-04-24 06:01:03.950 5 2
12 2018-04-24 06:01:04.050 5 2
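
To close the remaining gap, the same duplicated mask from the reindex approach can be applied after the merge. This is a sketch under a few assumptions: both frames are sorted by timestamp and already parsed as datetimes, and small_ts is a hypothetical helper column name introduced only to remember which small-frame row each match came from.

```python
import pandas as pd

df_large = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2018-04-24 06:01:02.600', '2018-04-24 06:01:02.600',
        '2018-04-24 06:01:02.775', '2018-04-24 06:01:03.050',
    ]),
    'price': [1, 1, 2, 5],
})
df_small = pd.DataFrame({
    'timestamp': pd.to_datetime(['2018-04-24 06:01:02',
                                 '2018-04-24 12:33:37']),
    'price': [2, 4],
})

# Carry the small frame's own timestamp along under a helper name.
small = (df_small.rename(columns={'price': 'newcol'})
                 .assign(small_ts=lambda d: d['timestamp']))

merged = pd.merge_asof(df_large, small, on='timestamp', direction='nearest')

# Keep the value only on the first row matched to each small-frame timestamp.
merged['newcol'] = merged['newcol'].mask(merged['small_ts'].duplicated())
merged = merged.drop(columns='small_ts')
```

merge_asof also accepts a tolerance argument (for example pd.Timedelta('1s')) if rows far from any small-frame timestamp should stay unmatched.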

Regarding python - Comparing Pandas timestamps of different resolutions, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52836482/
