python - Resampling a pandas DataFrame from an irregular time series index

Reposted. Author: 太空狗. Updated: 2023-10-30 02:53:31

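I want to resample a DataFrame every five seconds, where the timestamps of the original data are irregular. Apologies if this looks like a duplicate question, but my problem is that the interpolation does not line up with the data's timestamps, which is why I have included my DataFrame in this question. The graph in this answer shows the result I want, but I cannot use the traces package suggested there. I am using pandas 0.19.0.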

Consider the following climb path of an aircraft, with Altitude in feet and Time in seconds (as dict on pastebin):

    Altitude        Time
1 0.00 0.00000
2 1000.00 16.45350
3 2000.00 33.19584
4 3000.00 50.25330
5 4000.00 67.64580
6 5000.00 85.38720
7 6000.00 103.56720
8 7000.00 122.29260
9 8000.00 141.61440
10 9000.00 161.59140
11 9999.67 182.27940
12 10000.30 182.33940
13 10000.30 199.76880
14 10000.30 199.82880
15 11000.00 221.67660
16 12000.00 244.36260
17 13000.00 267.93900
18 14000.00 292.46940
19 15000.00 318.01080
20 16000.00 344.36820
21 17000.00 371.32200
22 18000.00 398.91420
23 19000.00 427.19100
24 20000.00 456.24900
25 21000.00 486.38940
26 22000.00 517.91640
27 23000.00 550.96140
28 24000.00 585.65460
29 25000.00 622.12800
30 26000.00 660.35400
31 27000.00 700.37400
32 28000.00 742.39200
33 29000.00 786.57600
34 30000.00 833.13000
35 31000.00 882.09000
36 32000.00 933.46200
37 33000.00 987.40800
38 34000.00 1044.06000
39 35000.00 1103.85000
40 36000.00 1167.52200
41 36088.90 1173.39000
42 36089.60 1173.45000
43 36671.70 1216.60200
44 36672.40 1216.66200
45 38000.00 1295.80200
46 39000.00 1368.45000
47 40000.00 1458.00000
48 41000.00 1574.08200
49 42000.00 1730.97000
50 42231.00 1775.19600

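Attempted solutions

First, I tried resampling while keeping the original index intact, as shown in this question, so that I could interpolate linearly, but I found no interpolation method that produced a correct result (note that the Time column matches the original data only at 16.45 s):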

import numpy as np
import pandas as pd

# Index by timestamp (seconds since the epoch) while keeping the Time column
df = df.set_index(pd.to_datetime(df['Time'], unit='s'), drop=False)
resample_index = pd.date_range(start=df.index[0], end=df.index[-1], freq='5s')
dummy_frame = pd.DataFrame(np.nan, index=resample_index, columns=df.columns)
df.combine_first(dummy_frame).interpolate().iloc[:6]

Time Altitude
1970-01-01 00:00:00.000000 0.000000 0.0
1970-01-01 00:00:05.000000 4.113375 250.0
1970-01-01 00:00:10.000000 8.226750 500.0
1970-01-01 00:00:15.000000 12.340125 750.0
1970-01-01 00:00:16.453500 16.453500 1000.0
1970-01-01 00:00:20.000000 20.639085 1250.0
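
The evenly stepped values above (250, 500, 750) are what one would expect from interpolate() with its default method='linear', which treats rows as equally spaced and ignores the index. A minimal sketch of the difference, assuming only the first two samples of the climb data plus three empty grid rows (the names altitude, by_position and by_time are illustrative and not from the original post):

```python
import numpy as np
import pandas as pd

# First two samples of the climb data, with three empty 5 s grid rows in between.
index = pd.to_datetime([0.0, 5.0, 10.0, 15.0, 16.4535], unit="s")
altitude = pd.Series([0.0, np.nan, np.nan, np.nan, 1000.0], index=index)

# method="linear" (the default) spaces the gaps evenly by row position.
by_position = altitude.interpolate(method="linear")

# method="time" weights each gap by the time elapsed between index values.
by_time = altitude.interpolate(method="time")

print(by_position.iloc[1])  # 250.0, one quarter of the way by row count
print(by_time.iloc[1])      # about 303.89, i.e. 1000 * 5 / 16.4535
```

With row-based spacing the 5 s grid point lands a quarter of the way to 1000 ft, even though only 5 of the 16.4535 seconds have elapsed.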

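Second, I tried resampling without keeping the original index, first down to 1 s and then up to 5 s, as shown in this answer, but the interpolated values do not line up at the end of the data, and neither do the altitude values (1000 ft should fall between 15 and 20 seconds). Resampling to 1 s alone already produces a wrong result.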

df.resample('1s').interpolate(method='linear').resample('5s').asfreq()

Time Altitude
1970-01-01 00:00:00 0.0 0.000000
1970-01-01 00:00:05 5.0 137.174211
1970-01-01 00:00:10 10.0 274.348422
1970-01-01 00:00:15 15.0 411.522634
1970-01-01 00:00:20 20.0 548.696845
1970-01-01 00:00:25 25.0 685.871056
1970-01-01 00:00:30 30.0 823.045267
1970-01-01 00:00:35 35.0 960.219479
1970-01-01 00:00:40 40.0 1097.393690
1970-01-01 00:00:45 45.0 1234.567901
1970-01-01 00:00:50 50.0 1371.742112
1970-01-01 00:00:55 55.0 1508.916324
1970-01-01 00:01:00 60.0 1646.090535
1970-01-01 00:01:05 65.0 1783.264746
1970-01-01 00:01:10 70.0 1920.438957
1970-01-01 00:01:15 75.0 2057.613169
1970-01-01 00:01:20 80.0 2194.787380
1970-01-01 00:01:25 85.0 2331.961591
1970-01-01 00:01:30 90.0 2469.135802
1970-01-01 00:01:35 95.0 2606.310014
1970-01-01 00:01:40 100.0 2743.484225
1970-01-01 00:01:45 105.0 2880.658436
1970-01-01 00:01:50 110.0 3017.832647
1970-01-01 00:01:55 115.0 3155.006859
1970-01-01 00:02:00 120.0 3292.181070
1970-01-01 00:02:05 125.0 3429.355281
1970-01-01 00:02:10 130.0 3566.529492
1970-01-01 00:02:15 135.0 3703.703704
1970-01-01 00:02:20 140.0 3840.877915
1970-01-01 00:02:25 145.0 3978.052126
... ... ...
1970-01-01 00:27:10 1458.0 40000.000000
1970-01-01 00:27:15 1458.0 40000.000000
1970-01-01 00:27:20 1458.0 40000.000000
1970-01-01 00:27:25 1458.0 40000.000000
1970-01-01 00:27:30 1458.0 40000.000000
1970-01-01 00:27:35 1458.0 40000.000000
1970-01-01 00:27:40 1458.0 40000.000000
1970-01-01 00:27:45 1458.0 40000.000000
1970-01-01 00:27:50 1458.0 40000.000000
1970-01-01 00:27:55 1458.0 40000.000000
1970-01-01 00:28:00 1458.0 40000.000000
1970-01-01 00:28:05 1458.0 40000.000000
1970-01-01 00:28:10 1458.0 40000.000000
1970-01-01 00:28:15 1458.0 40000.000000
1970-01-01 00:28:20 1458.0 40000.000000
1970-01-01 00:28:25 1458.0 40000.000000
1970-01-01 00:28:30 1458.0 40000.000000
1970-01-01 00:28:35 1458.0 40000.000000
1970-01-01 00:28:40 1458.0 40000.000000
1970-01-01 00:28:45 1458.0 40000.000000
1970-01-01 00:28:50 1458.0 40000.000000
1970-01-01 00:28:55 1458.0 40000.000000
1970-01-01 00:29:00 1458.0 40000.000000
1970-01-01 00:29:05 1458.0 40000.000000
1970-01-01 00:29:10 1458.0 40000.000000
1970-01-01 00:29:15 1458.0 40000.000000
1970-01-01 00:29:20 1458.0 40000.000000
1970-01-01 00:29:25 1458.0 40000.000000
1970-01-01 00:29:30 1458.0 40000.000000
1970-01-01 00:29:35 1458.0 40000.000000
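
The numbers above are consistent with the resampler keeping only samples that fall exactly on the 1 s grid before interpolating: 137.174211 ft at 5 s equals 40000 * 5 / 1458, a straight line from the first sample to the one at exactly 1458.0 s. The sketch below reproduces that with an explicit asfreq() step on three of the original samples. It is an illustration under that assumption, not a statement about every pandas release, since the behavior of resample(...).interpolate() may differ between versions.

```python
import pandas as pd

# Three samples from the climb data; only the first and last fall on whole seconds.
times = pd.to_datetime([0.0, 16.4535, 1458.0], unit="s")
altitude = pd.Series([0.0, 1000.0, 40000.0], index=times)

# asfreq() keeps only values whose timestamp lies exactly on the new 1 s grid,
# so the sample at 16.4535 s is dropped.
on_grid = altitude.resample("1s").asfreq()
print(on_grid.notna().sum())  # 2

# Interpolating afterwards draws one straight line from 0 ft to 40000 ft.
straight_line = on_grid.interpolate(method="linear")
print(straight_line.iloc[5])  # about 137.17, i.e. 40000 * 5 / 1458
```

Because every off-grid sample is discarded before interpolation, no choice of interpolation method can recover the true climb profile on this path.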

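Question

How can I resample the original data to 5 seconds while interpolating correctly? Am I simply using the wrong interpolation method?

Best answer

With the help of @Martin Schmelzer (thanks!) I found that the first approach from the question works when 'time' is passed as the method argument to pandas' interpolate method: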

# Build the 5 s grid, merge it with the original samples, and interpolate by timestamp
resample_index = pd.date_range(start=df.index[0], end=df.index[-1], freq='5s')
dummy_frame = pd.DataFrame(np.nan, index=resample_index, columns=df.columns)
df.combine_first(dummy_frame).interpolate('time').iloc[:6]

Altitude Time
1970-01-01 00:00:00.000000 0.000000 0.0000
1970-01-01 00:00:05.000000 303.886711 5.0000
1970-01-01 00:00:10.000000 607.773422 10.0000
1970-01-01 00:00:15.000000 911.660133 15.0000
1970-01-01 00:00:16.453500 1000.000000 16.4535
1970-01-01 00:00:20.000000 1211.828215 20.0000

I can then resample this to 5 s, or any other interval, and the results are accurate.

df.combine_first(dummy_frame).interpolate('time').resample('5s').asfreq().head()

Altitude Time
1970-01-01 00:00:00 0.000000 0.0
1970-01-01 00:00:05 303.886711 5.0
1970-01-01 00:00:10 607.773422 10.0
1970-01-01 00:00:15 911.660133 15.0
1970-01-01 00:00:20 1211.828215 20.0

So in the end it turned out that I was simply using the wrong interpolation method.
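
The same idea can be wrapped in a small helper. This is a sketch under stated assumptions, not the answer's exact code: the function name resample_by_time is hypothetical, it uses reindex on the union of both indexes in place of combine_first, and it assumes a DataFrame with a sorted DatetimeIndex, no duplicate timestamps, and numeric columns. The sample frame contains only the first three rows of the climb data.

```python
import pandas as pd


def resample_by_time(frame, freq):
    """Interpolate `frame` onto a regular grid, weighting by elapsed time."""
    grid = pd.date_range(start=frame.index[0], end=frame.index[-1], freq=freq)
    # Add the grid points as NaN rows while keeping the original samples.
    combined = frame.reindex(frame.index.union(grid))
    # Interpolate with the index as the x-axis, then keep only the grid points.
    return combined.interpolate(method="time").reindex(grid)


# First three rows of the climb data from the question.
climb = pd.DataFrame(
    {"Altitude": [0.0, 1000.0, 2000.0], "Time": [0.0, 16.4535, 33.19584]}
)
climb = climb.set_index(pd.to_datetime(climb["Time"], unit="s"), drop=False)

result = resample_by_time(climb, "5s")
print(result)
```

On these rows the Altitude values at 5 s and 20 s come out at roughly 303.89 and 1211.83, in line with the tables in the answer, and the interpolated Time column equals the grid time in seconds, which is a convenient sanity check.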

Regarding python - Resampling a pandas DataFrame from an irregular time series index, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49191998/
