
python - Aligning time series datasets with missing values for plotting

Reposted. Author: 行者123. Last updated: 2023-12-02 02:43:58

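I have three datasets with missing values. Each dataset consists of a time column and a data column. The minimum time difference between two rows is 1 second (00:00:01):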

Dataset 1:      Dataset 2:      Dataset 3:
00:00:00 81                     00:00:00 70
00:00:01 81
00:00:02 81
00:00:03 81                     00:00:03 99
00:00:04 81                     00:00:04 100
00:00:05 80     00:00:05 80     00:00:05 101
00:00:06 80     00:00:06 100
                00:00:07 92     00:00:07 88
00:00:08 83     00:00:08 80     00:00:08 88
00:00:09 84     00:00:09 83     00:00:09 87
00:00:10 86
00:00:11 89
00:00:12 90
00:00:13 92                     00:00:13 92
00:00:14 94                     00:00:14 94
00:00:15 94     00:00:15 96     00:00:15 93
00:00:16 96     00:00:16 97
00:00:17 98     00:00:17 100    00:00:17 99
00:00:18 100                    00:00:18 99
00:00:19 101                    00:00:19 101
00:00:20 103

For illustration, the table above shows missing values as empty fields. The real data is dense and looks like this, for example:

Dataset 1:      Dataset 2:      Dataset 3:
00:00:00 81     00:00:05 80     00:00:00 70
00:00:01 81     00:00:06 100    00:00:03 99
00:00:02 81     00:00:07 92     00:00:04 100
00:00:03 81     00:00:08 80     00:00:05 101
00:00:04 81     00:00:09 83     00:00:07 88
00:00:05 80     00:00:15 96     00:00:08 88
00:00:06 80     00:00:16 97     00:00:09 87
00:00:08 83     00:00:17 100    00:00:13 92
00:00:09 84                     00:00:14 94
00:00:10 86                     00:00:15 93
00:00:11 89                     00:00:17 99
00:00:12 90                     00:00:18 99
00:00:13 92                     00:00:19 101
00:00:14 94
00:00:15 94
00:00:16 96
00:00:17 98
00:00:18 100
00:00:19 101
00:00:20 103

Now I want to align the data so that it can be plotted like this:

[Image: Combined]

and like this:

[Image: Split]

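My naive approach would be:

  1. Find the minimum and maximum time across the datasets.
  2. Create a table with one row per time step and three columns, each initialized to n/a.
  3. Loop over each dataset and assign its values to the table.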
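For reference, the three steps above can be sketched directly in pandas. This is a minimal illustration, not code from the question: the sample rows, the column names `time` and `val`, and the labels `a`, `b`, `c` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three small datasets with 'time' and 'val' columns.
df1 = pd.DataFrame({"time": ["00:00:00", "00:00:01", "00:00:03"], "val": [81, 81, 81]})
df2 = pd.DataFrame({"time": ["00:00:01", "00:00:02"], "val": [80, 100]})
df3 = pd.DataFrame({"time": ["00:00:00", "00:00:03"], "val": [70, 99]})
datasets = {"a": df1, "b": df2, "c": df3}

# Step 1: parse the times and find the overall minimum and maximum.
parsed = {name: pd.to_timedelta(d["time"].tolist()) for name, d in datasets.items()}
start = min(t.min() for t in parsed.values())
end = max(t.max() for t in parsed.values())

# Step 2: one row per second, one column per dataset, every cell NaN.
index = pd.timedelta_range(start, end, freq="s")
table = pd.DataFrame(np.nan, index=index, columns=list(datasets))

# Step 3: loop over the datasets and assign their values to the table.
for name, d in datasets.items():
    table.loc[parsed[name], name] = d["val"].to_numpy()

print(table)
```

This works, but it spells out by hand the index alignment that pandas already performs when Series with different indexes are combined.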

Is there a Python function or library that performs these steps efficiently? Or is there a better approach?

Regards,

Best answer

You can concat all the DataFrames after indexing each one by its time column:

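import pandas as pd

# df1, df2, df3 are the three datasets, each with a 'time' and a 'val' column
dfs = [df1, df2, df3]
# Index each dataset by time, then join the value columns side by side;
# times missing from a dataset become NaN
df = pd.concat([x.set_index('time')['val'] for x in dfs],
               axis=1,
               keys=['a', 'b', 'c'],
               sort=True)
print(df)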
              a      b      c
00:00:00   81.0    NaN   70.0
00:00:01   81.0    NaN    NaN
00:00:02   81.0    NaN    NaN
00:00:03   81.0    NaN   99.0
00:00:04   81.0    NaN  100.0
00:00:05   80.0   80.0  101.0
00:00:06   80.0  100.0    NaN
00:00:07    NaN   92.0   88.0
00:00:08   83.0   80.0   88.0
00:00:09   84.0   83.0   87.0
00:00:10   86.0    NaN    NaN
00:00:11   89.0    NaN    NaN
00:00:12   90.0    NaN    NaN
00:00:13   92.0    NaN   92.0
00:00:14   94.0    NaN   94.0
00:00:15   94.0   96.0   93.0
00:00:16   96.0   97.0    NaN
00:00:17   98.0  100.0   99.0
00:00:18  100.0    NaN   99.0
00:00:19  101.0    NaN  101.0
00:00:20  103.0    NaN    NaN
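To try this without the full data, here is a self-contained sketch using a few rows taken from the question's three datasets. The column names `time` and `val` are assumptions, since the question does not name its columns.

```python
import pandas as pd

# A few rows from the question's datasets; 'time' and 'val' are assumed names.
df1 = pd.DataFrame({"time": ["00:00:05", "00:00:06", "00:00:08"],
                    "val": [80, 80, 83]})
df2 = pd.DataFrame({"time": ["00:00:05", "00:00:06", "00:00:07", "00:00:08"],
                    "val": [80, 100, 92, 80]})
df3 = pd.DataFrame({"time": ["00:00:05", "00:00:07", "00:00:08"],
                    "val": [101, 88, 88]})

# One Series per dataset, indexed by time.
series = [d.set_index("time")["val"] for d in (df1, df2, df3)]

# axis=1 aligns the Series on the union of their indexes (an outer join);
# keys names the resulting columns and sort=True orders the combined index.
aligned = pd.concat(series, axis=1, keys=["a", "b", "c"], sort=True)
print(aligned)
```

Note that this alignment relies on each dataset having unique time values. If a dataset contained the same timestamp twice, the concat would likely raise an error, and the duplicates would need to be dropped or aggregated first.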

If some times are missing from all of the DataFrames, add DataFrame.asfreq to fill them in. It requires a DatetimeIndex:

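# asfreq needs a DatetimeIndex, so parse the time strings first
df.index = pd.to_datetime(df.index)
# Insert a NaN row for every second that is absent from the index
# ('s' is the lowercase alias for seconds; newer pandas deprecates 'S')
df = df.asfreq('s')
# Keep only the time of day again
df.index = df.index.time
print(df)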
              a      b      c
00:00:00   81.0    NaN   70.0
00:00:01   81.0    NaN    NaN
00:00:02   81.0    NaN    NaN
00:00:03   81.0    NaN   99.0
00:00:04   81.0    NaN  100.0
00:00:05   80.0   80.0  101.0
00:00:06   80.0  100.0    NaN
00:00:07    NaN   92.0   88.0
00:00:08   83.0   80.0   88.0
00:00:09   84.0   83.0   87.0
00:00:10   86.0    NaN    NaN
00:00:11   89.0    NaN    NaN
00:00:12   90.0    NaN    NaN
00:00:13   92.0    NaN   92.0
00:00:14   94.0    NaN   94.0
00:00:15   94.0   96.0   93.0
00:00:16   96.0   97.0    NaN
00:00:17   98.0  100.0   99.0
00:00:18  100.0    NaN   99.0
00:00:19  101.0    NaN  101.0
00:00:20  103.0    NaN    NaN
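In the sample data every second from 00:00:00 to 00:00:20 appears in at least one dataset, so the output above is unchanged. The following sketch uses a hypothetical frame in which 00:00:02 is absent everywhere, to show the row that asfreq inserts. Passing an explicit `format` to `pd.to_datetime` is a choice made here, not part of the answer; it keeps the parsing deterministic.

```python
import datetime
import pandas as pd

nan = float("nan")

# Hypothetical aligned frame: no dataset has a value at 00:00:02.
df = pd.DataFrame(
    {"a": [81.0, 81.0, 81.0], "b": [nan, 80.0, 100.0]},
    index=["00:00:00", "00:00:01", "00:00:03"],
)

# asfreq requires a DatetimeIndex. With this explicit format the date part
# defaults to 1900-01-01; it is discarded again below.
df.index = pd.to_datetime(df.index, format="%H:%M:%S")

# Reindex to a regular 1-second grid; the new row is filled with NaN.
df = df.asfreq("s")

# Convert the index back to plain datetime.time objects.
df.index = df.index.time
print(df)
```

After this, the frame has four rows, and the row for 00:00:02 contains NaN in every column.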

Finally, use DataFrame.plot for plotting:

df.plot()

For a separate plot per column:

df.plot(subplots=True)
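A small end-to-end sketch of both plot variants follows, assuming matplotlib is installed. The data, the date used for the index, and the `marker="o"` argument are assumptions added for illustration. The marker is there because a line is not drawn through NaN, so a value whose neighbours are both missing would otherwise be invisible.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import pandas as pd

nan = float("nan")

# Hypothetical aligned data on a regular 1-second DatetimeIndex.
df = pd.DataFrame(
    {"a": [81.0, 81.0, nan, 83.0],
     "b": [nan, 80.0, nan, 80.0],
     "c": [70.0, nan, 88.0, 88.0]},
    index=pd.date_range("2023-01-01", periods=4, freq="s"),
)

# Combined: all columns on one Axes.
ax = df.plot(marker="o")

# Split: one Axes per column, returned as an array.
axes = df.plot(subplots=True, marker="o")

# To view the result, call ax.figure.savefig(...) or matplotlib.pyplot.show()
# with an interactive backend.
```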

Regarding python - Aligning time series datasets with missing values for plotting, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59442241/
