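I have three datasets with missing values; each consists of a time column and a data column. The minimum time difference between two rows is 1 second (00:00:01):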
Dataset 1:        Dataset 2:        Dataset 3:
00:00:00 81                         00:00:00 70
00:00:01 81
00:00:02 81
00:00:03 81                         00:00:03 99
00:00:04 81                         00:00:04 100
00:00:05 80       00:00:05 80       00:00:05 101
00:00:06 80       00:00:06 100
                  00:00:07 92       00:00:07 88
00:00:08 83       00:00:08 80       00:00:08 88
00:00:09 84       00:00:09 83       00:00:09 87
00:00:10 86
00:00:11 89
00:00:12 90
00:00:13 92                         00:00:13 92
00:00:14 94                         00:00:14 94
00:00:15 94       00:00:15 96       00:00:15 93
00:00:16 96       00:00:16 97
00:00:17 98       00:00:17 100      00:00:17 99
00:00:18 100                        00:00:18 99
00:00:19 101                        00:00:19 101
00:00:20 103
For visualization, the table above shows missing values as empty fields. The real data is dense and looks like this, for example:
Dataset 1:        Dataset 2:        Dataset 3:
00:00:00 81       00:00:05 80       00:00:00 70
00:00:01 81       00:00:06 100      00:00:03 99
00:00:02 81       00:00:07 92       00:00:04 100
00:00:03 81       00:00:08 80       00:00:05 101
00:00:04 81       00:00:09 83       00:00:07 88
00:00:05 80       00:00:15 96       00:00:08 88
00:00:06 80       00:00:16 97       00:00:09 87
00:00:08 83       00:00:17 100      00:00:13 92
00:00:09 84                         00:00:14 94
00:00:10 86                         00:00:15 93
00:00:11 89                         00:00:17 99
00:00:12 90                         00:00:18 99
00:00:13 92                         00:00:19 101
00:00:14 94
00:00:15 94
00:00:16 96
00:00:17 98
00:00:18 100
00:00:19 101
00:00:20 103
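Now I want to align the data so that it can be plotted like this: [image not preserved]
or like this: [image not preserved]
My naive approach would be along these lines: [steps not preserved], ending with n/a as the value for the gaps.
Is there a Python function or library that performs these steps efficiently? Or is there a better way?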
Best answer
You can use concat to combine all the DataFrames, each indexed by its time column:
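import pandas as pd

# df1, df2, df3 are the three datasets, each with 'time' and 'val' columns
dfs = [df1, df2, df3]
# index each dataset by time, then align them side by side on the union of all times
df = pd.concat([x.set_index('time')['val'] for x in dfs],
               axis=1,
               keys=['a', 'b', 'c'],
               sort=True)
print(df)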
              a      b      c
00:00:00   81.0    NaN   70.0
00:00:01   81.0    NaN    NaN
00:00:02   81.0    NaN    NaN
00:00:03   81.0    NaN   99.0
00:00:04   81.0    NaN  100.0
00:00:05   80.0   80.0  101.0
00:00:06   80.0  100.0    NaN
00:00:07    NaN   92.0   88.0
00:00:08   83.0   80.0   88.0
00:00:09   84.0   83.0   87.0
00:00:10   86.0    NaN    NaN
00:00:11   89.0    NaN    NaN
00:00:12   90.0    NaN    NaN
00:00:13   92.0    NaN   92.0
00:00:14   94.0    NaN   94.0
00:00:15   94.0   96.0   93.0
00:00:16   96.0   97.0    NaN
00:00:17   98.0  100.0   99.0
00:00:18  100.0    NaN   99.0
00:00:19  101.0    NaN  101.0
00:00:20  103.0    NaN    NaN
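For readers who want to try the concat step without the full data, here is a minimal self-contained sketch. It uses only a few rows from each dataset, and it assumes the columns are named time and val (as in the answer's code) and that the times are stored as strings; the variable name aligned is chosen for illustration.

```python
import pandas as pd

# Small illustrative subsets of the three datasets.
# Assumption: columns are named 'time' and 'val', times are plain strings.
df1 = pd.DataFrame({"time": ["00:00:00", "00:00:01", "00:00:03"], "val": [81, 81, 81]})
df2 = pd.DataFrame({"time": ["00:00:05", "00:00:06"], "val": [80, 100]})
df3 = pd.DataFrame({"time": ["00:00:00", "00:00:03", "00:00:05"], "val": [70, 99, 101]})

# Index each frame by time and place the value columns side by side.
# Rows are aligned on the union of all time stamps; missing readings become NaN.
aligned = pd.concat(
    [d.set_index("time")["val"] for d in (df1, df2, df3)],
    axis=1,
    keys=["a", "b", "c"],
    sort=True,
)
print(aligned)
```

Note that 00:00:02 and 00:00:04 do not appear in the result, because none of the sample frames has a reading at those times. concat only aligns on time stamps that exist in at least one input.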
If some timestamps are missing from every DataFrame, add DataFrame.asfreq, which requires a DatetimeIndex:
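# asfreq needs a DatetimeIndex, so parse the time strings first
df.index = pd.to_datetime(df.index)
# insert a row for every missing second ('s'; the older alias 'S' is deprecated in recent pandas)
df = df.asfreq('s')
# convert the index back to plain times for display
df.index = df.index.time
print(df)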
              a      b      c
00:00:00   81.0    NaN   70.0
00:00:01   81.0    NaN    NaN
00:00:02   81.0    NaN    NaN
00:00:03   81.0    NaN   99.0
00:00:04   81.0    NaN  100.0
00:00:05   80.0   80.0  101.0
00:00:06   80.0  100.0    NaN
00:00:07    NaN   92.0   88.0
00:00:08   83.0   80.0   88.0
00:00:09   84.0   83.0   87.0
00:00:10   86.0    NaN    NaN
00:00:11   89.0    NaN    NaN
00:00:12   90.0    NaN    NaN
00:00:13   92.0    NaN   92.0
00:00:14   94.0    NaN   94.0
00:00:15   94.0   96.0   93.0
00:00:16   96.0   97.0    NaN
00:00:17   98.0  100.0   99.0
00:00:18  100.0    NaN   99.0
00:00:19  101.0    NaN  101.0
00:00:20  103.0    NaN    NaN
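With the sample data the output is unchanged, because every second between 00:00:00 and 00:00:20 occurs in at least one dataset. The effect of asfreq shows up only when a second is absent everywhere. The sketch below illustrates that case on made-up rows where 00:00:02 is missing; the names aligned and regular are illustrative, and passing an explicit format to pd.to_datetime is a choice made here, not part of the original answer.

```python
import pandas as pd

# Already-aligned frame with a gap: no column has a reading at 00:00:02.
aligned = pd.DataFrame(
    {"a": [81.0, 81.0, 81.0], "b": [float("nan"), 80.0, 100.0]},
    index=["00:00:00", "00:00:01", "00:00:03"],
)

# asfreq needs a DatetimeIndex. Parsing with an explicit format attaches a
# placeholder date (1900-01-01) that is dropped again below.
aligned.index = pd.to_datetime(aligned.index, format="%H:%M:%S")

# Reindex onto a regular 1-second grid; the new 00:00:02 row is filled with NaN.
regular = aligned.asfreq("s")

# Convert the index back to plain datetime.time values for display.
regular.index = regular.index.time
print(regular)
```

Keeping the DatetimeIndex instead of converting back to times may be preferable when the frame is passed on to plotting or resampling.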
Finally, use DataFrame.plot for plotting:
df.plot()
For a separate plot per column:
df.plot(subplots=True)
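As a rough end-to-end sketch of the plotting step, the snippet below draws one subplot per column and saves the figure to a file. It assumes matplotlib is installed; the sample values, the Agg backend, the marker argument, and the file name aligned_series.png are illustrative choices, not part of the original answer. Line plots break at NaN, so a reading that sits between two gaps would be invisible without a marker.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

nan = float("nan")

# Aligned sample data on a regular 1-second DatetimeIndex (date is a placeholder).
idx = pd.date_range("2000-01-01 00:00:00", periods=6, freq="s")
df = pd.DataFrame(
    {"a": [81, 81, nan, 83, 84, 86], "b": [nan, 80, 100, nan, 83, nan]},
    index=idx,
)

# One subplot per column; markers keep isolated points visible around NaN gaps.
axes = df.plot(subplots=True, marker=".")

# Save the figure that holds the subplots (file name is hypothetical).
axes[0].get_figure().savefig("aligned_series.png")
```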
Regarding python - aligning time series datasets with missing values for plotting, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59442241/