python - Generate a multi-indexed DataFrame from a dictionary with values of different lengths

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:25:38

I have the following dictionary:

dic = {'T1':["2013-11-12 17:35:00", "2013-11-12 17:36:00", "2013-11-12 17:37:00", "2013-11-12 17:38:00", 
"2013-11-12 17:40:00", "2013-11-12 17:41:00", "2013-11-12 17:42:00"], 'T2':["2013-11-12 12:15:00", "2013-11-12 12:16:00", "2013-11-13 16:32:00", "2013-11-13 16:33:00",
"2013-11-13 16:34:00"]}

I would like to generate the following multi-indexed DataFrame from it:

T1                                        T2
Start                Stop                 Start                Stop
2013-11-12 17:35:00  2013-11-12 17:38:00  2013-11-12 12:15:00  2013-11-12 12:16:00
2013-11-12 17:40:00  2013-11-12 17:42:00  2013-11-13 16:32:00  2013-11-13 16:34:00

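The DataFrame describes when certain events start and stop for sensor T1 or T2. If the time difference between two consecutive timestamps is no more than 1 minute, I consider the same event to be still in progress; a difference of more than 1 minute means a new event has started.

Thanks for your help :)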

Best answer

You can compute the difference between consecutive timestamps and build a mask that is True wherever that difference is not 1 minute:

df['mask'] = (df[key].diff() / np.timedelta64(1, 'm')) != 1
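
As a minimal sketch of what this line computes, here it is applied to the first three T2 timestamps from the question. The names ts and gap_minutes are only illustrative and do not appear in the answer. Note that the first row has no predecessor, so its gap is NaN, and NaN != 1 evaluates to True; that is why the first row always opens a group.

```python
import numpy as np
import pandas as pd

# First three T2 timestamps from the question (illustrative subset)
ts = pd.Series(pd.to_datetime([
    "2013-11-12 12:15:00",
    "2013-11-12 12:16:00",
    "2013-11-13 16:32:00",
]))

# Gap to the previous row, expressed in minutes: NaN, 1.0, 1696.0
gap_minutes = ts.diff() / np.timedelta64(1, 'm')

# True wherever the gap is not exactly one minute (including the NaN first row)
mask = gap_minutes != 1
print(mask.tolist())  # [True, False, True]
```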

Then take the cumulative sum of the mask to determine which rows belong to which group:

df['group'] = df['mask'].cumsum()
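
The reason the cumulative sum works as a group label can be seen on a hand-written mask (hypothetical values, chosen to mirror the T2 case): each True counts as 1 and bumps the running total, so every row after an event start carries that event's number until the next True.

```python
import pandas as pd

# Hand-written mask: True marks the first row of each event
mask = pd.Series([True, False, True, False, False])

# Booleans are summed as 0/1, so the running total is the event number
group = mask.cumsum()
print(group.tolist())  # [1, 1, 2, 2, 2]
```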

which yields something like:

                   T2   mask  group
0 2013-11-12 12:15:00   True      1
1 2013-11-12 12:16:00  False      1
2 2013-11-13 16:32:00   True      2
3 2013-11-13 16:33:00  False      2
4 2013-11-13 16:34:00  False      2

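                   T1   mask  group
0 2013-11-12 17:35:00   True      1
1 2013-11-12 17:36:00  False      1
2 2013-11-12 17:37:00  False      1
3 2013-11-12 17:38:00  False      1
4 2013-11-12 17:40:00   True      2
5 2013-11-12 17:41:00  False      2
6 2013-11-12 17:42:00  False      2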

Now group by the group column and find the first and last timestamp of each group:

result[key] = df.groupby(['group'])[key].agg(['first', 'last'])
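
A small stand-alone sketch of this aggregation step, using the T2 timestamps with their group labels written out by hand (the variable name spans is an assumption for illustration). Because the timestamps are already in chronological order within each group, 'first' and 'last' give the start and stop of each event.

```python
import pandas as pd

# T2 timestamps with hand-assigned group labels
df = pd.DataFrame({
    "T2": pd.to_datetime([
        "2013-11-12 12:15:00", "2013-11-12 12:16:00",
        "2013-11-13 16:32:00", "2013-11-13 16:33:00", "2013-11-13 16:34:00",
    ]),
    "group": [1, 1, 2, 2, 2],
})

# One row per group, with columns named 'first' and 'last'
spans = df.groupby("group")["T2"].agg(["first", "last"])
print(spans)
```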
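
The complete script:

import numpy as np
import pandas as pd

pd.options.display.width = 1000  # keep the wide result on one line when printed

dic = {'T1': ["2013-11-12 17:35:00", "2013-11-12 17:36:00", "2013-11-12 17:37:00",
              "2013-11-12 17:38:00", "2013-11-12 17:40:00", "2013-11-12 17:41:00",
              "2013-11-12 17:42:00"],
       'T2': ["2013-11-12 12:15:00", "2013-11-12 12:16:00", "2013-11-13 16:32:00",
              "2013-11-13 16:33:00", "2013-11-13 16:34:00"]}

result = dict()
for key, val in dic.items():
    # One single-column frame of parsed timestamps per sensor
    df = pd.DataFrame({key: pd.to_datetime(val)})
    # True where the gap to the previous row is not exactly 1 minute
    # (the first row has no previous row, so it is True as well)
    df['mask'] = (df[key].diff() / np.timedelta64(1, 'm')) != 1
    # The running count of event starts serves as the event number
    df['group'] = df['mask'].cumsum()
    result[key] = df.groupby(['group'])[key].agg(['first', 'last'])
    result[key] = result[key].rename(columns={'first': 'Start', 'last': 'Stop'})

# The dict keys (T1, T2) become the outer level of the column MultiIndex
result = pd.concat(result, axis=1)
print(result)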

which yields

                       T1                                      T2
                    Start                Stop               Start                Stop
group
1     2013-11-12 17:35:00 2013-11-12 17:38:00 2013-11-12 12:15:00 2013-11-12 12:16:00
2     2013-11-12 17:40:00 2013-11-12 17:42:00 2013-11-13 16:32:00 2013-11-13 16:34:00
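
The mask in the answer tests for a gap of exactly 1 minute, so a gap shorter than a minute would also open a new event. If the intent is the rule stated in the question (a gap of more than 1 minute starts a new event), one possible variant is a threshold comparison. This is a sketch, not part of the original answer: the helper name event_spans and its max_gap parameter are hypothetical. With this comparison the first row's missing gap evaluates to False, so group numbers start at 0 rather than 1.

```python
import pandas as pd

def event_spans(timestamps, max_gap=pd.Timedelta(minutes=1)):
    """Return one Start/Stop row per run of timestamps spaced at most max_gap apart.

    Hypothetical helper; assumes the timestamps are already sorted.
    """
    ts = pd.Series(pd.to_datetime(timestamps))
    # A gap larger than max_gap opens a new event; the first row's NaT gap compares as False
    new_event = ts.diff() > max_gap
    group = new_event.cumsum()
    spans = ts.groupby(group).agg(["first", "last"])
    spans.columns = ["Start", "Stop"]
    return spans

dic = {'T1': ["2013-11-12 17:35:00", "2013-11-12 17:36:00", "2013-11-12 17:37:00",
              "2013-11-12 17:38:00", "2013-11-12 17:40:00", "2013-11-12 17:41:00",
              "2013-11-12 17:42:00"],
       'T2': ["2013-11-12 12:15:00", "2013-11-12 12:16:00", "2013-11-13 16:32:00",
              "2013-11-13 16:33:00", "2013-11-13 16:34:00"]}

# Dict keys become the outer column level, as in the answer's pd.concat call
result = pd.concat({key: event_spans(val) for key, val in dic.items()}, axis=1)
print(result)
```

If the sensors produce different numbers of events, pd.concat aligns the rows on the group index and fills the missing Start/Stop cells with NaT.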

Regarding python - Generate a multi-indexed DataFrame from a dictionary with values of different lengths, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32610129/
