
python - Pandas dataframe: get all rows between the zeros (0) of the mask column, and get the first and last row of each group

Reposted. Author: 行者123. Updated: 2023-12-04 10:19:13

I have a dataframe like this:

    store daiban  signal  ...          start_time            end_time  mask
0    0901   0001       0  ... 2020-03-31 00:00:00 2020-03-31 00:35:00     0
1    0901   0001       1  ... 2020-03-31 00:35:00 2020-03-31 00:36:40     1
2    0901   0001       2  ... 2020-03-31 00:36:40 2020-03-31 00:38:44     1
3    0901   0001       0  ... 2020-03-31 00:38:44 2020-03-31 01:10:40     0
4    0901   0001       1  ... 2020-03-31 01:10:40 2020-03-31 01:12:24     1
5    0901   0001       2  ... 2020-03-31 01:12:24 2020-03-31 01:13:40     1
6    0901   0001       1  ... 2020-03-31 01:13:40 2020-03-31 01:15:04     1
7    0901   0001       2  ... 2020-03-31 01:15:04 2020-03-31 01:17:00     1
8    0901   0001       0  ... 2020-03-31 01:17:00 2020-03-31 02:33:04     0
9    0901   0001       1  ... 2020-03-31 02:33:04 2020-03-31 02:34:52     1
10   0901   0001       2  ... 2020-03-31 02:34:52 2020-03-31 02:37:28     1

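I want to get all the rows between the zeros (0) in the mask column, and then, for each group, take the start_time of its first row and the end_time of its last row.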

For example:

1) The first group would be index 1 to 2:
1    0901   0001       1  ... 2020-03-31 00:35:00 2020-03-31 00:36:40     1
2    0901   0001       2  ... 2020-03-31 00:36:40 2020-03-31 00:38:44     1

2) Take the start_time of the group's first row and the end_time of its last row:
0  0901   0001 2020-03-31 00:35:00 2020-03-31 00:38:44
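Step 1 on its own (getting each block of rows between zeros as a separate DataFrame) can be sketched as follows. This is an illustration, not part of the original post: the frame is a compact rebuild of only the mask, start_time and end_time columns, relying on the fact that in the sample each end_time equals the next row's start_time, and the names bounds, group_id and groups are hypothetical.

```python
import pandas as pd

# Compact rebuild of the sample's time columns: in the question's data each
# end_time equals the next row's start_time, so 12 boundaries cover 11 rows.
bounds = pd.to_datetime([
    '2020-03-31 00:00:00', '2020-03-31 00:35:00', '2020-03-31 00:36:40',
    '2020-03-31 00:38:44', '2020-03-31 01:10:40', '2020-03-31 01:12:24',
    '2020-03-31 01:13:40', '2020-03-31 01:15:04', '2020-03-31 01:17:00',
    '2020-03-31 02:33:04', '2020-03-31 02:34:52', '2020-03-31 02:37:28',
])
df = pd.DataFrame({
    'start_time': bounds[:-1],
    'end_time': bounds[1:],
    'mask': [0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1],
})

# Run label: increases by one at every 0 in mask.
group_id = df['mask'].eq(0).cumsum()

# Drop the 0 rows, then split the rest into one DataFrame per run.
# group_id is aligned to the filtered rows by index.
groups = [g for _, g in df[df['mask'].ne(0)].groupby(group_id)]

for g in groups:
    print(g, end='\n\n')
```

With the sample data this yields three blocks: index 1 to 2, index 4 to 7, and index 9 to 10.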

Expected output:
  store daiban          start_time            end_time
0  0901   0001 2020-03-31 00:35:00 2020-03-31 00:38:44
1  0901   0001 2020-03-31 01:10:40 2020-03-31 01:17:00
2  0901   0001 2020-03-31 02:33:04 2020-03-31 02:37:28

DataFrame to reproduce the example:
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame.from_dict({'store': {0: '0901',
1: '0901',
2: '0901',
3: '0901',
4: '0901',
5: '0901',
6: '0901',
7: '0901',
8: '0901',
9: '0901',
10: '0901'},
'daiban': {0: '0001',
1: '0001',
2: '0001',
3: '0001',
4: '0001',
5: '0001',
6: '0001',
7: '0001',
8: '0001',
9: '0001',
10: '0001'},
'signal': {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 2, 6: 1, 7: 2, 8: 0, 9: 1, 10: 2},
'cum_sum': {0: 525,
1: 25,
2: 31,
3: 479,
4: 26,
5: 19,
6: 21,
7: 29,
8: 1141,
9: 27,
10: 39},
'seconds': {0: 2100,
1: 100,
2: 124,
3: 1916,
4: 104,
5: 76,
6: 84,
7: 116,
8: 4564,
9: 108,
10: 156},
'start_time': {0: Timestamp('2020-03-31 00:00:00'),
1: Timestamp('2020-03-31 00:35:00'),
2: Timestamp('2020-03-31 00:36:40'),
3: Timestamp('2020-03-31 00:38:44'),
4: Timestamp('2020-03-31 01:10:40'),
5: Timestamp('2020-03-31 01:12:24'),
6: Timestamp('2020-03-31 01:13:40'),
7: Timestamp('2020-03-31 01:15:04'),
8: Timestamp('2020-03-31 01:17:00'),
9: Timestamp('2020-03-31 02:33:04'),
10: Timestamp('2020-03-31 02:34:52')},
'end_time': {0: Timestamp('2020-03-31 00:35:00'),
1: Timestamp('2020-03-31 00:36:40'),
2: Timestamp('2020-03-31 00:38:44'),
3: Timestamp('2020-03-31 01:10:40'),
4: Timestamp('2020-03-31 01:12:24'),
5: Timestamp('2020-03-31 01:13:40'),
6: Timestamp('2020-03-31 01:15:04'),
7: Timestamp('2020-03-31 01:17:00'),
8: Timestamp('2020-03-31 02:33:04'),
9: Timestamp('2020-03-31 02:34:52'),
10: Timestamp('2020-03-31 02:37:28')},
'mask': {0: 0, 1: 1, 2: 1, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1, 8: 0, 9: 1, 10: 1}})

Best answer

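IIUC, we use cumsum to build a group key (it increases by one at every 0 in mask), keep only the non-zero rows, and then aggregate each group with agg.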
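To make the group key concrete, here is a small sketch that applies the same eq(0).cumsum() step to the mask values copied from the question, outside of any DataFrame. The variable name group_id is only for illustration.

```python
import pandas as pd

# The mask column from the question.
mask = pd.Series([0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1], name='mask')

# eq(0) is True on every separator row, so the running sum bumps the label
# by one at each 0; every row up to the next 0 shares that label.
group_id = mask.eq(0).cumsum()

# Side by side: the label each row gets and whether the filter keeps it.
print(pd.DataFrame({'mask': mask, 'group_id': group_id, 'kept': mask.ne(0)}))
```

The kept rows carry labels 1, 1, 2, 2, 2, 2, 3, 3, which are exactly the three groups. If the data does not start with a 0, the leading non-zero rows get label 0 and still form their own group; consecutive zeros produce labels that contain only zero rows and therefore vanish after filtering.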

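out = (
    df.loc[df['mask'].ne(0)]  # keep only rows where mask != 0
      .groupby([df['mask'].eq(0).cumsum(), df['store'], df['daiban']])  # label +1 at every 0
      .agg({'start_time': 'first', 'end_time': 'last'})
      .reset_index(level=[1, 2])  # move store/daiban back to columns
      .reset_index(drop=True)     # drop the run label to match the expected output
)
print(out)
  store daiban          start_time            end_time
0  0901   0001 2020-03-31 00:35:00 2020-03-31 00:38:44
1  0901   0001 2020-03-31 01:10:40 2020-03-31 01:17:00
2  0901   0001 2020-03-31 02:33:04 2020-03-31 02:37:28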
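A variant of the same aggregation, sketched with pandas named aggregation (available since pandas 0.25), names the output columns directly instead of passing a dict. The group key is renamed to group_id (a hypothetical name) so it cannot clash with the existing mask column and can be dropped by name after reset_index. The frame is rebuilt compactly from the sample, relying on each end_time equaling the next row's start_time; signal, cum_sum and seconds are omitted because the result does not use them.

```python
import pandas as pd

# Compact rebuild of the sample (end_time[i] == start_time[i + 1] in the data).
bounds = pd.to_datetime([
    '2020-03-31 00:00:00', '2020-03-31 00:35:00', '2020-03-31 00:36:40',
    '2020-03-31 00:38:44', '2020-03-31 01:10:40', '2020-03-31 01:12:24',
    '2020-03-31 01:13:40', '2020-03-31 01:15:04', '2020-03-31 01:17:00',
    '2020-03-31 02:33:04', '2020-03-31 02:34:52', '2020-03-31 02:37:28',
])
df = pd.DataFrame({
    'store': ['0901'] * 11,
    'daiban': ['0001'] * 11,
    'start_time': bounds[:-1],
    'end_time': bounds[1:],
    'mask': [0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1],
})

keep = df['mask'].ne(0)
# Hypothetical name for the run label; +1 at every 0 in mask.
group_id = df['mask'].eq(0).cumsum().rename('group_id')

out = (
    df[keep]
    .groupby([group_id, 'store', 'daiban'])  # group_id is aligned by index
    .agg(start_time=('start_time', 'first'), end_time=('end_time', 'last'))
    .reset_index()
    .drop(columns='group_id')
)
print(out)
```

Because 'first' and 'last' follow row order within each group, this relies on the frame already being sorted by time, as it is in the sample.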

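Regarding "python - Pandas dataframe: get all rows between the zeros (0) of the mask column, and get the first and last row of each group", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60942888/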
