
python - Pandas consecutive dates

Reposted. Author: 行者123. Updated: 2023-12-01 09:22:48

My data looks like this:

ID  Date_off    Approved
1 3/7/2018 1
1 3/8/2018 1
2 3/29/2018 1
2 3/30/2018 1
3 3/30/2018 1
4 4/2/2018 1
5 4/2/2018 1
6 4/2/2018 1
7 4/2/2018 1
8 4/2/2018 1
4 4/3/2018 1
5 4/3/2018 1
6 4/3/2018 1
7 4/3/2018 1
8 4/3/2018 1
4 4/4/2018 1
9 4/4/2018 1
5 4/4/2018 1
10 4/4/2018 1
6 4/4/2018 1
8 4/4/2018 1
9 4/5/2018 1
11 4/5/2018 1
5 4/5/2018 1
10 4/5/2018 1
6 4/5/2018 1
12 4/5/2018 1
8 4/5/2018 1
13 4/10/2018 1
14 4/10/2018 1
15 4/10/2018 1
16 4/10/2018 1
17 4/11/2018 1
13 4/11/2018 1
15 4/11/2018 1
18 4/11/2018 1
16 4/11/2018 1
15 4/12/2018 1
16 4/12/2018 1
16 4/13/2018 1
19 4/16/2018 1
19 4/17/2018 1
20 4/17/2018 1
21 4/18/2018 1
19 4/18/2018 1
20 4/18/2018 1
21 4/19/2018 1
19 4/19/2018 1
20 4/19/2018 1
22 4/20/2018 1
21 4/20/2018 1
20 4/20/2018 1
22 4/23/2018 1
23 4/23/2018 1
22 4/24/2018 1
23 4/24/2018 1
22 4/25/2018 1
23 4/25/2018 1
22 4/26/2018 1
23 4/26/2018 1
14 4/26/2018 1
22 4/27/2018 1
10 4/27/2018 1
23 4/27/2018 1
5 4/27/2018 1
14 4/27/2018 1
8 4/30/2018 1
1 4/30/2018 1
10 4/30/2018 1
23 4/30/2018 1
11 4/30/2018 1
24 4/30/2018 1
25 4/30/2018 1
14 4/30/2018 1

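This is a list of every day that someone is off on vacation. I need to group by name and return the number of consecutive days each person is off, together with the first and last day of each run. So I am looking for: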

Name   Days  From         To
Bob 2 2017-04-06 2017-04-07
Jimy 2 2017-04-07 2017-04-08
Jimy 1 2017-04-10 2017-04-10
....

Currently, Scott's code returns the following:

    Approved    Date_off    Date_off
sum amin amax
ID
1 3 3/7/2018 4/30/2018
2 2 3/29/2018 3/30/2018
3 1 3/30/2018 3/30/2018
4 1 4/2/2018 4/2/2018
4 2 4/3/2018 4/4/2018
5 1 4/2/2018 4/2/2018
5 1 4/3/2018 4/3/2018
5 3 4/4/2018 4/27/2018
6 1 4/2/2018 4/2/2018
6 1 4/3/2018 4/3/2018
6 2 4/4/2018 4/5/2018
7 2 4/2/2018 4/3/2018
8 1 4/2/2018 4/2/2018
8 1 4/3/2018 4/3/2018
8 3 4/4/2018 4/30/2018
9 2 4/4/2018 4/5/2018
10 4 4/4/2018 4/30/2018
11 2 4/5/2018 4/30/2018
12 1 4/5/2018 4/5/2018
13 2 4/10/2018 4/11/2018
14 1 4/10/2018 4/10/2018
14 3 4/26/2018 4/30/2018
15 1 4/10/2018 4/10/2018
15 2 4/11/2018 4/12/2018
16 1 4/10/2018 4/10/2018
16 1 4/11/2018 4/11/2018
16 2 4/12/2018 4/13/2018
17 1 4/11/2018 4/11/2018
18 1 4/11/2018 4/11/2018
19 1 4/16/2018 4/16/2018
19 1 4/17/2018 4/17/2018
19 2 4/18/2018 4/19/2018
20 1 4/17/2018 4/17/2018
20 1 4/18/2018 4/18/2018
20 2 4/19/2018 4/20/2018
21 1 4/18/2018 4/18/2018
21 2 4/19/2018 4/20/2018
22 1 4/20/2018 4/20/2018
22 1 4/23/2018 4/23/2018
22 1 4/24/2018 4/24/2018
22 1 4/25/2018 4/25/2018
22 2 4/26/2018 4/27/2018
23 1 4/23/2018 4/23/2018
23 1 4/24/2018 4/24/2018
23 1 4/25/2018 4/25/2018
23 3 4/26/2018 4/30/2018
24 1 4/30/2018 4/30/2018
25 1 4/30/2018 4/30/2018

I only changed the last line to:

df.groupby(['ID','cons']).agg({'Approved':sum, 'Date_off':[np.min, np.max]}).reset_index(level=1,drop=True)

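Without the min/max on the date column it returns the same result. Could weekdays versus weekends be causing the problem? When subtracting adjacent days, Friday and the following Monday should be treated as adjacent days.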
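To make the "Friday and Monday are adjacent" requirement concrete, here is a minimal sketch that measures the gap between two dates in business days instead of calendar days. It relies on `numpy.busday_count` with its default Monday-to-Friday week mask. The helper name `business_gap` is hypothetical, and public holidays are ignored in this sketch.

```python
import numpy as np
import pandas as pd

def business_gap(earlier, later):
    """Business days from `earlier` (inclusive) up to `later` (exclusive)."""
    start = np.datetime64(pd.Timestamp(earlier).date())
    end = np.datetime64(pd.Timestamp(later).date())
    # Default weekmask is Monday-Friday; no holidays are taken into account.
    return int(np.busday_count(start, end))

print(business_gap("4/27/2018", "4/30/2018"))  # Friday -> Monday: 1
print(business_gap("4/2/2018", "4/3/2018"))    # Monday -> Tuesday: 1
print(business_gap("4/20/2018", "4/24/2018"))  # Friday -> Tuesday: 2
```

Under this measure a gap of exactly 1 means "the next working day", so a Friday followed by a Monday would count as consecutive, while a plain calendar-day difference would report 3.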

Best answer

Updated

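import pandas as pd

# Parse the dates first so that diff() yields Timedelta values
df['Date_off'] = pd.to_datetime(df['Date_off'])

# Gap to the previous day off of the same ID (NaT for the first one)
df['daysdiff'] = df.sort_values('Date_off').groupby('ID')['Date_off'].diff()

# I modified this line: any gap other than one day starts a new run
df['cons'] = (df.sort_values('Date_off').groupby('ID', group_keys=False)['daysdiff']
                .apply(lambda x: (x.bfill().dt.days != 1).cumsum()))

print(df.groupby(['ID','cons'])[['Approved','Date_off']]
        .agg({'Approved':'sum','Date_off':['min','max']})
        .reset_index(level=1, drop=True))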

Output:

   Approved   Date_off           
sum min max
ID
1 2 2018-03-07 2018-03-08
1 1 2018-04-30 2018-04-30
2 2 2018-03-29 2018-03-30
3 1 2018-03-30 2018-03-30
4 3 2018-04-02 2018-04-04
5 4 2018-04-02 2018-04-05
5 1 2018-04-27 2018-04-27
6 4 2018-04-02 2018-04-05
7 2 2018-04-02 2018-04-03
8 4 2018-04-02 2018-04-05
8 1 2018-04-30 2018-04-30
9 2 2018-04-04 2018-04-05
10 2 2018-04-04 2018-04-05
10 1 2018-04-27 2018-04-27
10 1 2018-04-30 2018-04-30
11 1 2018-04-05 2018-04-05
11 1 2018-04-30 2018-04-30
12 1 2018-04-05 2018-04-05
13 2 2018-04-10 2018-04-11
14 1 2018-04-10 2018-04-10
14 2 2018-04-26 2018-04-27
14 1 2018-04-30 2018-04-30
15 3 2018-04-10 2018-04-12
16 4 2018-04-10 2018-04-13
17 1 2018-04-11 2018-04-11
18 1 2018-04-11 2018-04-11
19 4 2018-04-16 2018-04-19
20 4 2018-04-17 2018-04-20
21 3 2018-04-18 2018-04-20
22 1 2018-04-20 2018-04-20
22 5 2018-04-23 2018-04-27
23 5 2018-04-23 2018-04-27
23 1 2018-04-30 2018-04-30
24 1 2018-04-30 2018-04-30
25 1 2018-04-30 2018-04-30
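For readers who want to run the idea end to end, the following is a self-contained sketch of the same diff-and-cumsum technique on a small subset of the question's data (IDs 1 and 22 only). It is a variation, not the answer's exact code: it labels runs with a single frame-wide cumulative sum and uses named aggregation, and the column names `run`, `Days`, `From` and `To` are chosen here for illustration. Gaps are measured in calendar days, so a Friday and the following Monday still fall into separate runs.

```python
import pandas as pd

# Small subset of the question's data (IDs 1 and 22 only).
df = pd.DataFrame({
    "ID": [1, 1, 22, 22, 22, 22, 22, 22, 1],
    "Date_off": ["3/7/2018", "3/8/2018", "4/20/2018", "4/23/2018", "4/24/2018",
                 "4/25/2018", "4/26/2018", "4/27/2018", "4/30/2018"],
    "Approved": [1] * 9,
})
df["Date_off"] = pd.to_datetime(df["Date_off"], format="%m/%d/%Y")
df = df.sort_values(["ID", "Date_off"]).reset_index(drop=True)

# Gap in calendar days to the previous date of the same ID (NaN for the first).
gap = df.groupby("ID")["Date_off"].diff().dt.days

# A new run starts wherever the gap is not exactly one day; the cumulative
# sum of those flags gives every run its own label.
df["run"] = (gap != 1).cumsum()

runs = (df.groupby(["ID", "run"])
          .agg(Days=("Approved", "sum"),
               From=("Date_off", "min"),
               To=("Date_off", "max"))
          .reset_index(level="run", drop=True)
          .reset_index())
print(runs)
```

On this subset the sketch yields four runs: two days and one day for ID 1, then one day and five days for ID 22, which agrees with the corresponding rows of the output above.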

For more on python - Pandas consecutive dates, see this similar question on Stack Overflow: https://stackoverflow.com/questions/50687296/
