gpt4 book ai didi

python - 那一年有人在 Pandas 呆了几个月?

转载 作者:行者123 更新时间:2023-12-04 12:09:28 25 4
gpt4 key购买 nike

我们考虑这个代表 worker 契约(Contract)的数据框。我想列出一个 worker 在某一年工作了多少个月。

df = pd.DataFrame{'id': {0: 19019,
1: 17160, 2: 21593, 3: 3146, 4: 21593, 5: 3146, 6: 22737, 7: 25311, 8: 25740,
9: 3289, 10: 26312, 11: 28028, 12: 17017, 13: 27742, 14: 26884,
15: 31174, 16: 31889, 17: 33319, 18: 35178, 19: 35464},
'start_date': {0: Timestamp('2016-06-01 00:00:00'),
1: Timestamp('2016-09-01 00:00:00'), 2: Timestamp('2016-11-01 00:00:00'),
3: Timestamp('2017-01-01 00:00:00'), 4: Timestamp('2017-03-01 00:00:00'),
5: Timestamp('2017-08-01 00:00:00'), 6: Timestamp('2018-09-01 00:00:00'),
7: Timestamp('2018-09-01 00:00:00'),8: Timestamp('2018-10-01 00:00:00'),
9: Timestamp('1999-11-01 00:00:00'),10: Timestamp('2018-10-01 00:00:00'),
11: Timestamp('2019-01-01 00:00:00'),12: Timestamp('2009-11-01 00:00:00'),
13: Timestamp('2019-09-01 00:00:00'),14: Timestamp('2020-03-01 00:00:00'),
15: Timestamp('2020-03-01 00:00:00'),16: Timestamp('2020-04-14 00:00:00'),
17: Timestamp('2020-10-01 00:00:00'),18: Timestamp('2021-03-01 00:00:00'),
19: Timestamp('2021-03-08 00:00:00')},
'end_date': {0: Timestamp('2017-01-31 00:00:00'),
1: Timestamp('2018-07-31 00:00:00'),2: Timestamp('2017-02-28 00:00:00'),
3: Timestamp('2017-07-31 00:00:00'),4: Timestamp('2017-12-31 00:00:00'),
5: Timestamp('2017-12-31 00:00:00'),6: Timestamp('2021-12-31 00:00:00'),
7: Timestamp('2019-08-16 00:00:00'),8: Timestamp('2019-11-30 00:00:00'),
9: Timestamp('2022-12-31 00:00:00'),10: Timestamp('2020-09-30 00:00:00'),
11: Timestamp('2021-02-28 00:00:00'),12: Timestamp('2022-10-31 00:00:00'),
13: Timestamp('2022-02-28 00:00:00'),14: Timestamp('2022-02-28 00:00:00'),
15: Timestamp('2022-02-28 00:00:00'),16: Timestamp('2021-06-30 00:00:00'),
17: Timestamp('2022-09-30 00:00:00'),18: Timestamp('2022-02-28 00:00:00'),
19: Timestamp('2022-03-07 00:00:00')}})
因此,如果我们考虑 2020 年:
year = 2020
after = df.index[df.start_date.dt.year >= year] # Started late in that year
before = df.index[df.end_date.dt.year <= year] # Left early in that year

df['after'] = df.iloc[after].start_date.dt.month
df['before'] = df.iloc[before].end_date.dt.month
df = df.fillna(0)
df['months'] = 12
df['months'] -= df['after']
df[df.before > 0]['months'] -= 12 - df['before']
df = df.drop(['before', 'after'], axis=1)

dm = df[(df.start_date.dt.year <= year) & (df.end_date.dt.year >= year)]
dm
我得到了 2020 年工作了 n 个月的 worker 名单:
    id      start_date  end_date    months
13 22737 2018-09-01 2021-12-31 12.0
16 3289 1999-11-01 2022-12-31 12.0
17 26312 2018-10-01 2020-09-30 12.0
18 28028 2019-01-01 2021-02-28 12.0
19 17017 2009-11-01 2022-10-31 12.0
20 27742 2019-09-01 2022-02-28 12.0
21 26884 2020-03-01 2022-02-28 9.0
22 31174 2020-03-01 2022-02-28 9.0
23 31889 2020-04-14 2021-06-30 8.0
24 33319 2020-10-01 2022-09-30 2.0
有没有更好的 Pandas 方法来实现相同的目标?
(随意重命名问题,我确定它的名字很糟糕)

最佳答案

这是一种方式:

start = '1/1/2020'
end = '12/31/2020'

s = (df['end_date'].clip(upper = pd.to_datetime(end)) -
df['start_date'].clip(lower = pd.to_datetime(start))).floordiv(pd.to_timedelta(30,'d'))
df = df.assign(months = s.where(s.gt(0)))

关于python - 那一年有人在 Pandas 呆了几个月?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/67851398/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com