python - Subtract rows based on ID column - Pandas

Repost. Author: 行者123. Updated: 2023-12-01 01:16:14

I have a dataframe that looks like this:

UserId  Date_watched  Days_not_watch
1       2010-09-11    5
1       2010-10-01    8
1       2010-10-28    1
2       2010-05-06    12
2       2010-05-18    5
3       2010-08-09    10
3       2010-09-25    5

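I want to know the number of days each user left as a gap between watches, so I want a new column with a value for every row of every user. My dataframe should look like this: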

UserId  Date_watched  Days_not_watch  Gap (2nd watch_date - 1st watch_date - days_not_watch)
1       2010-09-11    5               0   (first gap will be 0 for all users)
1       2010-10-01    8               15  (11th Sept + 5 = 16th Sept; 1st Oct - 16th Sept = 15 days)
1       2010-10-28    1               9
2       2010-05-06    12              0
2       2010-05-18    5               0   (because 6th May + 12 days = 18th May)
3       2010-08-09    10              0
3       2010-09-25    4               36
3       2010-10-01    2               2

The formula for calculating the gap is given next to the column name in the dataframe above.
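The formula can be checked by hand for a single row. The following is a minimal sketch using only Python's standard `datetime` module, with values taken from the first two rows of user 1. The variable names (`prev_watch`, `prev_days_not_watch`, `next_watch`, `available_from`) are illustrative and do not appear in the original post.

```python
from datetime import date, timedelta

# Values from the first two rows of user 1 (variable names are illustrative)
prev_watch = date(2010, 9, 11)   # previous Date_watched
prev_days_not_watch = 5          # previous Days_not_watch
next_watch = date(2010, 10, 1)   # current Date_watched

# The previous date plus the declared days is the point the gap is measured from
available_from = prev_watch + timedelta(days=prev_days_not_watch)

# Gap = current date - (previous date + previous Days_not_watch)
gap = (next_watch - available_from).days
print(available_from, gap)  # 2010-09-16 15
```

This matches the note in the table: 11th Sept + 5 days = 16th Sept, and 1st Oct - 16th Sept = 15 days.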

Best answer

Here is one approach using groupby + shift:

import pandas as pd

# parse the dates, then sort by user and date
df['Date_watched'] = pd.to_datetime(df['Date_watched'])
df = df.sort_values(['UserId', 'Date_watched'])

# for each row, take the previous row's date within the same user and
# add that row's Days_not_watch (NaT for each user's first row)
grp = df.groupby('UserId')
starts = (grp['Date_watched'].shift()
          + pd.to_timedelta(grp['Days_not_watch'].shift(), unit='D'))

# calculate timedelta gaps; each user's first row gets a gap of 0
df['Gap'] = (df['Date_watched'] - starts).fillna(pd.Timedelta(0))

# convert to whole days as integers
df['Gap'] = (df['Gap'] / pd.Timedelta('1 day')).astype(int)

print(df)

   UserId Date_watched  Days_not_watch  Gap
0       1   2010-09-11               5    0
1       1   2010-10-01               8   15
2       1   2010-10-28               1   19
3       2   2010-05-06              12    0
4       2   2010-05-18               5    0
5       3   2010-08-09              10    0
6       3   2010-09-25               5   37
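For anyone who wants to reproduce the output above, here is a self-contained sketch of the same groupby + shift idea. It rebuilds the question's input dataframe and wraps the steps in a helper. The function name `add_gap_column` and the variable names `sample` and `result` are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd


def add_gap_column(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with an integer 'Gap' column (hypothetical helper)."""
    out = frame.copy()
    out['Date_watched'] = pd.to_datetime(out['Date_watched'])
    out = out.sort_values(['UserId', 'Date_watched']).reset_index(drop=True)

    # Previous row's date and Days_not_watch within each user;
    # the first row of every user has no previous row, so it yields NaT
    grp = out.groupby('UserId')
    starts = (grp['Date_watched'].shift()
              + pd.to_timedelta(grp['Days_not_watch'].shift(), unit='D'))

    # First rows get a zero gap; the rest are converted to whole days
    gaps = (out['Date_watched'] - starts).fillna(pd.Timedelta(0))
    out['Gap'] = gaps.dt.days.astype(int)
    return out


# Input data copied from the question
sample = pd.DataFrame({
    'UserId': [1, 1, 1, 2, 2, 3, 3],
    'Date_watched': ['2010-09-11', '2010-10-01', '2010-10-28',
                     '2010-05-06', '2010-05-18',
                     '2010-08-09', '2010-09-25'],
    'Days_not_watch': [5, 8, 1, 12, 5, 10, 5],
})

result = add_gap_column(sample)
print(result['Gap'].tolist())  # [0, 15, 19, 0, 0, 0, 37]
```

Working on a copy leaves the caller's dataframe unchanged, and resetting the index after sorting keeps the row labels in the same order as the printed output.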

Regarding python - Subtract rows based on ID column - Pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54326170/
