python - Pandas time series: average of a timestamp column

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:44:32

I have a DataFrame that looks like this:

ID      Date
16911 2017-04-15
16911 2017-04-25
16911 2017-04-27
16911 2017-05-08
16911 2017-05-20
16911 2017-05-25
16911 2017-08-08
16911 2017-08-11
16911 2017-08-24
16912 2017-04-15
16912 2017-04-25
16812 2017-04-27
16812 2017-05-08
16812 2017-05-20
16812 2017-05-25
16812 2017-08-08
16812 2017-08-11

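The dates are sorted. I want to compute the difference between consecutive timestamps and take the average of those differences for each ID.

Also, taking ID 16911 as an example, I want a list a of the date differences: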

16911   2017-04-15
16911 2017-04-25
difference between the above two dates is 10, so a is
a = [10]

16911 2017-04-25
16911 2017-04-27
difference between the above two dates is 2, so a is
a=[10,2]

16911 2017-04-27
16911 2017-05-08
difference between the above two dates is 11 (assuming), so a is
a=[10,2,11]

So the final output should be:

ID      Average_Day Diff
16911 3 days [10,2,11]

Best answer

Use groupby with diff and mean:

df = df.groupby('ID')['Date'].apply(lambda x: x.diff().mean()).reset_index()
print (df)
      ID             Date
0  16812 21 days 04:48:00
1  16911 16 days 09:00:00
2  16912 10 days 00:00:00
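
The snippet above relies on Date already having a datetime dtype. As a minimal self-contained sketch, the following rebuilds the sample rows from the question by hand and converts them with pd.to_datetime; the construction and the name mean_gap are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Sample rows from the question; how the frame is built here is an assumption.
ids = [16911] * 9 + [16912] * 2 + [16812] * 6
dates = [
    "2017-04-15", "2017-04-25", "2017-04-27", "2017-05-08", "2017-05-20",
    "2017-05-25", "2017-08-08", "2017-08-11", "2017-08-24",
    "2017-04-15", "2017-04-25",
    "2017-04-27", "2017-05-08", "2017-05-20", "2017-05-25", "2017-08-08",
    "2017-08-11",
]
df = pd.DataFrame({"ID": ids, "Date": pd.to_datetime(dates)})

# Mean gap between consecutive dates within each ID.
# diff() leaves NaT in the first row of each group; mean() skips it.
mean_gap = df.groupby("ID")["Date"].apply(lambda s: s.diff().mean())
print(mean_gap)
```

If Date were left as strings, diff() would fail, so the conversion step is worth checking first.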

If you need to convert the timedelta, for example to whole days:

df = df.groupby('ID')['Date'].apply(lambda x: x.diff().mean().days).reset_index()
print (df)
      ID  Date
0  16812    21
1  16911    16
2  16912    10
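
Note that .days drops the sub-day remainder (21 days 04:48:00 becomes 21). If fractional days are wanted instead, one possible approach is to divide by a one-day Timedelta. The sketch below uses only the rows for ID 16812, and the names whole_days and frac_days are hypothetical.

```python
import pandas as pd

# Rows for ID 16812 from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "ID": [16812] * 6,
    "Date": pd.to_datetime([
        "2017-04-27", "2017-05-08", "2017-05-20",
        "2017-05-25", "2017-08-08", "2017-08-11",
    ]),
})

mean_gap = df.groupby("ID")["Date"].apply(lambda s: s.diff().mean())

# Integer days: the 04:48:00 remainder is discarded.
whole_days = mean_gap.dt.days
# Float days: dividing by a one-day Timedelta keeps the remainder.
frac_days = mean_gap / pd.Timedelta(days=1)
print(whole_days, frac_days, sep="\n")
```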

Edit:

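# create a per-ID column of day differences between consecutive dates
df['new'] = df.groupby('ID')['Date'].diff().dt.days
# remove the NaN rows (the first row of each group has no previous date);
# copy() avoids a SettingWithCopyWarning on the assignment below
df = df.dropna(subset=['new']).copy()
# convert to integers
df['new'] = df['new'].astype(int)
# aggregate the differences into a list and a mean per ID
df = df.groupby('ID', sort=False)['new'].agg([('val', lambda x: x.tolist()),('avg', 'mean')])
print (df)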

                                 val     avg
ID
16911  [10, 2, 11, 12, 5, 75, 3, 13]  16.375
16912                           [10]  10.000
16812             [11, 12, 5, 75, 3]  21.200
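
The same list-plus-mean result can also be written with keyword-style named aggregation, which some readers find easier to follow than a list of tuples. This is a sketch under assumptions, not the answer's own code: the column name gap and the variable out are hypothetical, and the frame holds only the first four rows for ID 16911 and the two rows for ID 16912.

```python
import pandas as pd

# A small subset of the question's data, rebuilt by hand for illustration.
df = pd.DataFrame({
    "ID": [16911, 16911, 16911, 16911, 16912, 16912],
    "Date": pd.to_datetime([
        "2017-04-15", "2017-04-25", "2017-04-27", "2017-05-08",
        "2017-04-15", "2017-04-25",
    ]),
})

out = (
    # day difference to the previous date within the same ID
    df.assign(gap=df.groupby("ID")["Date"].diff().dt.days)
      # the first row of each ID has no previous date
      .dropna(subset=["gap"])
      .astype({"gap": int})
      .groupby("ID", sort=False)["gap"]
      # named aggregation: output column name = aggregation function
      .agg(val=lambda s: s.tolist(), avg="mean")
)
print(out)
```

For ID 16911 the val column here holds the list [10, 2, 11] that the question walks through by hand.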

Regarding "python - Pandas time series: average of a timestamp column", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/50795491/