
python - Pandas DataFrame: add column that counts like-events in past

Reposted. Author: 太空宇宙. Updated: 2023-11-04 03:55:28

I have a puzzle. This is easy in Excel, but how do I do it in pandas with the DataFrame df:

  | EventID | PictureID | Date
0 | 1       | A         | 2010-01-01
1 | 2       | A         | 2010-02-01
2 | 3       | A         | 2010-02-15
3 | 4       | B         | 2010-01-01
4 | 5       | C         | 2010-02-01
5 | 6       | C         | 2010-02-15

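Is there a way to add a new column that counts how many events were recorded for the same PictureID in the past 6 months? In other words, for a given row, the number of rows in the DataFrame that have the same PictureID and a date within the six months before that row's date.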

df['PastSix'] = ???

So the output would look like this:

  | EventID | PictureID | Date       | PastSix
0 | 1       | A         | 2010-01-01 | 0
1 | 2       | A         | 2010-02-01 | 1
2 | 3       | A         | 2010-02-15 | 2
3 | 4       | B         | 2010-01-01 | 0
4 | 5       | C         | 2010-02-01 | 0
5 | 6       | C         | 2010-02-15 | 1
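As a reference check, the definition above can be implemented directly with a slow brute-force loop. This is a minimal sketch under stated assumptions: "past six months" is taken to mean calendar months via pd.DateOffset(months=6) (not the 183 days the answer below uses), the window is [Date - 6 months, Date), so events on the same date as the current row (including the row itself) are not counted, and the helper name past_six_bruteforce is hypothetical.

```python
import pandas as pd

# The question's sample data.
df = pd.DataFrame({
    "EventID": [1, 2, 3, 4, 5, 6],
    "PictureID": ["A", "A", "A", "B", "C", "C"],
    "Date": pd.to_datetime(["2010-01-01", "2010-02-01", "2010-02-15",
                            "2010-01-01", "2010-02-01", "2010-02-15"]),
})


def past_six_bruteforce(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: for each row, count rows with the same PictureID
    dated in [Date - 6 calendar months, Date). O(n^2), for checking only."""
    counts = []
    for pic, date in zip(df["PictureID"], df["Date"]):
        # Assumption: calendar months, start inclusive, current date exclusive
        start = date - pd.DateOffset(months=6)
        same_pic = df["PictureID"] == pic
        in_window = (df["Date"] >= start) & (df["Date"] < date)
        counts.append(int((same_pic & in_window).sum()))
    return pd.Series(counts, index=df.index)


df["PastSix"] = past_six_bruteforce(df)
print(df)
```

On the question's data this gives the PastSix values 0, 1, 2, 0, 0, 1 shown above. The loop is quadratic, so it is only meant for validating a faster method on small inputs.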

Best answer

I wasn't sure how to define 6 months, so I used the previous 183 days instead. The basic idea is to use the asof() method:

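import io

import numpy as np
import pandas as pd

txt = """EventID | PictureID | Date
0 | A | 2009-07-01
1 | A | 2010-01-01
2 | A | 2010-02-01
3 | A | 2010-02-15
4 | B | 2010-01-01
5 | C | 2010-02-01
6 | C | 2010-02-15
7 | A | 2010-08-01
"""

# A regex separator needs the Python parsing engine
df = pd.read_csv(io.StringIO(txt), sep=r"\s*\|\s*", engine="python", parse_dates=["Date"])

def f(g):
    # Running count indexed by date: the k-th event of this PictureID gets k
    # (asof() assumes the dates are sorted within each group)
    count = pd.Series(np.arange(1, len(g) + 1), index=g["Date"])
    prev1day = count.index.shift(-1, freq="D")
    prev6month = count.index.shift(-183, freq="D")
    # Events up to the previous day minus events 183 or more days back;
    # .values drops the two different cutoff-date indexes before subtracting
    result = count.asof(prev1day).fillna(0).values - count.asof(prev6month).fillna(0).values
    return pd.Series(result.astype(int), index=g.index)

# group_keys=False keeps the original row labels so the result aligns with df
df["PastSix"] = df.groupby("PictureID", group_keys=False)[["Date"]].apply(f)
print(df)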

Output:

   EventID PictureID                Date  PastSix
0        0         A 2009-07-01 00:00:00        0
1        1         A 2010-01-01 00:00:00        0
2        2         A 2010-02-01 00:00:00        1
3        3         A 2010-02-15 00:00:00        2
4        4         B 2010-01-01 00:00:00        0
5        5         C 2010-02-01 00:00:00        0
6        6         C 2010-02-15 00:00:00        1
7        7         A 2010-08-01 00:00:00        2
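To see why the subtraction in f counts the events inside the window, here is the same asof() trick applied by hand to PictureID A from the answer's sample data. It is an illustrative sketch; the variable names are made up, and it keeps the answer's 183-day approximation of six months.

```python
import pandas as pd

# Event dates for PictureID "A" from the answer's sample data (already sorted).
dates = pd.to_datetime(["2009-07-01", "2010-01-01", "2010-02-01",
                        "2010-02-15", "2010-08-01"])

# Running count: the k-th event in date order carries the value k.
count = pd.Series(range(1, len(dates) + 1), index=dates)

# asof(cutoff) returns the running count of the last event on or before the
# cutoff, i.e. how many events happened up to that date (NaN if none did).
before = count.asof(dates.shift(-1, freq="D")).fillna(0)     # up to the previous day
too_old = count.asof(dates.shift(-183, freq="D")).fillna(0)  # 183 or more days back

# The two results are indexed by different cutoff dates, so subtract the raw
# arrays; subtracting the Series would align on index and produce NaN.
past_six = (before.to_numpy() - too_old.to_numpy()).astype(int)
print(past_six.tolist())  # [0, 0, 1, 2, 2]
```

The printed list matches the PastSix values of rows 0, 1, 2, 3 and 7 (the "A" rows) in the output above.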

Regarding "python - Pandas DataFrame: add column that counts like-events in past", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/18807789/
