
python - Pandas pd.Grouper and sequential date differences per group

Reposted. Author: 行者123. Updated: 2023-12-01 09:16:53

I have a sample DataFrame like this:

import pandas as pd
df = pd.DataFrame({"id": [0]*5 + [1]*5,
"time": ['2015-01-01', '2015-01-03', '2015-01-04', '2015-01-08', '2015-01-10', '2015-02-02', '2015-02-04', '2015-02-06', '2015-02-11', '2015-02-13'],
'hit': [0,3,8,2,5, 6,12,0,7,3]})
df.time = df.time.astype('datetime64[ns]')
df = df[['id', 'time', 'hit']]
df

This outputs:

    id  time        hit
0 0 2015-01-01 0
1 0 2015-01-03 3
2 0 2015-01-04 8
3 0 2015-01-08 2
4 0 2015-01-10 5
5 1 2015-02-02 6
6 1 2015-02-04 12
7 1 2015-02-06 0
8 1 2015-02-11 7
9 1 2015-02-13 3

Then I grouped by id and by time (daily):

df.groupby(['id', pd.Grouper(key='time', freq='1D')]).hit.sum().to_frame()

Result:

               hit
id time
0 2015-01-01 0
2015-01-03 3
2015-01-04 8
2015-01-08 2
2015-01-10 5
1 2015-02-02 6
2015-02-04 12
2015-02-06 0
2015-02-11 7
2015-02-13 3

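However, I want to keep a row for every calendar day, even when its hit count is 0, and count the days elapsed since the first day for each id. My desired output (first seven days of each id shown):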

               hit  day_since
id time
0 2015-01-01 0 1
2015-01-02 0 2
2015-01-03 3 3
2015-01-04 8 4
2015-01-05 0 5
2015-01-06 0 6
2015-01-07 0 7
1 2015-02-02 6 1
2015-02-03 0 2
2015-02-04 12 3
2015-02-05 0 4
2015-02-06 0 5
2015-02-07 0 6
2015-02-08 0 7

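cumcount does not work here, because it simply numbers the rows that exist in each group. In my case, I want the count to follow consecutive calendar dates within each group, including the days that have no rows.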
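To make the distinction concrete, here is a minimal sketch on the sample data from the question. The names `row_number`, `first_day`, and `day_since` are illustrative choices, not part of the original post. It only contrasts the two counts on the rows that already exist; it does not yet fill in the missing days.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0] * 5 + [1] * 5,
    "time": pd.to_datetime([
        "2015-01-01", "2015-01-03", "2015-01-04", "2015-01-08", "2015-01-10",
        "2015-02-02", "2015-02-04", "2015-02-06", "2015-02-11", "2015-02-13",
    ]),
    "hit": [0, 3, 8, 2, 5, 6, 12, 0, 7, 3],
})

# cumcount numbers the rows present in each group, ignoring date gaps
row_number = df.groupby("id").cumcount() + 1

# calendar-day offset from each id's first date, counting the first day as 1
first_day = df.groupby("id")["time"].transform("min")
day_since = (df["time"] - first_day).dt.days + 1

print(pd.DataFrame({"id": df["id"], "time": df["time"],
                    "row_number": row_number, "day_since": day_since}))
```

For id 0, `row_number` runs 1 through 5, while `day_since` follows the calendar and skips over the days with no rows.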

Does anyone have any ideas?

Best answer

After the groupby, move id back into a column and upsample each id separately:

# start from the grouped result in the question, then move 'id' back to a column
# so that the remaining index is the DatetimeIndex 'time'
df = df.groupby(['id', pd.Grouper(key='time', freq='1D')]).hit.sum().to_frame()
df = df.reset_index(level=0)

# container for the per-id dataframes
chunks = []

for i in df.id.unique():
    # prepare a series and upsample it to daily frequency within the same id
    chunk = df.loc[df.id == i, 'hit']
    chunk = chunk.resample('1D').asfreq()

    # create a dataframe, fill the new days with 0, and add the extra columns
    chunk = chunk.to_frame().reset_index().fillna(0)
    chunk['hit'] = chunk['hit'].astype(int)
    chunk['id'] = i
    chunk['day_since'] = chunk.groupby('id').cumcount() + 1

    chunks.append(chunk)

# stack the id-wise dataframes vertically and restore the (id, time) index
dfs = pd.concat(chunks, axis=0, ignore_index=True)
dfs = dfs.set_index(['id', 'time'])

You will get:

               hit  day_since
id time
0 2015-01-01 0 1
2015-01-02 0 2
2015-01-03 3 3
2015-01-04 8 4
2015-01-05 0 5
2015-01-06 0 6
2015-01-07 0 7
2015-01-08 2 8
2015-01-09 0 9
2015-01-10 5 10
1 2015-02-02 6 1
2015-02-03 0 2
2015-02-04 12 3
2015-02-05 0 4
2015-02-06 0 5
2015-02-07 0 6
2015-02-08 0 7
2015-02-09 0 8
2015-02-10 0 9
2015-02-11 7 10
2015-02-12 0 11
2015-02-13 3 12
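As a possible alternative to the explicit loop, the same table can likely be produced by resampling inside the groupby. This is a sketch under stated assumptions, not the answer's method: it rebuilds the question's sample data, and the name `out` is illustrative. Because `sum()` of an empty daily bin is 0, no separate `fillna` step is needed, and since every id then has one row per calendar day, `cumcount` gives the day number directly.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0] * 5 + [1] * 5,
    "time": pd.to_datetime([
        "2015-01-01", "2015-01-03", "2015-01-04", "2015-01-08", "2015-01-10",
        "2015-02-02", "2015-02-04", "2015-02-06", "2015-02-11", "2015-02-13",
    ]),
    "hit": [0, 3, 8, 2, 5, 6, 12, 0, 7, 3],
})

# resample each id to daily frequency between its own first and last date;
# days without rows become empty bins, whose sum is 0
out = (
    df.set_index("time")
      .groupby("id")["hit"]
      .resample("1D")
      .sum()
      .to_frame()
)

# every id now has one row per calendar day, so row order equals day order
out["day_since"] = out.groupby(level="id").cumcount() + 1

print(out)
```

The loop in the answer makes each step easy to inspect, while this form keeps everything in one chained expression; which one reads better is a matter of taste.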

Regarding python - Pandas pd.Grouper and sequential date differences per group, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51171892/
