
python - Find the sum of values per ID if a date falls between two dates in pandas

Reposted. Author: 行者123. Last updated: 2023-12-01 00:57:19

import datetime
import pandas as pd

data = {'id': [1, 1, 1, 1, 2, 2, 2, 2],
        'date1': [datetime.date(2016, 1, 1), datetime.date(2016, 1, 2), datetime.date(2016, 1, 2), datetime.date(2016, 1, 4),
                  datetime.date(2016, 1, 2), datetime.date(2016, 1, 4), datetime.date(2016, 1, 3), datetime.date(2016, 1, 1)],
        'date2': [datetime.date(2016, 1, 5), datetime.date(2016, 1, 3), datetime.date(2016, 1, 5), datetime.date(2016, 1, 5),
                  datetime.date(2016, 1, 4), datetime.date(2016, 1, 5), datetime.date(2016, 1, 4), datetime.date(2016, 1, 1)],
        'score1': [5, 7, 3, 2, 9, 3, 8, 3],
        'score2': [1, 3, 0, 5, 2, 20, 7, 7]}
df = pd.DataFrame.from_dict(data)

The DataFrame df looks like this:

   id       date1       date2  score1  score2
0   1  2016-01-01  2016-01-05       5       1
1   1  2016-01-02  2016-01-03       7       3
2   1  2016-01-02  2016-01-05       3       0
3   1  2016-01-04  2016-01-05       2       5
4   2  2016-01-02  2016-01-04       9       2
5   2  2016-01-04  2016-01-05       3      20
6   2  2016-01-03  2016-01-04       8       7
7   2  2016-01-01  2016-01-01       3       7

Another DataFrame, UF, contains id and usetdate:

   id   usetdate
0   1 2016-01-01
1   1 2016-01-03
2   2 2016-01-04
3   2 2016-01-02

For each row of UF, I want the sums of score1 and score2 over the rows of df with the same id whose range from date1 to date2 (inclusive) contains the passed date usetdate:

   id   usetdate  score1  score2
0   1 2016-01-01       5       1
1   1 2016-01-03      15       4
2   2 2016-01-04      20      29
3   2 2016-01-02       9       2

Best answer

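You can first create a Series of every date in each row's range with date_range, swap the Series' index and values, join the result to the original DataFrame with DataFrame.join, and finally aggregate with sum: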

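# One Series per row: the row label repeated for every day in [date1, date2]
s = pd.concat([pd.Series(r.Index, pd.date_range(r.date1, r.date2)) for r in df.itertuples()])
# Swap: the row label becomes the index (to join on), the day becomes the value
s = pd.Series(s.index, index=s, name='usetdate')

# Repeat each row once per covered day, then sum the scores per (id, day)
df = df.drop(['date1','date2'],axis=1).join(s).groupby(['id','usetdate'], as_index=False).sum()
print (df)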
   id   usetdate  score1  score2
0   1 2016-01-01       5       1
1   1 2016-01-02      15       4
2   1 2016-01-03      15       4
3   1 2016-01-04      10       6
4   1 2016-01-05      10       6
5   2 2016-01-01       3       7
6   2 2016-01-02       9       2
7   2 2016-01-03      17       9
8   2 2016-01-04      20      29
9   2 2016-01-05       3      20
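
To see what the index/value swap actually produces, here is a minimal sketch that runs the same two steps on only the first two rows of the question's data and then joins back. The two-row frame and the name out are just for illustration.

```python
import datetime

import pandas as pd

# First two rows of the question's df (id 1 only), enough to show the shapes.
df = pd.DataFrame({
    'id': [1, 1],
    'date1': [datetime.date(2016, 1, 1), datetime.date(2016, 1, 2)],
    'date2': [datetime.date(2016, 1, 5), datetime.date(2016, 1, 3)],
    'score1': [5, 7],
    'score2': [1, 3],
})

# Step 1: for each row, a Series indexed by every day in [date1, date2]
# whose values are that row's label (r.Index).
s = pd.concat([pd.Series(r.Index, index=pd.date_range(r.date1, r.date2))
               for r in df.itertuples()])
print(s)  # day -> row label: label 0 for five days, then label 1 for two days

# Step 2: swap, so the row label is the index and the day is the value.
s = pd.Series(s.index, index=s, name='usetdate')

# Joining on the index repeats each original row once per covered day.
out = df.drop(['date1', 'date2'], axis=1).join(s)
print(out)
```

Because join aligns on the index, swapping first is what lets every day line up with the row it came from.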

EDIT: the same result with a list comprehension, starting again from the original df:

# Unpacking assumes df's columns are in the order id, date1, date2, score1, score2
L = [(i, d, s1, s2) for i, d1, d2, s1, s2 in df.values for d in pd.date_range(d1, d2)]
df = (pd.DataFrame(L, columns=['id','usetdate','score1','score2'])
        .groupby(['id','usetdate'], as_index=False).sum())
print (df)
   id   usetdate  score1  score2
0   1 2016-01-01       5       1
1   1 2016-01-02      15       4
2   1 2016-01-03      15       4
3   1 2016-01-04      10       6
4   1 2016-01-05      10       6
5   2 2016-01-01       3       7
6   2 2016-01-02       9       2
7   2 2016-01-03      17       9
8   2 2016-01-04      20      29
9   2 2016-01-05       3      20
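
On pandas 0.25 or newer, DataFrame.explode can stand in for the explicit double loop. This is a swapped-in technique, not part of the original answer: a minimal sketch that stores each row's list of days in a usetdate column, explodes it into one row per day, and then applies the same groupby sum.

```python
import datetime

import pandas as pd

data = {'id': [1, 1, 1, 1, 2, 2, 2, 2],
        'date1': [datetime.date(2016, 1, 1), datetime.date(2016, 1, 2), datetime.date(2016, 1, 2), datetime.date(2016, 1, 4),
                  datetime.date(2016, 1, 2), datetime.date(2016, 1, 4), datetime.date(2016, 1, 3), datetime.date(2016, 1, 1)],
        'date2': [datetime.date(2016, 1, 5), datetime.date(2016, 1, 3), datetime.date(2016, 1, 5), datetime.date(2016, 1, 5),
                  datetime.date(2016, 1, 4), datetime.date(2016, 1, 5), datetime.date(2016, 1, 4), datetime.date(2016, 1, 1)],
        'score1': [5, 7, 3, 2, 9, 3, 8, 3],
        'score2': [1, 3, 0, 5, 2, 20, 7, 7]}
df = pd.DataFrame(data)

# One Python list of days per row; pd.date_range includes both endpoints.
days = pd.Series([list(pd.date_range(d1, d2)) for d1, d2 in zip(df['date1'], df['date2'])],
                 index=df.index)

res = (df.drop(columns=['date1', 'date2'])
         .assign(usetdate=days)
         .explode('usetdate')          # one row per (original row, day)
         .reset_index(drop=True))
res['usetdate'] = pd.to_datetime(res['usetdate'])  # exploded values are object dtype
res = (res.groupby(['id', 'usetdate'], as_index=False)[['score1', 'score2']]
          .sum())
print(res)
```

The exploded frame has the same rows as L in the list comprehension above, so the totals match the answer's output.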

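EDIT:

To get sums only for the dates in the second DataFrame (UF in the question, called df1 here with its date column named userdate), filter the expanded rows with DataFrame.merge before aggregating. merge defaults to an inner join on the shared columns (id and userdate), so only matching pairs are kept: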

df1['userdate'] = pd.to_datetime(df1['userdate'])
print (df1)
   id   userdate
0   1 2016-01-01
1   1 2016-01-03
2   2 2016-01-04
3   2 2016-01-02

L = [(i, d, s1, s2) for i, d1, d2, s1, s2 in df.values for d in pd.date_range(d1, d2)]
df = (pd.DataFrame(L, columns=['id','userdate','score1','score2'])
        .merge(df1)
        .groupby(['id','userdate'], as_index=False)
        .sum())
print (df)
   id   userdate  score1  score2
0   1 2016-01-01       5       1
1   1 2016-01-03      15       4
2   2 2016-01-02       9       2
3   2 2016-01-04      20      29
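
If the result should keep df1's own row order and also list dates that no interval covers (with zero sums), one option is to reverse the merge: aggregate all days first, then left-merge those totals onto df1. A minimal sketch; the last df1 row (id 2, 2016-01-06) is hypothetical, added only so that an unmatched date appears.

```python
import datetime

import pandas as pd

data = {'id': [1, 1, 1, 1, 2, 2, 2, 2],
        'date1': [datetime.date(2016, 1, 1), datetime.date(2016, 1, 2), datetime.date(2016, 1, 2), datetime.date(2016, 1, 4),
                  datetime.date(2016, 1, 2), datetime.date(2016, 1, 4), datetime.date(2016, 1, 3), datetime.date(2016, 1, 1)],
        'date2': [datetime.date(2016, 1, 5), datetime.date(2016, 1, 3), datetime.date(2016, 1, 5), datetime.date(2016, 1, 5),
                  datetime.date(2016, 1, 4), datetime.date(2016, 1, 5), datetime.date(2016, 1, 4), datetime.date(2016, 1, 1)],
        'score1': [5, 7, 3, 2, 9, 3, 8, 3],
        'score2': [1, 3, 0, 5, 2, 20, 7, 7]}
df = pd.DataFrame(data)

# The question's UF, called df1 in the answer; the last row is hypothetical.
df1 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'userdate': pd.to_datetime(['2016-01-01', '2016-01-03', '2016-01-04',
                                                '2016-01-02', '2016-01-06'])})

# Same expansion as the answer: one tuple per (row, day), then totals per (id, day).
L = [(i, d, s1, s2) for i, d1, d2, s1, s2 in df.values for d in pd.date_range(d1, d2)]
totals = (pd.DataFrame(L, columns=['id', 'userdate', 'score1', 'score2'])
            .groupby(['id', 'userdate'], as_index=False)
            .sum())

# how='left' keeps every df1 row in df1's order; unmatched dates get NaN, filled with 0.
res = df1.merge(totals, on=['id', 'userdate'], how='left')
res[['score1', 'score2']] = res[['score1', 'score2']].fillna(0).astype(int)
print(res)
```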

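EDIT1:

You can also filter inside the list comprehension itself, keeping only the (id, date) pairs that appear in df1 after converting its rows to tuples: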

df1['userdate'] = pd.to_datetime(df1['userdate'])
print (df1)
   id   userdate
0   1 2016-01-01
1   1 2016-01-03
2   2 2016-01-04
3   2 2016-01-02

a = [tuple(x) for x in df1.values]
print (a)
[(1, Timestamp('2016-01-01 00:00:00')), (1, Timestamp('2016-01-03 00:00:00')),
(2, Timestamp('2016-01-04 00:00:00')), (2, Timestamp('2016-01-02 00:00:00'))]
L = [(i, d, s1, s2) for i, d1, d2, s1, s2 in df.values
     for d in pd.date_range(d1, d2)
     if (i, d) in a]
df = (pd.DataFrame(L, columns=['id','userdate','score1','score2'])
        .groupby(['id','userdate'], as_index=False)
        .sum())
print (df)
   id   userdate  score1  score2
0   1 2016-01-01       5       1
1   1 2016-01-03      15       4
2   2 2016-01-02       9       2
3   2 2016-01-04      20      29
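
The approaches in this answer all expand every interval into one row per day, which can grow large when the intervals are long. A different technique, not from the original answer, skips the expansion: pair each UF row with the df rows that share its id, keep the pairs where usetdate lies between date1 and date2 (Series.between is inclusive on both ends by default), and sum per UF row. A minimal sketch using the question's names df and UF; its cost grows with the number of same-id pairs rather than with interval length.

```python
import datetime

import pandas as pd

data = {'id': [1, 1, 1, 1, 2, 2, 2, 2],
        'date1': [datetime.date(2016, 1, 1), datetime.date(2016, 1, 2), datetime.date(2016, 1, 2), datetime.date(2016, 1, 4),
                  datetime.date(2016, 1, 2), datetime.date(2016, 1, 4), datetime.date(2016, 1, 3), datetime.date(2016, 1, 1)],
        'date2': [datetime.date(2016, 1, 5), datetime.date(2016, 1, 3), datetime.date(2016, 1, 5), datetime.date(2016, 1, 5),
                  datetime.date(2016, 1, 4), datetime.date(2016, 1, 5), datetime.date(2016, 1, 4), datetime.date(2016, 1, 1)],
        'score1': [5, 7, 3, 2, 9, 3, 8, 3],
        'score2': [1, 3, 0, 5, 2, 20, 7, 7]}
df = pd.DataFrame(data)
UF = pd.DataFrame({'id': [1, 1, 2, 2],
                   'usetdate': [datetime.date(2016, 1, 1), datetime.date(2016, 1, 3),
                                datetime.date(2016, 1, 4), datetime.date(2016, 1, 2)]})

# Compare as datetime64 values rather than as Python date objects.
for col in ['date1', 'date2']:
    df[col] = pd.to_datetime(df[col])
UF['usetdate'] = pd.to_datetime(UF['usetdate'])

# reset_index keeps each UF row's label in a column named 'index'.
pairs = UF.reset_index().merge(df, on='id')
inside = pairs['usetdate'].between(pairs['date1'], pairs['date2'])  # inclusive
sums = pairs[inside].groupby('index')[['score1', 'score2']].sum()

# Join back on UF's index: keeps UF's row order; UF rows with no match get 0.
res = UF.join(sums).fillna({'score1': 0, 'score2': 0})
print(res)
```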

Regarding "python - Find the sum of values per ID if a date falls between two dates in pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56143547/
