
python - Sum based on date ranges in two separate columns

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:48:53

I want to sum all the values in one column based on the date ranges given by two other columns:

Start_Date  Value_to_sum  End_date
2017-12-13  2             2017-12-13
2017-12-13  3             2017-12-16
2017-12-14  4             2017-12-15
2017-12-15  2             2017-12-15

A simple groupby won't do this, because it only adds up the values that share a specific date.
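To illustrate the point, here is a minimal sketch of what a plain groupby on Start_Date returns for the sample table. The frame is rebuilt by hand from the rows above; the name per_start is only for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Start_Date": pd.to_datetime(
        ["2017-12-13", "2017-12-13", "2017-12-14", "2017-12-15"]
    ),
    "Value_to_sum": [2, 3, 4, 2],
    "End_date": pd.to_datetime(
        ["2017-12-13", "2017-12-16", "2017-12-15", "2017-12-15"]
    ),
})

# Grouping by start date only adds rows that START on the same day;
# rows that are still running on a later day are not counted there.
per_start = df.groupby("Start_Date")["Value_to_sum"].sum()
print(per_start)
```

This gives 5, 4 and 2 for the three start dates, rather than the 5, 7 and 9 the question asks for, because the ranges that span several days are only counted once.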

We could write a nested for loop, but it takes forever to run:

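import pandas as pd
from tqdm import tqdm

# One output row per unique start date
carry = pd.DataFrame({'Date': data['Start_Date'].unique()})
carry['total'] = 0
for n in tqdm(range(len(carry))):
    day = carry.loc[n, 'Date']
    # Rows whose range has already started on this day
    tr = data.loc[data['Start_Date'] <= day]
    for i in tr.index:
        # Count the row only if its range has not ended yet
        if day <= tr.loc[i, 'End_date']:
            carry.loc[n, 'total'] += tr.loc[i, 'Value_to_sum']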

Something like this would work, but as I said, it takes a very long time.

The expected output is the unique dates and the total for each day.

It should be:

2017-12-13 = 5, 2017-12-14 = 7, 2017-12-15 = 9.

How can I compute the sum based on the date ranges?

Best answer

First, group by ["Start_Date", "End_date"] so that rows sharing the same range are summed once, which saves some operations.

from collections import Counter

import pandas as pd

c = Counter()
# Collapse rows that share the same (start, end) range before expanding
df_g = df.groupby(["Start_Date", "End_date"]).sum().reset_index()

def my_counter(row):
    s, v, e = row.Start_Date, row.Value_to_sum, row.End_date
    if s == e:
        # Single-day range: add the value to that day directly
        c[pd.Timestamp(s)] += v
    else:
        # Multi-day range: add the value to every day from s to e inclusive
        c.update({date: v for date in pd.date_range(s, e)})

df_g.apply(my_counter, axis=1)
print(c)
"""
Output with pandas 2.x:
Counter({Timestamp('2017-12-15 00:00:00'): 9,
         Timestamp('2017-12-14 00:00:00'): 7,
         Timestamp('2017-12-13 00:00:00'): 5,
         Timestamp('2017-12-16 00:00:00'): 3})
"""
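The same idea (expand each range into its days, then sum per day) can also be sketched without a Counter, using DataFrame.explode. This is not the answer's method, only a minimal sketch assuming pandas 1.1 or later; the sample frame is rebuilt from the question, and daily_totals is a hypothetical helper name.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Start_Date": pd.to_datetime(
        ["2017-12-13", "2017-12-13", "2017-12-14", "2017-12-15"]
    ),
    "Value_to_sum": [2, 3, 4, 2],
    "End_date": pd.to_datetime(
        ["2017-12-13", "2017-12-16", "2017-12-15", "2017-12-15"]
    ),
})


def daily_totals(frame):
    """Hypothetical helper: total Value_to_sum over every day a row covers."""
    # One list of covered days per row (date_range includes both endpoints)
    days = pd.Series(
        [
            list(pd.date_range(s, e))
            for s, e in zip(frame["Start_Date"], frame["End_date"])
        ],
        index=frame.index,
    )
    # explode() gives each covered day its own row, repeating the value
    expanded = frame.assign(Date=days).explode("Date", ignore_index=True)
    expanded["Date"] = pd.to_datetime(expanded["Date"])
    return expanded.groupby("Date")["Value_to_sum"].sum()


totals = daily_totals(df)
# Keep only the days that appear as a start date, as the question asks
wanted = totals[totals.index.isin(df["Start_Date"])]
print(wanted)
```

On the sample data, wanted holds 5, 7 and 9 for 2017-12-13 through 2017-12-15, while totals also contains 3 for 2017-12-16. Note that expanding every range into days uses memory proportional to the total number of covered days, so very long ranges may call for a different approach.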

Tools used:

Counter.update([iterable-or-mapping]): Elements are counted from an iterable or added-in from another mapping (or counter). Like dict.update() but adds counts instead of replacing them. Also, the iterable is expected to be a sequence of elements, not a sequence of (key, value) pairs. -- Cited from Python 3 Documentation

pandas.date_range
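A small sketch of the two behaviours the answer relies on: Counter.update adds to existing counts where dict.update would overwrite them, and pandas.date_range includes both endpoints by default. The keys "a" and "b" are made up for illustration.

```python
from collections import Counter

import pandas as pd

# Counter.update ADDS the incoming values to the existing counts
counts = Counter({"a": 1})
counts.update({"a": 2, "b": 5})

# dict.update REPLACES the existing values instead
plain = {"a": 1}
plain.update({"a": 2, "b": 5})

# date_range is inclusive of both the start and the end date
days = pd.date_range("2017-12-13", "2017-12-16")

print(counts)      # "a" is now 3
print(plain)       # "a" is now 2
print(len(days))   # 4 days: the 13th, 14th, 15th and 16th
```

This is why a row spanning 2017-12-13 to 2017-12-16 contributes its value to four separate days, and why repeated updates accumulate rather than overwrite.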

Regarding "python - Sum based on date ranges in two separate columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48028234/
