python - How do I get a cumulative sum of unique IDs with group by?

Reposted by: 太空宇宙 · Updated: 2023-11-04 04:18:09

I'm very new to Python and pandas, and I'm working with a pandas DataFrame that looks like this:

Date     Time           ID   Weight
Jul-1     12:00         A       10
Jul-1     12:00         B       20
Jul-1     12:00         C       100
Jul-1     12:10         C       100
Jul-1     12:10         D       30
Jul-1     12:20         C       100
Jul-1     12:20         D       30
Jul-1     12:30         A       10
Jul-1     12:40         E       40
Jul-1     12:50         F       50
Jul-1     1:00          A       40

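I'm trying to group by date, time, and ID and apply a cumulative sum, such that if an ID shows up again in a later time slot, its weight is added only once (unique). The resulting DataFrame should look like this: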

Date     Time           Weight   
Jul-1     12:00         130     (10+20+100)
Jul-1     12:10         160     (10+20+100+30)
Jul-1     12:20         160     (10+20+100+30)
Jul-1     12:30         160     (10+20+100+30)
Jul-1     12:40         200     (10+20+100+30+40)
Jul-1     12:50         250     (10+20+100+30+40+50)
Jul-1     01:00         250     (10+20+100+30+40+50)

Here is what I tried, but it still counts the weights multiple times:

df=df.groupby(['date','time','ID'])['Wt'].apply(lambda x: x.unique().sum()).reset_index()
df['cumWt']=df['Wt'].cumsum()

Any help would be greatly appreciated. Thanks in advance!

Best answer

The code below uses DataFrame.duplicated(), DataFrame.merge(), groupby/sum, and cumsum() from pandas to achieve the desired output:

# keep each weight only on the first row where its ID appears (the question
# asks for unique IDs, so deduplicate on 'ID' rather than on 'weight'),
# and name the series so it can be merged
unique_weights = df['weight'][~df.duplicated(['ID'])]
unique_weights = unique_weights.rename('consider_cum')

# merge the series back into the original dataframe on the index and replace the ignored values with 0
df = df.merge(unique_weights.to_frame(), how='left', left_index=True, right_index=True)
df['consider_cum'] = df['consider_cum'].fillna(0)

# sum per date and time; sort=False keeps the time slots in their original order
df = df.groupby(['date', 'time'], sort=False)[['consider_cum']].sum().reset_index()

# create the cumulative sum column and present the output
df['weight_cumsum'] = df['consider_cum'].cumsum()
df[['date', 'time', 'weight_cumsum']]

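On the sample data, this produces the output requested in the question: cumulative weights of 130, 160, 160, 160, 200, 250 and 250 across the seven time slots.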
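
For anyone who wants to check this on the question's sample data, here is a minimal self-contained sketch. It is not the answer's exact code: it swaps the merge step for Series.where, and it assumes lowercase column names (date, time, ID, weight) plus a helper column called counted, both chosen here for illustration. It also assumes an ID should be counted once across the whole frame rather than once per date.

```python
import pandas as pd

# Sample data from the question; the lowercase column names are an assumption.
df = pd.DataFrame({
    "date": ["Jul-1"] * 11,
    "time": ["12:00", "12:00", "12:00", "12:10", "12:10", "12:20",
             "12:20", "12:30", "12:40", "12:50", "1:00"],
    "ID": ["A", "B", "C", "C", "D", "C", "D", "A", "E", "F", "A"],
    "weight": [10, 20, 100, 100, 30, 100, 30, 10, 40, 50, 40],
})

# Keep a weight only on the first row where its ID appears; repeats become 0.
first_seen = ~df["ID"].duplicated()
df["counted"] = df["weight"].where(first_seen, 0)

# Sum per time slot (sort=False preserves the original slot order),
# then accumulate across slots.
result = (
    df.groupby(["date", "time"], sort=False)["counted"]
      .sum()
      .cumsum()
      .reset_index(name="weight_cumsum")
)
print(result)
```

If the running total should restart on each date, one option would be to deduplicate on both date and ID and take the cumulative sum within each date group instead.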

Regarding "python - How do I get a cumulative sum of unique IDs with group by?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55051707/
