
python - Loop over groupby, aggregate, and create new columns based on the groups

Reposted. Author: 行者123. Updated: 2023-12-03 07:56:55

I am trying to build per-student columns that count events by progress level.

The data looks like this:

STUDENT_ID  STUDENT_ACTIVITY_SESSION_ID  NODE_NAME  ACTIVITY_NAME  prog_level
FredID      gobbledeegook1               Node1      MyActivity1    pass
FredID      gobbledeegook2               Node1      MyActivity1    pass
FredID      gobbledeegook3               Node2      MyActivity2    pass
JaniceID    gobbledeegook4               Node3      MyActivity3    stay
JaniceID    gobbledeegook5               Node3      MyActivity3    stay
JaniceID    gobbledeegook5               Node3      MyActivity3    fail

Here is what I want:

STUDENT_ID  attempts_pass  attempts_fail  attempts_stay
FredID      3
JaniceID                   1              2

  1. I tried looping over the groups so that the column names are generated automatically. I want each row to be a STUDENT_ID and each count to be a column.

import pandas as pd

def std_attempts_by_prog_level(df):
    # Count the sessions for each prog_level within one student's rows
    dict_fields = {}
    df_by_prog_level = df.groupby('prog_level')['STUDENT_ACTIVITY_SESSION_ID']
    for name, group in df_by_prog_level:
        x = group.count()
        dict_fields["attempts_" + name] = x

    return pd.Series(dict_fields)

df.groupby('STUDENT_ID').apply(std_attempts_by_prog_level).reset_index()

Result:

   STUDENT_ID  level_1                 0
0  Fred        attempts_cancel       104
1  Fred        attempts_fail          96
2  Fred        attempts_in_progress   30

...so this long-format result would still need to be pivoted and cleaned up, which is why I tried a pivot approach directly.

  2. Pivot approach, naming the fields manually: the resulting MultiIndex columns won't let me readily merge back with the other by-student metrics.

df_temp = (
    df.groupby(['STUDENT_ID', 'prog_level'], as_index=False)['STUDENT_ACTIVITY_SESSION_ID']
    .count()
    .pivot(index='STUDENT_ID', columns='prog_level')
    .rename({'cancel': 'attempts_cancel', 'fail': 'attempts_fail',
             'in_progress': 'attempts_in_progress', 'pass': 'attempts_pass'}, axis=1)
)

print(df_temp.columns)

Result:

MultiIndex([('STUDENT_ACTIVITY_SESSION_ID',      'attempts_cancel'),
            ('STUDENT_ACTIVITY_SESSION_ID',        'attempts_fail'),
            ('STUDENT_ACTIVITY_SESSION_ID', 'attempts_in_progress'),
            ('STUDENT_ACTIVITY_SESSION_ID',        'attempts_pass')],
           names=[None, 'prog_level'])
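
For reference, the two-level columns appear because .pivot is called without values=, so every remaining column becomes an outer level. The following is only a sketch of two possible ways to get flat column names, using a small DataFrame rebuilt from the sample rows above; the variable names counts, wide, and flat are made up for illustration, and add_prefix is used here in place of the manual rename.

```python
import pandas as pd

# Sample rows rebuilt from the question (only the columns needed here)
df = pd.DataFrame({
    "STUDENT_ID": ["FredID", "FredID", "FredID", "JaniceID", "JaniceID", "JaniceID"],
    "STUDENT_ACTIVITY_SESSION_ID": [
        "gobbledeegook1", "gobbledeegook2", "gobbledeegook3",
        "gobbledeegook4", "gobbledeegook5", "gobbledeegook5",
    ],
    "prog_level": ["pass", "pass", "pass", "stay", "stay", "fail"],
})

counts = df.groupby(["STUDENT_ID", "prog_level"], as_index=False)[
    "STUDENT_ACTIVITY_SESSION_ID"
].count()

# Option 1: name the values column so pivot returns single-level columns
wide = (
    counts.pivot(index="STUDENT_ID", columns="prog_level",
                 values="STUDENT_ACTIVITY_SESSION_ID")
    .add_prefix("attempts_")      # prefix every column label
    .rename_axis(None, axis=1)    # drop the 'prog_level' columns name
)

# Option 2: keep the original pivot call and drop the outer column level
flat = counts.pivot(index="STUDENT_ID", columns="prog_level").droplevel(0, axis=1)

print(wide.columns)
print(flat.columns)
```

Combinations that never occur (for example FredID with fail) come out as NaN with this approach, since .pivot has no fill option.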

Best answer

You can use .pivot_table:

result = df.pivot_table(
    index="STUDENT_ID", columns="prog_level", values="ACTIVITY_NAME",
    aggfunc="count", fill_value=0
).rename(lambda c: f"prog_level_{c}", axis=1).rename_axis(None, axis=1)

Result:

            prog_level_fail  prog_level_pass  prog_level_stay
STUDENT_ID
FredID                    0                3                0
JaniceID                  1                0                2
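
To try this against the sample data from the question, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the rows shown in the question, and the prefix is changed to attempts_ (via add_prefix) to match the column names the question asks for; that substitution is an assumption about what is wanted, not part of the original answer.

```python
import pandas as pd

# Sample rows rebuilt from the question
df = pd.DataFrame({
    "STUDENT_ID": ["FredID", "FredID", "FredID", "JaniceID", "JaniceID", "JaniceID"],
    "STUDENT_ACTIVITY_SESSION_ID": [
        "gobbledeegook1", "gobbledeegook2", "gobbledeegook3",
        "gobbledeegook4", "gobbledeegook5", "gobbledeegook5",
    ],
    "NODE_NAME": ["Node1", "Node1", "Node2", "Node3", "Node3", "Node3"],
    "ACTIVITY_NAME": ["MyActivity1", "MyActivity1", "MyActivity2",
                      "MyActivity3", "MyActivity3", "MyActivity3"],
    "prog_level": ["pass", "pass", "pass", "stay", "stay", "fail"],
})

result = (
    df.pivot_table(
        index="STUDENT_ID", columns="prog_level", values="ACTIVITY_NAME",
        aggfunc="count", fill_value=0,   # missing combinations become 0
    )
    .add_prefix("attempts_")             # attempts_fail, attempts_pass, ...
    .rename_axis(None, axis=1)           # drop the 'prog_level' columns name
)

print(result)
```

Because the column names are derived from whatever values occur in prog_level, new levels such as cancel or in_progress would get their own columns without any manual renaming.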

If you want the index as a column, add .reset_index() at the end of the pipeline.
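
Since the question mentions merging back with other by-student metrics, the sketch below shows how the flattened result might be joined once STUDENT_ID is an ordinary column. The other_metrics frame and its total_attempts column are hypothetical stand-ins for those metrics, computed here from the same sample rows, and the attempts_ prefix is an illustrative choice.

```python
import pandas as pd

# Sample rows rebuilt from the question (only the columns needed here)
df = pd.DataFrame({
    "STUDENT_ID": ["FredID", "FredID", "FredID", "JaniceID", "JaniceID", "JaniceID"],
    "ACTIVITY_NAME": ["MyActivity1", "MyActivity1", "MyActivity2",
                      "MyActivity3", "MyActivity3", "MyActivity3"],
    "prog_level": ["pass", "pass", "pass", "stay", "stay", "fail"],
})

result = (
    df.pivot_table(
        index="STUDENT_ID", columns="prog_level", values="ACTIVITY_NAME",
        aggfunc="count", fill_value=0,
    )
    .add_prefix("attempts_")
    .rename_axis(None, axis=1)
    .reset_index()                # STUDENT_ID becomes a regular column
)

# Hypothetical stand-in for "the other by-student metrics"
other_metrics = (
    df.groupby("STUDENT_ID").size().rename("total_attempts").reset_index()
)

# Flat, single-level columns on both sides, so a plain merge works
merged = other_metrics.merge(result, on="STUDENT_ID", how="left")
print(merged)
```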

Regarding "python - Loop over groupby, aggregate, and create new columns based on the groups", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/75844896/
