
python - Sum values of columns that start with the same text string

Reposted · Author: 行者123 · Updated: 2023-12-02 18:59:17

I want to get the row-wise sum of the values in columns that start with the same text string. Below is my original df of failed classes.

Original df:

ID  P_English_2  P_English_3  P_German_1  P_Math_1  P_Math_3  P_Physics_2  P_Physics_4
56            1            3           1         2         0            0            3
11            0            0           0         1         4            1            0
 6            0            0           0         0         0            1            0
43            1            2           1         0         0            1            1
14            0            1           0         0         1            0            0

Desired df:

ID  P_English  P_German  P_Math  P_Physics
56          4         1       2          3
11          0         0       5          1
 6          0         0       0          1
43          3         1       0          2
14          1         0       1          0

Code tried:

import pandas as pd

df = pd.DataFrame({"ID": [56, 11, 6, 43, 14],
                   "P_Math_1": [2, 1, 0, 0, 0],
                   "P_English_3": [3, 0, 0, 2, 1],
                   "P_English_2": [1, 0, 0, 1, 0],
                   "P_Math_3": [0, 4, 0, 0, 1],
                   "P_Physics_2": [0, 1, 1, 1, 0],
                   "P_Physics_4": [3, 0, 0, 1, 0],
                   "P_German_1": [1, 0, 0, 1, 0]})

print(df)

categories = ['P_Math', 'P_English', 'P_Physics', 'P_German']


def correct_categories(cols):
    # map each column name to the category it starts with
    # (note: 'ID' matches no category, so this returns 7 labels for 8 columns)
    return [cat for col in cols for cat in categories if col.startswith(cat)]


result = df.groupby(correct_categories(df.columns), axis=1).sum()

print(result)
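Why the attempt above fails: `ID` starts with none of the categories, so `correct_categories(df.columns)` returns 7 labels for 8 columns. pandas then does not treat the list as one label per column, and the `groupby` call raises an error instead of grouping. A minimal sketch that keeps the question's `categories` list but avoids per-column labels altogether by summing each category's columns directly with `DataFrame.filter`. The `^<category>_` regex is an assumption about how the column names are built.

```python
import re

import pandas as pd

df = pd.DataFrame({"ID": [56, 11, 6, 43, 14],
                   "P_Math_1": [2, 1, 0, 0, 0],
                   "P_English_3": [3, 0, 0, 2, 1],
                   "P_English_2": [1, 0, 0, 1, 0],
                   "P_Math_3": [0, 4, 0, 0, 1],
                   "P_Physics_2": [0, 1, 1, 1, 0],
                   "P_Physics_4": [3, 0, 0, 1, 0],
                   "P_German_1": [1, 0, 0, 1, 0]})

categories = ['P_Math', 'P_English', 'P_Physics', 'P_German']

# For each category, pick the columns named "<category>_..." and add them
# up row by row. ID never matches, so it needs no label of its own.
sums = {cat: df.filter(regex=f"^{re.escape(cat)}_").sum(axis=1)
        for cat in categories}

# Put ID back in front of the per-category totals
result = pd.concat([df[["ID"]], pd.DataFrame(sums)], axis=1)
print(result)
```

The columns come out in the order of `categories`; `result[['ID'] + sorted(categories)]` gives the column order of the desired df.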

Best answer

Let's try groupby with axis=1:

# extract the subjects
subjects = [x[0] for x in df.columns.str.rsplit('_',n=1)]

df.groupby(subjects, axis=1).sum()

Output:

   ID  P_English  P_German  P_Math  P_Physics
0  56          4         1       2          3
1  11          0         0       5          1
2   6          0         0       0          1
3  43          3         1       0          2
4  14          1         0       1          0
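Note that `groupby(..., axis=1)` has been deprecated since pandas 2.1 and emits a `FutureWarning`. A minimal sketch of the same answer for newer pandas, assuming the question's `df` (rebuilt here so the snippet runs on its own): transpose, group the former columns by subject, and transpose back. Moving `ID` into the index is an addition, so that `ID` is not summed as a group of its own.

```python
import pandas as pd

df = pd.DataFrame({"ID": [56, 11, 6, 43, 14],
                   "P_Math_1": [2, 1, 0, 0, 0],
                   "P_English_3": [3, 0, 0, 2, 1],
                   "P_English_2": [1, 0, 0, 1, 0],
                   "P_Math_3": [0, 4, 0, 0, 1],
                   "P_Physics_2": [0, 1, 1, 1, 0],
                   "P_Physics_4": [3, 0, 0, 1, 0],
                   "P_German_1": [1, 0, 0, 1, 0]})

# Keep ID out of the grouping by moving it into the index
data = df.set_index("ID")

# Same subject extraction as the answer: drop the last "_<suffix>" part
subjects = [x[0] for x in data.columns.str.rsplit("_", n=1)]

# Transpose so the columns become rows, group those rows by subject,
# sum them, then transpose back (no groupby(..., axis=1) needed)
result = data.T.groupby(subjects).sum().T
print(result)
```

This gives the same numbers as the output above, with `ID` as the index instead of a summed column.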

Or, assuming the values in ID are unique, you can use wide_to_long (with the `categories` list from the question):

(pd.wide_to_long(df, stubnames=categories,
                 i=['ID'], j='count', sep='_')
   .groupby('ID').sum()
)

Output:

    P_Math  P_English  P_Physics  P_German
ID
56     2.0        4.0        3.0       1.0
11     5.0        0.0        1.0       0.0
 6     0.0        0.0        1.0       0.0
43     0.0        3.0        2.0       1.0
14     1.0        1.0        0.0       0.0
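The float values come from `wide_to_long` itself: it builds one row per (ID, suffix) pair across all subjects (suffixes 1 to 4 here), so a subject with no column for a given suffix, such as `P_German_2`, gets NaN there. A minimal sketch, assuming the question's `df` and `categories`, that casts the sums back to integers (`sum` skips the NaN entries) and turns `ID` back into a regular column:

```python
import pandas as pd

df = pd.DataFrame({"ID": [56, 11, 6, 43, 14],
                   "P_Math_1": [2, 1, 0, 0, 0],
                   "P_English_3": [3, 0, 0, 2, 1],
                   "P_English_2": [1, 0, 0, 1, 0],
                   "P_Math_3": [0, 4, 0, 0, 1],
                   "P_Physics_2": [0, 1, 1, 1, 0],
                   "P_Physics_4": [3, 0, 0, 1, 0],
                   "P_German_1": [1, 0, 0, 1, 0]})

categories = ['P_Math', 'P_English', 'P_Physics', 'P_German']

# Long format: index is (ID, count); missing (subject, suffix) pairs are NaN
long_df = pd.wide_to_long(df, stubnames=categories,
                          i=["ID"], j="count", sep="_")

# Sum per ID (NaN skipped), restore integer dtype, make ID a column again
result = long_df.groupby("ID").sum().astype(int).reset_index()
print(result)
```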

For more on python - sum values of columns that start with the same text string, there is a similar question on Stack Overflow: https://stackoverflow.com/questions/65795836/
