For example, I have a pandas DataFrame with the test results of a class. It could look like this table:
| Name | English | French | History | Math | Physic | Chemistry | Biology |
|---|---|---|---|---|---|---|---|
| Mike | 3 | 3 | 4 | 5 | 6 | 5 | 4 |
| Tom | 4 | 4 | 3 | 4 | 4 | 5 | 5 |
| Nina | 5 | 6 | 4 | 3 | 3 | 3 | 5 |
| Anna | 4 | 3 | 4 | 5 | 5 | 3 | 3 |
| Musa | 5 | 5 | 4 | 4 | 4 | 6 | 5 |
| Maria | 4 | 3 | 5 | 4 | 3 | 2 | 3 |
| Chris | 6 | 5 | 5 | 5 | 5 | 5 | 6 |
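For reproducibility, here is a minimal sketch of how the table above could be built as a DataFrame. The variable name df is an assumption, chosen because the answers refer to the frame by that name; the values are copied from the table.

```python
import pandas as pd

# Test results from the table above; one row per student, one column per subject.
df = pd.DataFrame({
    "Name": ["Mike", "Tom", "Nina", "Anna", "Musa", "Maria", "Chris"],
    "English": [3, 4, 5, 4, 5, 4, 6],
    "French": [3, 4, 6, 3, 5, 3, 5],
    "History": [4, 3, 4, 4, 4, 5, 5],
    "Math": [5, 4, 3, 5, 4, 4, 5],
    "Physic": [6, 4, 3, 5, 4, 3, 5],
    "Chemistry": [5, 5, 3, 3, 6, 2, 5],
    "Biology": [4, 5, 5, 3, 5, 3, 6],
})
print(df)
```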
For every student, I want to add at least two columns: the best test result and the best subject. Important: a student can have more than one best subject (when several results are tied)!
For the example above, it should look like this:
| Name | English | French | History | Math | Physic | Chemistry | Biology | Best result | Best subject 1 | Best subject 2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Mike | 3 | 3 | 4 | 5 | 6 | 5 | 4 | 6 | Physic | None |
| Tom | 4 | 4 | 3 | 4 | 4 | 5 | 5 | 5 | Chemistry | Biology |
| Nina | 5 | 6 | 4 | 3 | 3 | 3 | 5 | 6 | French | None |
| Anna | 4 | 3 | 4 | 5 | 5 | 3 | 3 | 5 | Math | Physic |
| Musa | 5 | 5 | 4 | 4 | 4 | 6 | 5 | 6 | Chemistry | None |
| Maria | 4 | 3 | 5 | 4 | 3 | 2 | 3 | 5 | History | None |
| Chris | 6 | 5 | 5 | 5 | 5 | 5 | 6 | 6 | English | Biology |
What is the best way to do this in pandas? Thank you in advance!
Another possible solution:
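tmp = df.set_index("Name")  # subject scores indexed by student name (a DataFrame)
bre = tmp.max(axis=1)  # best result per student (a Series)
# Mark every score equal to the row maximum, then join the matching column names with "|".
bsu = (
    ((tmp.columns + "|") @ tmp.eq(bre, axis=0).T)
    .str.strip("|").str.split("|", expand=True)
    .rename(lambda x: f"Best subject {x+1}", axis=1)
)
# Append .fillna("None") to show a missing second subject as the string "None".
out = tmp.assign(**{"Best result": bre}).join(bsu).reset_index()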
Output:
| | Name | English | French | History | Math | Physic | Chemistry | Biology | Best result | Best subject 1 | Best subject 2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mike | 3 | 3 | 4 | 5 | 6 | 5 | 4 | 6 | Physic | |
| 1 | Tom | 4 | 4 | 3 | 4 | 4 | 5 | 5 | 5 | Chemistry | Biology |
| 2 | Nina | 5 | 6 | 4 | 3 | 3 | 3 | 5 | 6 | French | |
| 3 | Anna | 4 | 3 | 4 | 5 | 5 | 3 | 3 | 5 | Math | Physic |
| 4 | Musa | 5 | 5 | 4 | 4 | 4 | 6 | 5 | 6 | Chemistry | |
| 5 | Maria | 4 | 3 | 5 | 4 | 3 | 2 | 3 | 5 | History | |
| 6 | Chris | 6 | 5 | 5 | 5 | 5 | 5 | 6 | 6 | English | Biology |
Try:
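import pandas as pd

# Best score per row, using only the numeric (subject) columns.
best_result = df.select_dtypes(include="number").max(axis=1)
# For each row, keep the column labels whose score equals that row's best result.
to_add = pd.DataFrame(
    [
        b.index[b == a]
        for a, (_, b) in zip(best_result, df.select_dtypes(include="number").iterrows())
    ]
)
to_add.columns = [f"Best subject {c + 1}" for c in to_add]
df = pd.concat([df.assign(**{"Best result": best_result}), to_add], axis=1)
print(df)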
Prints:
    Name  English  French  History  Math  Physic  Chemistry  Biology  Best result  Best subject 1  Best subject 2
0   Mike        3       3        4     5       6          5        4            6          Physic            None
1    Tom        4       4        3     4       4          5        5            5       Chemistry         Biology
2   Nina        5       6        4     3       3          3        5            6          French            None
3   Anna        4       3        4     5       5          3        3            5            Math          Physic
4   Musa        5       5        4     4       4          6        5            6       Chemistry            None
5  Maria        4       3        5     4       3          2        3            5         History            None
6  Chris        6       5        5     5       5          5        6            6         English         Biology
Calculating Best Result is the simpler part of the task: just apply max to each row (considering only the subject columns).
To get the first two best subjects, you can write a separate function and apply it to each row (now including all subject columns and the Best Result column). In this function, create a list best_subjects, iterate through all subjects, and append only those whose score equals the best result. One best subject is guaranteed but a second is not, so None is appended at the end just in case. Returning a pd.Series makes the result of apply a DataFrame, which can then be concatenated to the original one.
import pandas as pd

def get_best_subjects(row):
    best_result = row["Best Result"]
    best_subjects = []
    # Collect every subject whose score equals the row's best result.
    for name, score in row.items():
        if name != "Best Result" and score == best_result:
            best_subjects.append(name)
    # Pad with None in case there is only one best subject.
    best_subjects.append(None)
    return pd.Series({
        "Best Subject 1": best_subjects[0],
        "Best Subject 2": best_subjects[1]
    })

subjects = df.columns[1:]  # every column except "Name"
df["Best Result"] = df[subjects].max(axis=1)
out = pd.concat([
    df,
    df[list(subjects.values) + ["Best Result"]].apply(get_best_subjects, axis=1)
], axis=1)
print(out)
Output:
    Name  English  French  History  Math  Physic  Chemistry  Biology  \
0   Mike        3       3        4     5       6          5        4
1    Tom        4       4        3     4       4          5        5
2   Nina        5       6        4     3       3          3        5
3   Anna        4       3        4     5       5          3        3
4   Musa        5       5        4     4       4          6        5
5  Maria        4       3        5     4       3          2        3
6  Chris        6       5        5     5       5          5        6

   Best Result  Best Subject 1  Best Subject 2
0            6          Physic            None
1            5       Chemistry         Biology
2            6          French            None
3            5            Math          Physic
4            6       Chemistry            None
5            5         History            None
6            6         English         Biology
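The function above keeps only the first two best subjects, while the question asks for at least two columns. If a student could tie in three or more subjects, one possible variation is to create as many "Best subject N" columns as the largest tie requires. The following is only a sketch under assumptions: the helper add_best_columns, its name_col parameter, and the small demo frame (including the student "Zoe") are hypothetical and not part of the original answers, and student names are assumed to be unique.

```python
import pandas as pd

def add_best_columns(df, name_col="Name"):
    """Hypothetical helper: add "Best result" and one column per tied best subject."""
    scores = df.set_index(name_col)          # only the subject columns remain
    best = scores.max(axis=1)                # best score per student
    is_best = scores.eq(best, axis=0)        # True where a score equals the row maximum
    cols = scores.columns
    # Subjects with the best score, in column order, one list per student.
    best_lists = [list(cols[mask]) for mask in is_best.to_numpy()]
    width = max(len(lst) for lst in best_lists)
    # Pad shorter lists with None so every row has the same number of entries.
    padded = [lst + [None] * (width - len(lst)) for lst in best_lists]
    subj = pd.DataFrame(
        padded,
        index=scores.index,
        columns=[f"Best subject {i + 1}" for i in range(width)],
    )
    return scores.assign(**{"Best result": best}).join(subj).reset_index()

# Made-up demo data: "Zoe" ties in all three subjects.
demo = pd.DataFrame({
    "Name": ["Tom", "Mike", "Zoe"],
    "English": [4, 3, 6],
    "Math": [5, 6, 6],
    "Biology": [5, 4, 6],
})
out = add_best_columns(demo)
print(out)
```

Because Zoe ties in all three subjects, the demo result gets a third "Best subject" column, which is empty for the other students.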