Headers to column, pandas DataFrame

Reposted. Author: bug小助手. Updated: 2023-10-25 12:21:07



For example, I have a pandas DataFrame with the test results of a class. It could look like this table:

Name   English  French  History  Math  Physic  Chemistry  Biology
Mike         3       3        4     5       6          5        4
Tom          4       4        3     4       4          5        5
Nina         5       6        4     3       3          3        5
Anna         4       3        4     5       5          3        3
Musa         5       5        4     4       4          6        5
Maria        4       3        5     4       3          2        3
Chris        6       5        5     5       5          5        6
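
For anyone who wants to reproduce the problem, the table above can be rebuilt as a DataFrame roughly like this. This is only a sketch: the variable name `df` matches what the answers use, and the values are copied from the table.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame(
    {
        "Name": ["Mike", "Tom", "Nina", "Anna", "Musa", "Maria", "Chris"],
        "English": [3, 4, 5, 4, 5, 4, 6],
        "French": [3, 4, 6, 3, 5, 3, 5],
        "History": [4, 3, 4, 4, 4, 5, 5],
        "Math": [5, 4, 3, 5, 4, 4, 5],
        "Physic": [6, 4, 3, 5, 4, 3, 5],
        "Chemistry": [5, 5, 3, 3, 6, 2, 5],
        "Biology": [4, 5, 5, 3, 5, 3, 6],
    }
)
```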


For every student, I want to add at least two columns: the best test result and the best subject. Important: a student can have more than one best subject (when several subjects tie for the top score)!

For the example above, the result should look like this:

Name   English  French  History  Math  Physic  Chemistry  Biology  Best result Best subject 1 Best subject 2
Mike         3       3        4     5       6          5        4            6         Physic           None
Tom          4       4        3     4       4          5        5            5      Chemistry        Biology
Nina         5       6        4     3       3          3        5            6         French           None
Anna         4       3        4     5       5          3        3            5           Math         Physic
Musa         5       5        4     4       4          6        5            6      Chemistry           None
Maria        4       3        5     4       3          2        3            5        History           None
Chris        6       5        5     5       5          5        6            6        English        Biology


What is the best way to do this in pandas? Thank you in advance!

Recommended answers

Another possible solution:

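tmp = df.set_index("Name")  # DataFrame of scores, indexed by student name
bre = tmp.max(axis=1)       # Series: best result per student

# Mark each score equal to the row maximum, then use the matrix product to
# concatenate the matching column names ("|"-separated) for each student.
bsu = (
    ((tmp.columns + "|") @ tmp.eq(bre, axis=0).T)
    .str.strip("|").str.split("|", expand=True)
    .rename(lambda x: f"Best subject {x+1}", axis=1)
)

# Append .fillna("None") to show missing subjects as the string "None".
out = tmp.assign(**{"Best result": bre}).join(bsu).reset_index()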

Output:

    Name  English  French  History  Math  Physic  Chemistry  Biology  Best result Best subject 1 Best subject 2
0   Mike        3       3        4     5       6          5        4            6         Physic
1    Tom        4       4        3     4       4          5        5            5      Chemistry        Biology
2   Nina        5       6        4     3       3          3        5            6         French
3   Anna        4       3        4     5       5          3        3            5           Math         Physic
4   Musa        5       5        4     4       4          6        5            6      Chemistry
5  Maria        4       3        5     4       3          2        3            5        History
6  Chris        6       5        5     5       5          5        6            6        English        Biology


Try:

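import pandas as pd

# Best score per student, computed over the numeric (subject) columns only.
best_result = df.select_dtypes(include="number").max(axis=1)

# For each student, collect the names of the subjects whose score equals the best.
to_add = pd.DataFrame(
    [
        b.index[b == a]
        for a, (_, b) in zip(best_result, df.select_dtypes(include="number").iterrows())
    ]
)
to_add.columns = [f"Best subject {c + 1}" for c in to_add]
df = pd.concat([df.assign(**{"Best result": best_result}), to_add], axis=1)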

print(df)

Prints:

    Name  English  French  History  Math  Physic  Chemistry  Biology  Best result Best subject 1 Best subject 2
0   Mike        3       3        4     5       6          5        4            6         Physic           None
1    Tom        4       4        3     4       4          5        5            5      Chemistry        Biology
2   Nina        5       6        4     3       3          3        5            6         French           None
3   Anna        4       3        4     5       5          3        3            5           Math         Physic
4   Musa        5       5        4     4       4          6        5            6      Chemistry           None
5  Maria        4       3        5     4       3          2        3            5        History           None
6  Chris        6       5        5     5       5          5        6            6        English        Biology


Calculating the best result is the simpler part of the task: just apply max to each row (considering only the subject columns).
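
As a minimal sketch of this step, assuming a small made-up frame with the same layout as the question (a Name column followed by numeric subject columns), the row-wise maximum could be taken like this. The names `subject_cols` and `best` are illustrative only.

```python
import pandas as pd

# Tiny illustrative frame; the real data has more subjects and students.
df = pd.DataFrame(
    {
        "Name": ["Mike", "Tom"],
        "English": [3, 4],
        "Physic": [6, 4],
        "Biology": [4, 5],
    }
)

# Every column except "Name" is treated as a subject (an assumption).
subject_cols = df.columns.drop("Name")

# axis=1 takes the maximum across the columns of each row.
best = df[subject_cols].max(axis=1)
```

Restricting the call to the subject columns keeps the non-numeric Name column out of the comparison.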

To get the first two best subjects, you can write a separate function and apply it to each row (now including all subject columns and the best-result column). In this function, create a list best_subjects, iterate through all subjects, and append only those whose score equals the best result. One best subject is guaranteed but a second is not, so append None at the end just in case. Returning a pd.Series makes the result of apply a DataFrame, which can be concatenated with the original one.

import pandas as pd

def get_best_subjects(row):
    best_result = row["Best Result"]
    best_subjects = []
    for name, score in row.items():
        if name != "Best Result" and score == best_result:
            best_subjects.append(name)
    # Pad with None so that index 1 exists when there is only one best subject.
    best_subjects.append(None)
    return pd.Series({
        "Best Subject 1": best_subjects[0],
        "Best Subject 2": best_subjects[1]
    })

subjects = df.columns[1:]  # every column after "Name"
df["Best Result"] = df[subjects].apply(max, axis=1)

pd.concat([
    df,
    df[list(subjects.values) + ["Best Result"]].apply(get_best_subjects, axis=1)
], axis=1)

Output:

    Name  English  French  History  Math  Physic  Chemistry  Biology  \
0   Mike        3       3        4     5       6          5        4
1    Tom        4       4        3     4       4          5        5
2   Nina        5       6        4     3       3          3        5
3   Anna        4       3        4     5       5          3        3
4   Musa        5       5        4     4       4          6        5
5  Maria        4       3        5     4       3          2        3
6  Chris        6       5        5     5       5          5        6

   Best Result Best Subject 1 Best Subject 2
0            6         Physic           None
1            5      Chemistry        Biology
2            6         French           None
3            5           Math         Physic
4            6      Chemistry           None
5            5        History           None
6            6        English        Biology
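
The function above keeps only the first two tied subjects. If a student can tie in three or more subjects, one option is to build the columns from a boolean mask instead. The following is a minimal sketch, assuming the layout from the question (a Name column followed by numeric subject columns); `add_best_columns` and its `name_col` parameter are hypothetical names introduced here, not part of pandas.

```python
import pandas as pd

def add_best_columns(df, name_col="Name"):
    """Hypothetical helper: add 'Best result' and one column per tied subject."""
    scores = df.set_index(name_col)        # assumes all other columns are numeric
    best = scores.max(axis=1)              # best score per student
    is_best = scores.eq(best, axis=0)      # True where a score equals the row maximum

    # One list of tied subject names per student; the lists may differ in length.
    tied = [list(scores.columns[row]) for row in is_best.to_numpy()]

    # Shorter lists are padded with missing values when the frame is built.
    subj = pd.DataFrame(tied, index=scores.index)
    subj.columns = [f"Best subject {i + 1}" for i in range(subj.shape[1])]

    return scores.assign(**{"Best result": best}).join(subj).reset_index()

# Made-up data with a three-way tie for student "A".
demo = pd.DataFrame({"Name": ["A", "B"], "X": [5, 1], "Y": [5, 2], "Z": [5, 0]})
out = add_best_columns(demo)
```

The number of "Best subject" columns then follows the largest tie found in the data rather than being fixed at two.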
