
Python Pandas | Find the maximum value from only a specific part of a column

Reposted. Author: 行者123. Updated: 2023-11-28 22:17:48

I have been struggling to get this working. Pandas max() finds the maximum over the entire column. What I need is the following.

My input CSV file:

Id  Param1          Param2              Val1
1 -5.00138282776 2.04990620034e-08 1.738e-05
1 -4.80147838593 2.01516989762e-08 1.628e-05
1 -4.60159301758 1.98263165885e-08 1.671e-05
1 -4.40133094788 1.94918392538e-08 1.576e-05
1 -4.20143127441 1.91767686175e-08
2 -5.00141859055 6.88369405921e-09 5.512e-06
2 -4.80152130126 6.77335965093e-09 5.964e-06
2 -4.60163593292 6.65415056389e-09
3 -5.00138044357 1.16316911658e-08 4.008e-06
3 -4.80148792267 1.15515588206e-08 7.347e-06
3 -4.60160970681 1.14048361866e-08 8.446e-06
3 -4.40137386322 1.12357021465e-08

Desired output:

Id  Param1          Param2              Val1        Max_Val1_for_each_Id
1 -5.00138282776 2.04990620034e-08 1.738e-05 1.738e-05
1 -4.80147838593 2.01516989762e-08 1.628e-05
1 -4.60159301758 1.98263165885e-08 1.671e-05
1 -4.40133094788 1.94918392538e-08 1.576e-05
1 -4.20143127441 1.91767686175e-08
2 -5.00141859055 6.88369405921e-09 5.512e-06 5.964e-06
2 -4.80152130126 6.77335965093e-09 5.964e-06
2 -4.60163593292 6.65415056389e-09
3 -5.00138044357 1.16316911658e-08 4.008e-06 8.446e-06
3 -4.80148792267 1.15515588206e-08 7.347e-06
3 -4.60160970681 1.14048361866e-08 8.446e-06
3 -4.40137386322 1.12357021465e-08

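I am not sure how to select/group the values in the Val1 column that share the same Id and then find their maximum. Also, the Val1 column contains some blanks, which causes its data type to be read as object. I do not know how to handle that. Any help is very welcome.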

Best answer

Use GroupBy.transform to create a new column holding the maximum of each group:

df['Max_Val1_for_each_Id'] = df.groupby('Id')['Val1'].transform('max')
print(df)
Id Param1 Param2 Val1 Max_Val1_for_each_Id
0 1 -5.001383 2.049906e-08 0.000017 0.000017
1 1 -4.801478 2.015170e-08 0.000016 0.000017
2 1 -4.601593 1.982632e-08 0.000017 0.000017
3 1 -4.401331 1.949184e-08 0.000016 0.000017
4 1 -4.201431 1.917677e-08 NaN 0.000017
5 2 -5.001419 6.883694e-09 0.000006 0.000006
6 2 -4.801521 6.773360e-09 0.000006 0.000006
7 2 -4.601636 6.654151e-09 NaN 0.000006
8 3 -5.001380 1.163169e-08 0.000004 0.000008
9 3 -4.801488 1.155156e-08 0.000007 0.000008
10 3 -4.601610 1.140484e-08 0.000008 0.000008
11 3 -4.401374 1.123570e-08 NaN 0.000008
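To make the difference between aggregating and transforming concrete, here is a minimal sketch on a small hypothetical subset of the question's data (the reduced DataFrame and the name per_group are illustrative assumptions, not part of the original answer). It shows that an aggregated max has one entry per Id, while transform('max') returns one entry per original row:

```python
import numpy as np
import pandas as pd

# Small hypothetical subset of the question's data; np.nan marks the blanks.
df = pd.DataFrame({
    "Id":   [1, 1, 1, 2, 2, 2],
    "Val1": [1.738e-05, 1.628e-05, np.nan, 5.512e-06, 5.964e-06, np.nan],
})

# Aggregating gives one value per Id (the result is indexed by Id).
per_group = df.groupby("Id")["Val1"].max()

# transform('max') broadcasts each group's maximum back to every row of
# that group, so the result shares df's index and can be assigned directly.
# NaN values are skipped when the maximum is computed.
df["Max_Val1_for_each_Id"] = df.groupby("Id")["Val1"].transform("max")
```

Because the transformed result is aligned with the original index, no merge or join back onto df is needed.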

Then, if the value is needed only on the first row of each group, add where with a mask created by duplicated, using ~ to invert the mask:

df['Max_Val1_for_each_Id'] = df['Max_Val1_for_each_Id'].where(~df['Id'].duplicated())
print(df)
Id Param1 Param2 Val1 Max_Val1_for_each_Id
0 1 -5.001383 2.049906e-08 0.000017 0.000017
1 1 -4.801478 2.015170e-08 0.000016 NaN
2 1 -4.601593 1.982632e-08 0.000017 NaN
3 1 -4.401331 1.949184e-08 0.000016 NaN
4 1 -4.201431 1.917677e-08 NaN NaN
5 2 -5.001419 6.883694e-09 0.000006 0.000006
6 2 -4.801521 6.773360e-09 0.000006 NaN
7 2 -4.601636 6.654151e-09 NaN NaN
8 3 -5.001380 1.163169e-08 0.000004 0.000008
9 3 -4.801488 1.155156e-08 0.000007 NaN
10 3 -4.601610 1.140484e-08 0.000008 NaN
11 3 -4.401374 1.123570e-08 NaN NaN
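The masking step can be broken into its parts. The following is a sketch with hypothetical values (the Series ids and group_max and the intermediate names are assumptions made for illustration); it shows what duplicated, ~ and where each contribute:

```python
import pandas as pd

# Hypothetical Id column and per-row group maxima, shaped like the
# output of transform('max').
ids = pd.Series([1, 1, 1, 2, 2, 3])
group_max = pd.Series([1.7e-05, 1.7e-05, 1.7e-05, 6.0e-06, 6.0e-06, 8.4e-06])

# duplicated() is False for the first occurrence of each value
# and True for every later occurrence.
is_repeat = ids.duplicated()

# ~ inverts the boolean mask: True only on the first row of each Id.
is_first = ~is_repeat

# where() keeps values where the mask is True and writes NaN elsewhere.
first_only = group_max.where(is_first)
```

Note that duplicated looks at the whole column rather than at consecutive runs, so if the same Id appeared again further down, only its very first row would keep the value.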

Edit:

If the blanks in Val1 are not NaN values (for example, they were read as strings) and the solution above raises an error:

TypeError: '>=' not supported between instances of 'float' and 'str'

The first step is to convert the non-numeric values to NaN:

df['Val1'] = pd.to_numeric(df['Val1'], errors='coerce')
df['Max_Val1_for_each_Id'] = df.groupby('Id')['Val1'].transform('max')
df['Max_Val1_for_each_Id'] = df['Max_Val1_for_each_Id'].where(~df['Id'].duplicated())
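As a rough end-to-end sketch of the edited solution, assume the blanks arrived as empty strings so that Val1 holds text rather than numbers (the tiny DataFrame here is hypothetical). pd.to_numeric with errors='coerce' replaces anything that cannot be parsed as a number with NaN, after which the grouping works as before:

```python
import pandas as pd

# Hypothetical case: Val1 was read as text, with an empty string for a blank.
df = pd.DataFrame({
    "Id":   [1, 1, 2, 2],
    "Val1": ["1.738e-05", "", "5.512e-06", "5.964e-06"],
})

# Unparseable entries become NaN and the column becomes float64.
df["Val1"] = pd.to_numeric(df["Val1"], errors="coerce")

# Per-group maximum on every row, then kept only on the first row of each Id.
df["Max_Val1_for_each_Id"] = df.groupby("Id")["Val1"].transform("max")
df["Max_Val1_for_each_Id"] = df["Max_Val1_for_each_Id"].where(~df["Id"].duplicated())
```

Coercion silently discards anything non-numeric, so it may be worth checking df['Val1'].isna().sum() afterwards to confirm that only the expected blanks were converted.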

This post is based on a similar question found on Stack Overflow about finding the maximum value from only a specific part of a column in Python Pandas: https://stackoverflow.com/questions/51054844/
