
python - Cumulative ranking of values in Pandas with ties

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:13:11

I'm trying to find a way to compute a ranking with ties in Pandas.

Let's take hypothetical data from a track meet, with a person, race, heat, and time.

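Each person's place is determined as follows:

For a given race/heat combination:

  • The person with the lowest time places first
  • The person with the second-lowest time places second

and so on...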

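This would be fairly simple code, except for one thing..

If two people have the same time, they both get the same place, and the next person with a time greater than theirs gets that value + 1 as their place.

In the table below, for the 100 Yard Dash, heat 1, RUNNER1 gets first place, RUNNER2/RUNNER3 get second place, and RUNNER4 gets third place (the next time after RUNNER2/RUNNER3).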

So basically, the logic is as follows:

If race <> race.shift() or heat <> heat.shift() then place = 1

If race = race.shift() and heat = heat.shift() and time > time.shift() then place = place.shift() + 1

If race = race.shift() and heat = heat.shift() and time = time.shift() then place = place.shift()

The part that confuses me is how to handle ties. Otherwise I could do something like

df['Place']=np.where(
(df['race']==df['race'].shift())
&
(df['heat']==df['heat'].shift()),
df['Place'].shift()+1,
1)

Thanks!

Sample data follows:

Person,Race,Heat,Time
RUNNER1,100 Yard Dash,1,9.87
RUNNER2,100 Yard Dash,1,9.92
RUNNER3,100 Yard Dash,1,9.92
RUNNER4,100 Yard Dash,1,9.96
RUNNER5,100 Yard Dash,1,9.97
RUNNER6,100 Yard Dash,1,10.01
RUNNER7,100 Yard Dash,2,9.88
RUNNER8,100 Yard Dash,2,9.93
RUNNER9,100 Yard Dash,2,9.93
RUNNER10,100 Yard Dash,2,10.03
RUNNER11,100 Yard Dash,2,10.26
RUNNER7,200 Yard Dash,1,19.63
RUNNER8,200 Yard Dash,1,19.67
RUNNER9,200 Yard Dash,1,19.72
RUNNER10,200 Yard Dash,1,19.72
RUNNER11,200 Yard Dash,1,19.86
RUNNER12,200 Yard Dash,1,19.92

What I want in the end is

Person,Race,Heat,Time,Place
RUNNER1,100 Yard Dash,1,9.87,1
RUNNER2,100 Yard Dash,1,9.92,2
RUNNER3,100 Yard Dash,1,9.92,2
RUNNER4,100 Yard Dash,1,9.96,3
RUNNER5,100 Yard Dash,1,9.97,4
RUNNER6,100 Yard Dash,1,10.01,5
RUNNER7,100 Yard Dash,2,9.88,1
RUNNER8,100 Yard Dash,2,9.93,2
RUNNER9,100 Yard Dash,2,9.93,2
RUNNER10,100 Yard Dash,2,10.03,3
RUNNER11,100 Yard Dash,2,10.26,4
RUNNER7,200 Yard Dash,1,19.63,1
RUNNER8,200 Yard Dash,1,19.67,2
RUNNER9,200 Yard Dash,1,19.72,3
RUNNER10,200 Yard Dash,1,19.72,3
RUNNER11,200 Yard Dash,1,19.86,4
RUNNER12,200 Yard Dash,1,19.92,5

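[edit] Now, taking it one step further..

Suppose that once I leave a set of unique values, the next time that set comes up, the places reset to 1..

So, for example (note that it goes into "Heat 1", then "Heat 2", and then back to "Heat 1"), I don't want the ranking to continue from the earlier "Heat 1"; I want it to reset.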

Person,Race,Heat,Time,Place
RUNNER1,100 Yard Dash,1,9.87,1
RUNNER2,100 Yard Dash,1,9.92,2
RUNNER3,100 Yard Dash,1,9.92,2
RUNNER4,100 Yard Dash,2,9.96,1
RUNNER5,100 Yard Dash,2,9.97,2
RUNNER6,100 Yard Dash,2,10.01,3
RUNNER7,100 Yard Dash,1,9.88,1
RUNNER8,100 Yard Dash,1,9.93,2
RUNNER9,100 Yard Dash,1,9.93,2

Best answer

You could use:

grouped =  df.groupby(['Race','Heat'])
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0]+1)
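
As a small illustration of why this works (a sketch on made-up times, not the data from the question): pd.factorize with sort=True gives each value the position it holds among the sorted unique values, so equal times share a code, and adding 1 turns the codes into places.

```python
import pandas as pd

# Hypothetical times for a single heat, deliberately unsorted and with one tie.
times = pd.Series([9.92, 9.87, 9.92, 9.96])

# codes[i] is the position of times[i] among the sorted unique values.
codes, uniques = pd.factorize(times, sort=True)

# Codes start at 0, places start at 1.
places = codes + 1
print(places.tolist())   # [2, 1, 2, 3]
print(uniques.tolist())  # [9.87, 9.92, 9.96]
```

Because the codes come from the sorted unique values, the rows do not need to be sorted by Time beforehand.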

import pandas as pd
df = pd.DataFrame({'Heat': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1], 'Person': ['RUNNER1', 'RUNNER2', 'RUNNER3', 'RUNNER4', 'RUNNER5', 'RUNNER6', 'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER12'], 'Race': ['100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '200 Yard Dash', '200 Yard Dash', '200 Yard Dash', '200 Yard Dash', '200 Yard Dash', '200 Yard Dash'], 'Time': [9.8699999999999992, 9.9199999999999999, 9.9199999999999999, 9.9600000000000009, 9.9700000000000006, 10.01, 9.8800000000000008, 9.9299999999999997, 9.9299999999999997, 10.029999999999999, 10.26, 19.629999999999999, 19.670000000000002, 19.719999999999999, 19.719999999999999, 19.859999999999999, 19.920000000000002]})

grouped = df.groupby(['Race','Heat'])
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0]+1)
df['Rank'] = grouped['Time'].rank(method='min')
print(df)

yields

    Heat    Person           Race   Time  Place  Rank
0      1   RUNNER1  100 Yard Dash   9.87    1.0   1.0
1      1   RUNNER2  100 Yard Dash   9.92    2.0   2.0
2      1   RUNNER3  100 Yard Dash   9.92    2.0   2.0
3      1   RUNNER4  100 Yard Dash   9.96    3.0   4.0
4      1   RUNNER5  100 Yard Dash   9.97    4.0   5.0
5      1   RUNNER6  100 Yard Dash  10.01    5.0   6.0
6      2   RUNNER7  100 Yard Dash   9.88    1.0   1.0
7      2   RUNNER8  100 Yard Dash   9.93    2.0   2.0
8      2   RUNNER9  100 Yard Dash   9.93    2.0   2.0
9      2  RUNNER10  100 Yard Dash  10.03    3.0   4.0
10     2  RUNNER11  100 Yard Dash  10.26    4.0   5.0
11     1   RUNNER7  200 Yard Dash  19.63    1.0   1.0
12     1   RUNNER8  200 Yard Dash  19.67    2.0   2.0
13     1   RUNNER9  200 Yard Dash  19.72    3.0   3.0
14     1  RUNNER10  200 Yard Dash  19.72    3.0   3.0
15     1  RUNNER11  200 Yard Dash  19.86    4.0   5.0
16     1  RUNNER12  200 Yard Dash  19.92    5.0   6.0

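Note that Pandas has a GroupBy.rank method which can compute many common forms of rank, but the method='min' ranking used for the Rank column here is not the one you describe. Notice, for example, that in row 3, after the tie between the second and third runners, the Rank is 4 while the Place is 3.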
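
Recent pandas versions also accept method='dense' in rank, which appears to give the same numbering as Place without a custom lambda. A minimal sketch, assuming your installed pandas supports method='dense' for grouped ranks, on a shortened, made-up version of the data:

```python
import pandas as pd

# Shortened, hypothetical data: two race/heat groups, each containing a tie.
df = pd.DataFrame({
    'Race': ['100 Yard Dash'] * 4 + ['200 Yard Dash'] * 3,
    'Heat': [1, 1, 1, 1, 1, 1, 1],
    'Time': [9.87, 9.92, 9.92, 9.96, 19.72, 19.72, 19.86],
})

# Dense ranking: tied times share a place, and the next distinct time gets place + 1.
dense = df.groupby(['Race', 'Heat'])['Time'].rank(method='dense')

# rank returns floats; cast to int so the places display as whole numbers.
df['Place'] = dense.astype(int)
print(df['Place'].tolist())  # [1, 2, 2, 3, 1, 1, 2]
```

If that option is available to you, it is probably the shorter route; the factorize version does not depend on it.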


Regarding the edit: use

(df['Heat'] != df['Heat'].shift()).cumsum()

to tell the repeated groups apart:

import pandas as pd
df = pd.DataFrame({'Heat': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1], 'Person': ['RUNNER1', 'RUNNER2', 'RUNNER3', 'RUNNER4', 'RUNNER5', 'RUNNER6', 'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER12'], 'Race': ['100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash'], 'Time': [9.8699999999999992, 9.9199999999999999, 9.9199999999999999, 9.9600000000000009, 9.9700000000000006, 10.01, 9.8800000000000008, 9.9299999999999997, 9.9299999999999997, 10.029999999999999, 10.26, 19.629999999999999, 19.670000000000002, 19.719999999999999, 19.719999999999999, 19.859999999999999, 19.920000000000002]})

df['HeatGroup'] = (df['Heat'] != df['Heat'].shift()).cumsum()
grouped = df.groupby(['Race','HeatGroup'])
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0]+1)
df['Rank'] = grouped['Time'].rank(method='min')
print(df)

yields

    Heat    Person           Race   Time  HeatGroup  Place  Rank
0      1   RUNNER1  100 Yard Dash   9.87          1    1.0   1.0
1      1   RUNNER2  100 Yard Dash   9.92          1    2.0   2.0
2      1   RUNNER3  100 Yard Dash   9.92          1    2.0   2.0
3      1   RUNNER4  100 Yard Dash   9.96          1    3.0   4.0
4      1   RUNNER5  100 Yard Dash   9.97          1    4.0   5.0
5      1   RUNNER6  100 Yard Dash  10.01          1    5.0   6.0
6      2   RUNNER7  100 Yard Dash   9.88          2    1.0   1.0
7      2   RUNNER8  100 Yard Dash   9.93          2    2.0   2.0
8      2   RUNNER9  100 Yard Dash   9.93          2    2.0   2.0
9      2  RUNNER10  100 Yard Dash  10.03          2    3.0   4.0
10     2  RUNNER11  100 Yard Dash  10.26          2    4.0   5.0
11     1   RUNNER7  100 Yard Dash  19.63          3    1.0   1.0
12     1   RUNNER8  100 Yard Dash  19.67          3    2.0   2.0
13     1   RUNNER9  100 Yard Dash  19.72          3    3.0   3.0
14     1  RUNNER10  100 Yard Dash  19.72          3    3.0   3.0
15     1  RUNNER11  100 Yard Dash  19.86          3    4.0   5.0
16     1  RUNNER12  100 Yard Dash  19.92          3    5.0   6.0
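
A shift/cumsum key built from Heat alone only reacts to changes in Heat. If the data could also switch races while the heat number stays the same, one possible extension (an assumption on my part, not something the question shows) is to start a new block whenever either column changes. The names changed and Block, and the data itself, are made up for this sketch:

```python
import pandas as pd

# Hypothetical data: the race changes and then changes back, while Heat stays 1.
df = pd.DataFrame({
    'Race': ['100 Yard Dash'] * 3 + ['200 Yard Dash'] * 2 + ['100 Yard Dash'] * 3,
    'Heat': [1, 1, 1, 1, 1, 1, 1, 1],
    'Time': [9.87, 9.92, 9.92, 19.63, 19.67, 9.88, 9.93, 9.93],
})

# True on every row where Race or Heat differs from the previous row
# (the first row always counts as a change, since shift() yields NaN there).
changed = (df['Race'] != df['Race'].shift()) | (df['Heat'] != df['Heat'].shift())

# The running count of changes gives each consecutive run its own id.
df['Block'] = changed.cumsum()

# Rank within each consecutive run, with ties sharing a place.
df['Place'] = df.groupby('Block')['Time'].transform(
    lambda x: pd.factorize(x, sort=True)[0] + 1
)
print(df[['Block', 'Place']])
```

Here the second run of '100 Yard Dash' lands in Block 3, so its places start again at 1 instead of being merged with the first run.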

Regarding python - Cumulative ranking of values in Pandas with ties, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38246058/
