python - Is there a faster way to compute a historical ratio on a pandas groupby object, conditioned on a value?

Reposted. Author: 行者123. Updated: 2023-12-01 06:53:13

Here is an example DataFrame with two players, together with the expected output:

+--------+------------+-------------------------------------+
| Player | Result     | Winning ratio (historical)          |
+--------+------------+-------------------------------------+
| K2000  | Lose       | 0%   # first game, so no history    |
| K2000  | Lose       | 0%   # 0 games won out of 1 played  |
| K2000  | Win        | 0%   # 0 games won out of 2 played  |
| K2000  | Not ranked | 33%  # 1 game won out of 3 played   |
| K2000  | Lose       | 25%  # and so on                    |
| K2000  | Win        | 20%                                 |
| K2000  | Win        | 33%                                 |
| Kssis  | Win        | 0%                                  |
| Kssis  | Win        | 100%                                |
| Kssis  | Not ranked | 100%                                |
| Kssis  | Lose       | 66%                                 |
| Kssis  | Win        | 50%                                 |
+--------+------------+-------------------------------------+
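To make the example reproducible, the input columns of the table can be built as a DataFrame. This is only a sketch: the column names `Player` and `Result` come from the table header, and the rows are assumed to be in chronological order within each player.

```python
import pandas as pd

# Sample data copied from the table; row order is assumed to be chronological per player.
df = pd.DataFrame({
    "Player": ["K2000"] * 7 + ["Kssis"] * 5,
    "Result": ["Lose", "Lose", "Win", "Not ranked", "Lose", "Win", "Win",
               "Win", "Win", "Not ranked", "Lose", "Win"],
})

print(df)
```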

To get this, I did the following:

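# 1 if the game was won, 0 otherwise (the column is named 'Result', capitalized)
df['sucess'] = df.apply(lambda row: 1 if row['Result'] == 'Win' else 0, axis=1)
# every row counts as one contest
df['nb_of_contests'] = df.apply(lambda row: 1, axis=1)
# gives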
+--------+------------+--------+----------------+
| Player | Result     | Sucess | Nb_of_contests |
+--------+------------+--------+----------------+
| K2000  | Lose       | 0      | 1              |
| K2000  | Lose       | 0      | 1              |
| K2000  | Win        | 1      | 1              |
| K2000  | Not ranked | 0      | 1              |
| K2000  | Lose       | 0      | 1              |
| K2000  | Win        | 1      | 1              |
| K2000  | Win        | 1      | 1              |
| Kssis  | Win        | 1      | 1              |
| Kssis  | Win        | 1      | 1              |
| Kssis  | Not ranked | 0      | 1              |
| Kssis  | Lose       | 0      | 1              |
| Kssis  | Win        | 1      | 1              |
+--------+------------+--------+----------------+

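# then the cumulative sums per player
# (select several columns with a list, i.e. double brackets)
cumul = df.groupby('Player')[['sucess', 'nb_of_contests']].cumsum()
# cumul gives (Player and Result shown for reference)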
+--------+------------+--------+----------------+
| Player | Result     | Sucess | Nb_of_contests |
+--------+------------+--------+----------------+
| K2000  | Lose       | 0      | 1              |
| K2000  | Lose       | 0      | 2              |
| K2000  | Win        | 1      | 3              |
| K2000  | Not ranked | 1      | 4              |
| K2000  | Lose       | 1      | 5              |
| K2000  | Win        | 2      | 6              |
| K2000  | Win        | 3      | 7              |
| Kssis  | Win        | 1      | 1              |
| Kssis  | Win        | 2      | 2              |
| Kssis  | Not ranked | 2      | 3              |
| Kssis  | Lose       | 2      | 4              |
| Kssis  | Win        | 3      | 5              |
+--------+------------+--------+----------------+

# then compute the ratio
winning_ratio = cumul['sucess'] / cumul['nb_of_contests']
# finally, shift within each group; cumul is a plain DataFrame, not a groupby object,
# so the ratio has to be grouped again by the Player column
gb_winning_ratio = winning_ratio.groupby(df['Player'])
winning_ratio_shifted = gb_winning_ratio.shift(1)

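So, is there a simpler way? I think this can be simplified, but I don't have the skills to improve it myself. Please don't hesitate to give an in-depth explanation; above all, I want to understand it.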

Pandas version: 0.23.4, Python version: 3.7.4

Best answer

Note: to avoid the following error:

ValueError: cannot reindex from a duplicate axis

first create a default RangeIndex:

df = df.reset_index(drop=True)
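As a rough illustration of what this step does, the sketch below uses a hypothetical three-row frame (not the question's data) whose index labels repeat, and shows that `reset_index(drop=True)` replaces them with unique positions 0, 1, 2.

```python
import pandas as pd

# Hypothetical frame with a repeated index label, e.g. after concatenating two frames.
df = pd.DataFrame(
    {"Player": ["K2000", "K2000", "Kssis"], "Result": ["Lose", "Win", "Win"]},
    index=[0, 1, 0],
)
had_unique_index = df.index.is_unique   # False: label 0 appears twice

# drop=True discards the old labels instead of keeping them as a new column.
df = df.reset_index(drop=True)
print(had_unique_index, df.index.is_unique, list(df.index))
```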

Then use:

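# vectorized comparison instead of a row-wise apply
df['sucess'] = (df['Result'] == 'Win').astype(int)
df['nb_of_contests'] = 1

# cumulative wins and games per player (list of columns, i.e. double brackets)
cumul = df.groupby('Player')[['sucess', 'nb_of_contests']].cumsum()
winning_ratio = cumul['sucess'].div(cumul['nb_of_contests'])

# shift within each player so a row only sees earlier games; the first game gets 0
winning_ratio_shifted = winning_ratio.groupby(df['Player']).shift().fillna(0)

print(winning_ratio_shifted)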
0 0.000000
1 0.000000
2 0.000000
3 0.333333
4 0.250000
5 0.200000
6 0.333333
7 0.000000
8 1.000000
9 1.000000
10 0.666667
11 0.500000
dtype: float64
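To compare this result with the table in the question, the shifted ratio can be attached to the frame and shown as whole-number percentages. The sketch below rebuilds the sample data so that it runs on its own; the column names `winning_ratio` and `winning_ratio_pct` are introduced here for illustration only.

```python
import pandas as pd

# Sample data from the question's table.
df = pd.DataFrame({
    "Player": ["K2000"] * 7 + ["Kssis"] * 5,
    "Result": ["Lose", "Lose", "Win", "Not ranked", "Lose", "Win", "Win",
               "Win", "Win", "Not ranked", "Lose", "Win"],
})

# Same steps as above: indicator columns, per-player cumulative sums, ratio, shift.
df["sucess"] = (df["Result"] == "Win").astype(int)
df["nb_of_contests"] = 1
cumul = df.groupby("Player")[["sucess", "nb_of_contests"]].cumsum()
ratio = cumul["sucess"].div(cumul["nb_of_contests"])
df["winning_ratio"] = ratio.groupby(df["Player"]).shift().fillna(0)

# Rounded percentage, for side-by-side comparison with the expected output.
df["winning_ratio_pct"] = (df["winning_ratio"] * 100).round().astype(int)
print(df[["Player", "Result", "winning_ratio_pct"]])
```

Rounding gives 67 for the 2/3 case, where the question's table shows the truncated value 66%.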

Or you can use a one-line solution with DataFrame.assign, chaining cumsum and shift per group:

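winning_ratio_shifted = (
    df.assign(sucess=(df['Result'] == 'Win').astype(int),
              nb_of_contests=1)
      # group_keys=False keeps the original row index on the result
      .groupby('Player', group_keys=False)[['sucess', 'nb_of_contests']]
      # cumulative sums per player, shifted so each row sees only earlier games
      .apply(lambda x: x.cumsum().shift())
      .assign(new=lambda x: x['sucess'] / x['nb_of_contests'])['new']
      .fillna(0)
)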

print(winning_ratio_shifted)
0 0.000000
1 0.000000
2 0.000000
3 0.333333
4 0.250000
5 0.200000
6 0.333333
7 0.000000
8 1.000000
9 1.000000
10 0.666667
11 0.500000
Name: new, dtype: float64
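A further variant, not part of the answer above, avoids both the helper columns and the shift. It is a sketch that assumes the rows are already in chronological order per player: wins in earlier games are the cumulative wins minus the current game's own result, and the number of earlier games is what `cumcount()` returns.

```python
import pandas as pd

# Sample data from the question's table.
df = pd.DataFrame({
    "Player": ["K2000"] * 7 + ["Kssis"] * 5,
    "Result": ["Lose", "Lose", "Win", "Not ranked", "Lose", "Win", "Win",
               "Win", "Win", "Not ranked", "Lose", "Win"],
})

wins = (df["Result"] == "Win").astype(int)

# Wins in earlier games only: cumulative wins minus the current row's own result.
prior_wins = wins.groupby(df["Player"]).cumsum() - wins
# Number of earlier games: cumcount() numbers each player's rows 0, 1, 2, ...
prior_games = df.groupby("Player").cumcount()

# 0 / 0 on each player's first game gives NaN, replaced by 0 as in the answer.
df["winning_ratio"] = (prior_wins / prior_games).fillna(0)
print(df)
```

Both operands are computed with built-in groupby operations, so no Python-level lambda runs per group.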

Regarding "python - Is there a faster way to compute a historical ratio on a pandas groupby object, conditioned on a value?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58913240/
