python - Assigning categories based on percentiles

Reposted. Author: 太空宇宙. Updated: 2023-11-04 09:55:34

I have the following data frame:

    Group  Country  GDP
    A      a        ***
    A      b        ***
    B      a        ***
    B      b        ***

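I want to create a new column that assigns a category (high, low) to gdp based on its percentile rank within each group. Here is what I tried: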

    def c(gr):
        ser = gr['gdp']
        p = np.nanpercentile(ser, 50)
        for i in ser:
            if i > p:
                return "high"
            else:
                return "low"

    grouped = df.groupby('Group')
    df['perf'] = grouped.apply(c)

The perf column comes back as NaN. What am I doing wrong?

Best answer

Use quantile and numpy.where together in a custom function:

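    import numpy as np

    def c(gr):
        ser = gr['gdp']
        # q=0.5 is the default, so it can be omitted
        p = ser.quantile()
        gr['perf'] = np.where(ser > p, 'high', 'low')
        return gr

    # group_keys=False keeps the original row index on pandas >= 2.0
    df = df.groupby('Group', group_keys=False).apply(c)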
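The answer's ser.quantile() plays the same role as the np.nanpercentile(ser, 50) call in the question: both give the median and both skip NaN. A minimal sketch on a made-up series (the values and the variable names are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Made-up values, one of them missing.
ser = pd.Series([0.2, np.nan, 0.6, 1.0])

# Series.quantile defaults to q=0.5 and ignores NaN.
p_answer = ser.quantile()

# The call from the question, applied to the same data.
p_question = np.nanpercentile(ser.to_numpy(), 50)

# Any other cut-off works the same way, e.g. the 75th percentile.
p_q75 = ser.quantile(0.75)
```

Passing a different q to quantile is enough to move the high/low boundary away from the median.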

This can be simplified with transform:

    q = df.groupby('Group')['gdp'].transform('quantile')
    df['perf1'] = np.where(df['gdp'] > q, 'high', 'low')
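It may help to see why the attempt in the question produced NaN while transform does not. A function that returns one value per group yields a result indexed by the group labels, and assigning that to a column aligns on the index, so nothing matches the row index of df. The sketch below uses made-up numbers, and the names per_group and misaligned are illustrative only:

```python
import numpy as np
import pandas as pd

# Made-up numbers, for illustration only.
df = pd.DataFrame({
    "Group": ["A", "A", "B", "B"],
    "gdp": [1.0, 3.0, 10.0, 30.0],
})

# One value per group: the result is indexed by the group labels 'A' and 'B'.
per_group = df.groupby("Group")["gdp"].quantile()

# Column assignment aligns on the index; df's index is 0..3, so nothing matches.
misaligned = df.assign(perf=per_group)

# transform broadcasts each group's median back onto the original rows.
q = df.groupby("Group")["gdp"].transform("quantile")
labels = np.where(df["gdp"] > q, "high", "low")
```

Here misaligned["perf"] is all NaN, whereas q has one median per original row and can be compared with df["gdp"] element by element.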

Example:

    import numpy as np
    import pandas as pd

    np.random.seed(12)

    N = 15
    L = list('abcd')
    df = pd.DataFrame({'Group': np.random.choice(L, N),
                       'gdp': np.random.rand(N)})
    df = df.sort_values('Group').reset_index(drop=True)
    # blank out a few values to show how missing data is treated
    df.loc[[0, 4, 5, 10, 13, 14], 'gdp'] = np.nan
    #print (df)

    def c(gr):
        ser = gr['gdp']
        # q=0.5 is the default, so it can be omitted
        p = ser.quantile()
        gr['perf'] = np.where(ser > p, 'high', 'low')
        return gr

    # group_keys=False keeps the original row index on pandas >= 2.0
    df = df.groupby('Group', group_keys=False).apply(c)

    q = df.groupby('Group')['gdp'].transform('quantile')
    df['perf1'] = np.where(df['gdp'] > q, 'high', 'low')
    print(df)
       Group       gdp  perf  perf1
    0      a       NaN   low    low
    1      a  0.907267  high   high
    2      a  0.456051   low    low
    3      b  0.675998   low    low
    4      b       NaN   low    low
    5      b       NaN   low    low
    6      b  0.563141   low    low
    7      b  0.801265  high   high
    8      c  0.372834   low    low
    9      c  0.481530  high   high
    10     c       NaN   low    low
    11     d  0.082524   low    low
    12     d  0.725954  high   high
    13     d       NaN   low    low
    14     d       NaN   low    low
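Note that in the output above the rows with a missing gdp are labelled low, because any comparison with NaN is False. If those rows should stay unlabelled instead, one possible approach is to mask them after labelling. This is a sketch under that assumption, not part of the original answer, and the data is made up:

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value per group.
df = pd.DataFrame({
    "Group": ["a", "a", "a", "b", "b", "b"],
    "gdp": [np.nan, 0.9, 0.4, 0.6, np.nan, 0.8],
})

# Per-row group median, as in the transform version of the answer.
q = df.groupby("Group")["gdp"].transform("quantile")

# Label every row, then blank out the labels where gdp itself is missing.
labels = pd.Series(np.where(df["gdp"] > q, "high", "low"), index=df.index)
df["perf"] = labels.where(df["gdp"].notna())
```

Series.where keeps a label only where the condition holds, so rows 0 and 4 end up with a missing perf instead of low.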

For more on python - Assigning categories based on percentiles, see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/46024750/
