
python - Pandas : how to apply functions per subgroups

Reposted. Author: 行者123. Updated: 2023-12-01 02:27:15

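I have a simple dataframe with nationality, occupation, and age columns. Nationality is integer-encoded as 0, 1, 2 for European, American, and Asian respectively.

For each occupation, I want to find the percentage of each nationality, for example: 67% of doctors are European and 33% are Asian.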

import pandas as pd
import numpy as np
#create dataframe
df=pd.DataFrame(np.concatenate((np.random.randint(low=0, high=3, size= (10,1)),np.random.randint(low=24, high=70, size=(10,1))),axis=1))
df.columns=['nationality','age']
df['occupation']=['teacher']*2+['engineer']*3+['doctor']*3+['lawyer']*2


   nationality  age occupation
0            0   65    teacher
1            0   31    teacher
2            0   30   engineer
3            2   63   engineer
4            0   28   engineer
5            1   27     doctor
6            0   52     doctor
7            0   60     doctor
8            0   33     lawyer
9            0   38     lawyer

df.groupby(['occupation','nationality']).count()

def iseuropean(x):
    if x==0:
        return 1
    else:
        return 0
def isamerican(x):
    if x==1:
        return 1
    else:
        return 0
def isasian(x):
    if x==2:
        return 1
    else:
        return 0

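With groupby I can get the counts, but I want to apply a function to each occupation group to determine the percentages. I have not been able to figure out how.

Any help would be appreciated.

Best answer

I am assuming you are looking for the percentage of each nationality within each occupation: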

In [11]: c = df.groupby(['occupation','nationality'])["age"].count().rename("count")

In [12]: c
Out[12]:
occupation  nationality
doctor      0              2
            1              1
engineer    0              2
            2              1
lawyer      0              2
teacher     0              2
Name: count, dtype: int64

In [13]: c / c.sum() # proportion of each, maybe not very useful
Out[13]:
occupation  nationality
doctor      0              0.2
            1              0.1
engineer    0              0.2
            2              0.1
lawyer      0              0.2
teacher     0              0.2
Name: count, dtype: float64

In [14]: c / c.groupby(level=0).sum() # proportion of each occupation
Out[14]:
occupation  nationality
doctor      0              0.666667
            1              0.333333
engineer    0              0.666667
            2              0.333333
lawyer      0              1.000000
teacher     0              1.000000
Name: count, dtype: float64
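
The same per-occupation proportions can also be computed without the intermediate count Series. The following is only a sketch under stated assumptions: the frame is rebuilt by hand from the ten rows printed in the question (the original data was random, so it cannot be regenerated exactly), and the names `shares` and `table` are illustrative. It uses `value_counts(normalize=True)` within each group and, as a second option, `pd.crosstab` with `normalize="index"`.

```python
import pandas as pd

# Assumption: rebuild the ten rows printed in the question by hand,
# because the original frame was generated with random numbers.
df = pd.DataFrame({
    "nationality": [0, 0, 0, 2, 0, 1, 0, 0, 0, 0],
    "age": [65, 31, 30, 63, 28, 27, 52, 60, 33, 38],
    "occupation": ["teacher"] * 2 + ["engineer"] * 3 + ["doctor"] * 3 + ["lawyer"] * 2,
})

# Share of each nationality code within each occupation,
# returned as a Series indexed by (occupation, nationality).
shares = df.groupby("occupation")["nationality"].value_counts(normalize=True)

# The same numbers as a table: one row per occupation,
# one column per nationality code; each row sums to 1.
table = pd.crosstab(df["occupation"], df["nationality"], normalize="index")

print(shares)
print(table)
```

The crosstab form may be easier to read when every occupation should show every nationality, since combinations that never occur appear as 0 rather than being absent.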

Also, you may want to use Categorical codes rather than the is_XXX functions:

In [21]: pd.Categorical.from_codes(df.nationality, ["european", "american", "asian"])
Out[21]:
[european, european, european, asian, european, american, european, european, european, european]
Categories (3, object): [european, american, asian]

In [22]: df.nationality = pd.Categorical.from_codes(df.nationality, ["european", "american", "asian"])

In [23]: df
Out[23]:
  nationality  age occupation
0    european   65    teacher
1    european   31    teacher
2    european   30   engineer
3       asian   63   engineer
4    european   28   engineer
5    american   27     doctor
6    european   52     doctor
7    european   60     doctor
8    european   33     lawyer
9    european   38     lawyer
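
Once the column holds labels, the percentages asked for in the question can be read off directly. A minimal sketch, again assuming the ten rows shown above are rebuilt by hand; the names `labels` and `pct` are illustrative and not part of the original answer.

```python
import pandas as pd

# Assumption: the same ten rows as printed above, rebuilt by hand.
df = pd.DataFrame({
    "nationality": [0, 0, 0, 2, 0, 1, 0, 0, 0, 0],
    "age": [65, 31, 30, 63, 28, 27, 52, 60, 33, 38],
    "occupation": ["teacher"] * 2 + ["engineer"] * 3 + ["doctor"] * 3 + ["lawyer"] * 2,
})

# Map the integer codes 0, 1, 2 to readable category labels.
labels = ["european", "american", "asian"]
df["nationality"] = pd.Categorical.from_codes(df["nationality"], categories=labels)

# Row-normalised cross-tabulation, scaled to percent and rounded to one decimal.
pct = (pd.crosstab(df["occupation"], df["nationality"], normalize="index") * 100).round(1)

print(pct)
```

Each row of `pct` describes one occupation, so the "doctor" row gives the split of doctors across the three nationality labels.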

Regarding python - Pandas : how to apply functions per subgroups, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47251929/
