
python - Grouping columns and calculating

Reposted. Author: 行者123. Updated: 2023-12-01 00:10:02

My code is as follows:

df.loc[df['Shape'].isin(Shapes), 'Shape'].value_counts().div(len(df)).to_frame().reset_index()

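This counts the occurrences of each shape and divides by the number of rows, giving the fraction of the whole DataFrame that each value (say, Triangle) accounts for. How would I adjust it if I wanted to add another column and stratify the result by it as a group?

The current code gives the share of each shape across the whole df: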

Triangle .20
Square .40
Circle .40

I would also like it broken down by color, so that the output looks like this:

Triangle Blue .20
Triangle Red .40
Triangle Black .40
Square Blue .40
Square Red .30
Square Purple .30
...

Thanks

Best answer

I think you can use GroupBy.size with multiple columns:

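import numpy as np
import pandas as pd

np.random.seed(2020)
s = ['Triangle', 'Square', 'Circle', 'Rectangle']
c = ['Blue', 'Red', 'Black', 'Purple']

# 20 random rows; 'Rectangle' is included so the isin() filter below has something to drop
df = pd.DataFrame({'Shape': np.random.choice(s, size=20),
                   'Colors': np.random.choice(c, size=20)})
# print(df)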

Shapes = ['Triangle','Square','Circle']

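# Keep only the shapes of interest, count rows per (Shape, Colors) pair,
# then divide by the row count of the whole DataFrame
df1 = (df.loc[df['Shape'].isin(Shapes)]
         .groupby(['Shape', 'Colors'])
         .size()
         .div(len(df))
         .reset_index(name='per'))
print(df1)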
      Shape  Colors   per
0    Circle   Black  0.10
1    Circle     Red  0.05
2    Square    Blue  0.05
3    Square     Red  0.10
4  Triangle   Black  0.05
5  Triangle    Blue  0.05
6  Triangle  Purple  0.10
7  Triangle     Red  0.10
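
Because the frame above is random, the result is hard to check by eye. A minimal sketch on a hand-built frame (the `df_small`, `shapes` and `out` names and the data are made up for illustration, they are not from the question) shows the same chain, and makes one detail visible: dividing by `len(df_small)` uses every row as the denominator, including the filtered-out Rectangle rows, so the `per` column does not add up to 1.

```python
import pandas as pd

# Hypothetical data: 10 rows, 2 of which (Rectangle) are filtered out below
df_small = pd.DataFrame({
    'Shape':  ['Triangle', 'Triangle', 'Triangle', 'Square', 'Square',
               'Circle', 'Circle', 'Circle', 'Rectangle', 'Rectangle'],
    'Colors': ['Blue', 'Blue', 'Red', 'Red', 'Red',
               'Black', 'Black', 'Blue', 'Purple', 'Purple'],
})
shapes = ['Triangle', 'Square', 'Circle']

out = (df_small.loc[df_small['Shape'].isin(shapes)]
               .groupby(['Shape', 'Colors'])
               .size()                  # rows per (Shape, Colors) pair
               .div(len(df_small))      # denominator: all 10 rows
               .reset_index(name='per'))
print(out)

# 8 of the 10 rows survive the filter, so the shares total 0.8 rather than 1
print(out['per'].sum())
```

If the shares should total 1 over the selected shapes only, one option is to divide by the length of the filtered frame instead of `len(df_small)`.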

An alternative is SeriesGroupBy.value_counts; the difference is that the values are sorted within each group, by descending count:

df1 = (df.loc[df['Shape'].isin(Shapes)]
         .groupby(['Shape'])['Colors']
         .value_counts()
         .div(len(df))
         .reset_index(name='per'))
print(df1)
      Shape  Colors   per
0    Circle   Black  0.10
1    Circle     Red  0.05
2    Square     Red  0.10
3    Square    Blue  0.05
4  Triangle  Purple  0.10
5  Triangle     Red  0.10
6  Triangle   Black  0.05
7  Triangle    Blue  0.05
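
The ordering difference between the two approaches is easier to see on a tiny example without ties. The sketch below uses made-up data and hypothetical names (`df_small`, `by_size`, `by_counts`); it only compares row order, and both results hold the same counts.

```python
import pandas as pd

# Hypothetical data: Circle has 1 Blue and 3 Red, Square has 2 Blue and 1 Red
df_small = pd.DataFrame({
    'Shape':  ['Circle', 'Circle', 'Circle', 'Circle', 'Square', 'Square', 'Square'],
    'Colors': ['Blue', 'Red', 'Red', 'Red', 'Blue', 'Blue', 'Red'],
})

# size(): rows follow the sorted group keys (Shape, then Colors)
by_size = df_small.groupby(['Shape', 'Colors']).size()

# value_counts(): within each Shape, the most frequent color comes first
by_counts = df_small.groupby('Shape')['Colors'].value_counts()

print(by_size.index.tolist())
print(by_counts.index.tolist())
```

Here `by_size` lists Circle/Blue before Circle/Red because of alphabetical key order, while `by_counts` puts Circle/Red first because it is the more frequent color.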

If you want percentages per group (so that each group sums to 1, i.e. 100%), use:

Shapes = ['Triangle', 'Square', 'Circle']

# normalize=True divides each count by the size of its own Shape group
df2 = (df.loc[df['Shape'].isin(Shapes)]
         .groupby(['Shape'])['Colors']
         .value_counts(normalize=True)
         .reset_index(name='per'))
print(df2)
      Shape  Colors       per
0    Circle   Black  0.666667
1    Circle     Red  0.333333
2    Square     Red  0.666667
3    Square    Blue  0.333333
4  Triangle  Purple  0.333333
5  Triangle     Red  0.333333
6  Triangle   Black  0.166667
7  Triangle    Blue  0.166667
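
As a sanity check, the per-group shares can be summed back up, and the same numbers can be reproduced by hand from `size()`. This is only a sketch on made-up data; `df_small`, `per_group`, `totals`, `sizes` and `by_hand` are illustrative names, and the manual version is an equivalent route rather than part of the original answer.

```python
import pandas as pd

# Hypothetical data: Triangle is 3 Blue + 1 Red, Square is 3 Black + 1 Red
df_small = pd.DataFrame({
    'Shape':  ['Triangle'] * 4 + ['Square'] * 4,
    'Colors': ['Blue', 'Blue', 'Blue', 'Red', 'Black', 'Black', 'Black', 'Red'],
})

per_group = (df_small.groupby('Shape')['Colors']
                     .value_counts(normalize=True)
                     .reset_index(name='per'))

# Each Shape's shares should total 1 (100%)
totals = per_group.groupby('Shape')['per'].sum()
print(totals)

# The same shares by hand: pair counts divided by the size of their Shape group
sizes = df_small.groupby(['Shape', 'Colors']).size()
by_hand = sizes / sizes.groupby(level='Shape').transform('sum')
print(by_hand)
```

The manual form can be convenient when the raw counts are needed alongside the shares, since `sizes` stays available.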

Regarding "python - Grouping columns and calculating", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59692577/
