
python - Groupby and weighted average

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:26:48

I have a DataFrame:

import pandas as pd
import numpy as np

# DataFrame.from_items was removed in pandas 1.0; a dict preserves column order
df = pd.DataFrame({'STAND_ID': [1, 1, 2, 3, 3, 3],
                   'Species': ['Conifer', 'Broadleaves', 'Conifer', 'Broadleaves', 'Conifer', 'Conifer'],
                   'Height': [20, 19, 13, 24, 25, 18],
                   'Stems': [1500, 2000, 1000, 1200, 1700, 1000],
                   'Volume': [200, 100, 300, 50, 100, 10]})

   STAND_ID      Species  Height  Stems  Volume
0         1      Conifer      20   1500     200
1         1  Broadleaves      19   2000     100
2         2      Conifer      13   1000     300
3         3  Broadleaves      24   1200      50
4         3      Conifer      25   1700     100
5         3      Conifer      18   1000      10

I want to group by STAND_ID and Species, compute a weighted average of Height and Stems using Volume as the weight, and then unstack the result.

So I tried:

newdf=df.groupby(['STAND_ID','Species']).agg({'Height':lambda x: np.average(x['Height'],weights=x['Volume']),
'Stems':lambda x: np.average(x['Stems'],weights=x['Volume'])}).unstack()

This gives me the error:

builtins.KeyError: 'Height'

How can I fix this?

Best answer

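The error occurs because agg cannot perform an operation that spans multiple Series/columns: it passes one Series/column at a time to the function, so x['Height'] and x['Volume'] are not available inside the lambda. Let's use apply together with pd.concat instead: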

g = df.groupby(['STAND_ID','Species'])
newdf = pd.concat([g.apply(lambda x: np.average(x['Height'],weights=x['Volume'])),
g.apply(lambda x: np.average(x['Stems'],weights=x['Volume']))],
axis=1, keys=['Height','Stems']).unstack()
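
To see why the original agg call fails, it can help to inspect what agg actually hands to the function. The following is a minimal sketch that rebuilds the question's data; probe and seen are hypothetical names introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'STAND_ID': [1, 1, 2, 3, 3, 3],
                   'Species': ['Conifer', 'Broadleaves', 'Conifer',
                               'Broadleaves', 'Conifer', 'Conifer'],
                   'Height': [20, 19, 13, 24, 25, 18],
                   'Stems': [1500, 2000, 1000, 1200, 1700, 1000],
                   'Volume': [200, 100, 300, 50, 100, 10]})

seen = []  # hypothetical log: was each argument a single-column Series?

def probe(x):
    # Record whether agg passed a Series (one column of one group)
    seen.append(isinstance(x, pd.Series))
    return x.mean()

df.groupby(['STAND_ID', 'Species']).agg({'Height': probe})
print(all(seen))
```

Because each call receives only the Height values of one group, the Volume column cannot be reached from inside agg. apply, by contrast, receives the whole sub-DataFrame of each group, which is what the weighted average needs.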

Edit: a better solution:

g = df.groupby(['STAND_ID','Species'])
newdf = g.apply(lambda x: pd.Series([np.average(x['Height'], weights=x['Volume']),
np.average(x['Stems'],weights=x['Volume'])],
index=['Height','Stems'])).unstack()

Output:

              Height                  Stems
Species  Broadleaves    Conifer Broadleaves      Conifer
STAND_ID
1               19.0  20.000000      2000.0  1500.000000
2                NaN  13.000000         NaN  1000.000000
3               24.0  24.363636      1200.0  1636.363636
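
As a cross-check of these numbers, the same table can be computed without apply: sum value times weight per group, then divide by the summed weights. This is a swapped-in vectorized alternative, not the method from the answer. The helper columns Height_w and Stems_w and the variables weighted, sums and alt are assumptions introduced here.

```python
import pandas as pd

df = pd.DataFrame({'STAND_ID': [1, 1, 2, 3, 3, 3],
                   'Species': ['Conifer', 'Broadleaves', 'Conifer',
                               'Broadleaves', 'Conifer', 'Conifer'],
                   'Height': [20, 19, 13, 24, 25, 18],
                   'Stems': [1500, 2000, 1000, 1200, 1700, 1000],
                   'Volume': [200, 100, 300, 50, 100, 10]})

# Multiply each value by its weight so a plain group sum yields the numerator
weighted = df.assign(Height_w=df['Height'] * df['Volume'],
                     Stems_w=df['Stems'] * df['Volume'])
sums = (weighted.groupby(['STAND_ID', 'Species'])
        [['Height_w', 'Stems_w', 'Volume']].sum())

# Divide the weighted sums by the total weight of each group, then unstack
alt = pd.DataFrame({'Height': sums['Height_w'] / sums['Volume'],
                    'Stems': sums['Stems_w'] / sums['Volume']}).unstack()

# Hand check for stand 3, Conifer: (25*100 + 18*10) / (100 + 10) = 2680 / 110
print(alt.loc[3, ('Height', 'Conifer')])
```

One difference in behavior: np.average raises ZeroDivisionError when a group's weights sum to zero, whereas this division would produce NaN for a group whose volumes are all zero.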

Regarding python - Groupby and weighted average, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47184507/
