
python-3.x - Get a column of means based on the grouped values of two other columns

Reposted. Author: 行者123. Last updated: 2023-12-04 00:47:07

I have some school data:

import pandas as pd

data = {'name': ['school a', 'school b', 'school c', 'school d', 'school e', 'school f'],
        'type': ['a', 'a', 'b', 'b', 'a', 'b'],
        'location': ['county a', 'county a', 'county b', 'county b', 'county b', 'county a'],
        'avg_score': [9, 7, 5, 7, 6, 8]
        }

df = pd.DataFrame(data)


Out:
       name type  location  avg_score
0  school a    a  county a          9
1  school b    a  county a          7
2  school c    b  county b          5
3  school d    b  county b          7
4  school e    a  county b          6
5  school f    b  county a          8

I want to compare each school's score with the mean score of schools of the same type in the same location.

I can do this with groupby: df.groupby(['type', 'location'])[['avg_score']].mean().round(2)
Out:

               avg_score
type location
a    county a          8
     county b          6
b    county a          8
     county b          6

However, instead of a grouped table, I want an additional column that holds the mean for that school type in each location.

How can I get a compare_score column like this:
       name type  location  avg_score  compare_score
0  school a    a  county a          9              8
1  school b    a  county a          7              5
2  school c    b  county b          5              7
3  school d    b  county b          7              7
4  school e    a  county b          6              3
5  school f    b  county a          8              7

I found this question

Python Pandas average based on condition into new column

and tried to adapt some of the suggested solutions to my problem:
for atype, alocation in df.groupby('type'):
    df.loc[df.type == type, 'compare'] = (df.where(df['type' == atype]).where(df['location' == alocation]).mean()).avg_score.round(2)

This raises the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/.local/share/virtualenvs/schule-jwiURUl3/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2656 try:
-> 2657 return self._engine.get_loc(key)
2658 except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: False

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
<ipython-input-27-4b3cf2b7aaf6> in <module>
1 for atype, alocation in df.groupby('type'):
----> 2 df.loc[df.type == type, 'compare'] = (df.where(df['type' == atype]).where(df['location' == alocation]).mean()).avg_score.round(2)

~/.local/share/virtualenvs/schule-jwiURUl3/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
2925 if self.columns.nlevels > 1:
2926 return self._getitem_multilevel(key)
-> 2927 indexer = self.columns.get_loc(key)
2928 if is_integer(indexer):
2929 indexer = [indexer]

~/.local/share/virtualenvs/schule-jwiURUl3/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2657 return self._engine.get_loc(key)
2658 except KeyError:
-> 2659 return self._engine.get_loc(self._maybe_cast_indexer(key))
2660 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2661 if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: False

Maybe this is not a good approach at all. Do you have any suggestions?
Any hints are highly appreciated.
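
A note on the traceback above: the `KeyError: False` comes from the expression `df['type' == atype]`. Python evaluates the string comparison `'type' == atype` first, which gives the plain boolean `False`, and pandas then looks for a column labelled `False`. Separately, iterating over `df.groupby('type')` yields pairs of (group key, sub-frame), so `alocation` is a DataFrame rather than a location name. The following is a minimal sketch of a corrected loop on the same data; the target column name `compare_score` is taken from the desired output, and the variable names `lookup_error`, `group` and `mask` are introduced here only for illustration.

```python
import pandas as pd

data = {
    'name': ['school a', 'school b', 'school c', 'school d', 'school e', 'school f'],
    'type': ['a', 'a', 'b', 'b', 'a', 'b'],
    'location': ['county a', 'county a', 'county b', 'county b', 'county b', 'county a'],
    'avg_score': [9, 7, 5, 7, 6, 8],
}
df = pd.DataFrame(data)

# 'type' == 'a' is an ordinary string comparison and evaluates to False,
# so df['type' == 'a'] is really df[False], a lookup of a column named False.
try:
    df['type' == 'a']
    lookup_error = None
except KeyError as exc:
    lookup_error = exc

# Corrected loop: group by both columns. Each iteration yields the
# (type, location) key tuple and the sub-frame holding the matching rows.
for (atype, alocation), group in df.groupby(['type', 'location']):
    # Boolean mask built by comparing the columns, not the column names.
    mask = (df['type'] == atype) & (df['location'] == alocation)
    df.loc[mask, 'compare_score'] = round(group['avg_score'].mean(), 2)

print(df)
```

The loop works, but it scans the frame once per group, so it is shown here mainly to explain the error rather than as a recommended approach.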

Best answer

I think you are looking for transform:

df['compare_score'] = df.groupby(['type', 'location'])['avg_score'].transform('mean').round(2)
print(df)

---------------

       name type  location  avg_score  compare_score
0  school a    a  county a          9              8
1  school b    a  county a          7              8
2  school c    b  county b          5              6
3  school d    b  county b          7              6
4  school e    a  county b          6              6
5  school f    b  county a          8              8
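
Since the stated goal is to compare each school with its group, the sketch below repeats the transform step on the same data and then subtracts the group mean from each score. The `diff_from_mean` column and the `group_mean` variable are assumptions added for illustration and are not part of the original answer. `transform` returns a Series aligned to the original index, which is why the result can be assigned directly as a new column.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['school a', 'school b', 'school c', 'school d', 'school e', 'school f'],
    'type': ['a', 'a', 'b', 'b', 'a', 'b'],
    'location': ['county a', 'county a', 'county b', 'county b', 'county b', 'county a'],
    'avg_score': [9, 7, 5, 7, 6, 8],
})

# Mean of avg_score within each (type, location) group, broadcast back
# to one value per original row.
group_mean = df.groupby(['type', 'location'])['avg_score'].transform('mean')
df['compare_score'] = group_mean.round(2)

# Hypothetical extra column: how far each school sits from its group mean.
df['diff_from_mean'] = df['avg_score'] - df['compare_score']

print(df)
```

An alternative route is to aggregate with groupby and merge the result back on type and location; transform gives the same per-row values without the extra join.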

Regarding "python-3.x - Get a column of means based on the grouped values of two other columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58332231/
