
python - Pandas groupby winsorized mean

Reposted. Author: 太空宇宙. Updated: 2023-11-03 23:51:27

A plain groupby mean is easy:

df.groupby(['col_a','col_b']).mean()[col_i_want]

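However, if I want a winsorized mean (with default limits of 0.05 and 0.95), which amounts to clipping the data and then taking the mean, there suddenly seems to be no easy way to do it. I have to: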

winsorized_mean = []
col_i_want = 'col_c'
for entry in df['col_a'].unique():
    for entry2 in df['col_b'].unique():
        sub_df = df[(df['col_a'] == entry) & (df['col_b'] == entry2)]
        # clip() takes absolute bounds, so pass the subgroup's 5% and 95% quantiles
        lo, hi = sub_df[col_i_want].quantile([0.05, 0.95])
        m = sub_df[col_i_want].clip(lower=lo, upper=hi).mean()
        winsorized_mean.append([entry, entry2, m])

Is there a function I don't know about that does this automatically?

Best answer

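You can use scipy.stats.trim_mean. Note that it computes a trimmed mean, which discards the given fraction of values at each end instead of clipping them to the cutoffs, so it is close to, but not the same as, a winsorized mean: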

import pandas as pd
from scipy.stats import trim_mean

# label 'a' will exhibit different means depending on trimming
label = ['a'] * 20 + ['b'] * 80 + ['c'] * 400 + ['a'] * 100

data = list(range(100)) + list(range(500, 1000))

df = pd.DataFrame({'label': label, 'data': data})

# select the numeric column so trim_mean only sees the data
grouped = df.groupby('label')['data']

# trim 5% off both ends
print(grouped.apply(trim_mean, .05))

# trim 10% off both ends
print(grouped.apply(trim_mean, .1))
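
If you want clipping rather than trimming, as the question describes, a pandas-only version might look like the sketch below. The helper name winsorized_mean and the sample data are made up for illustration, and the cutoffs are pandas' interpolated quantiles, which is only one common way to define winsorizing (scipy.stats.mstats.winsorize, for example, may give slightly different numbers).

```python
import pandas as pd


def winsorized_mean(s, lower=0.05, upper=0.95):
    """Clip a Series to its own lower/upper quantiles, then average it.

    Hypothetical helper, not part of pandas or scipy.
    """
    lo, hi = s.quantile([lower, upper])
    return s.clip(lower=lo, upper=hi).mean()


# Illustrative data: group ('x', 'p') contains one extreme value (1000)
df = pd.DataFrame({
    'col_a': ['x'] * 10 + ['y'] * 10,
    'col_b': ['p'] * 10 + ['q'] * 10,
    'col_c': [1.0, 2, 3, 4, 5, 6, 7, 8, 9, 1000] + list(range(10, 20)),
})

groups = df.groupby(['col_a', 'col_b'])['col_c']

plain = groups.mean()                 # ordinary mean per group
wins = groups.apply(winsorized_mean)  # winsorized mean per group

print(plain)
print(wins)
```

Because the quantiles are computed inside the helper, each group is clipped against its own distribution, and the explicit double loop over col_a and col_b is no longer needed.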

Regarding python - Pandas groupby winsorized mean, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59241970/
