
python - Pandas: frequency of unique values in the other columns, grouped by the categories of one column

Reposted. Author: 行者123. Updated: 2023-12-05 03:23:32

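I have a column with 4 categories. For each of its unique values, I want to show how frequently the values of the other columns occur. A sample of the data:

(image: sample data, not available)

Output:

(image: desired output, not available)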

Best answer

Starting from this:

import pandas as pd

df = pd.DataFrame(
    {
        "cat1": ["yes", "no", "yes", "no", "yes"],
        "cat2": ["a", "a", "b", "b", "a"],
        "cat3": ["yes", "no", "no", "yes", "no"],
        "quant": [1, 2, 3, 4, 5],
    }
)

Sample DataFrame:

  cat1 cat2 cat3  quant
0  yes    a  yes      1
1   no    a   no      2
2  yes    b   no      3
3   no    b  yes      4
4  yes    a   no      5

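You can do this:

# Fraction of "yes" / "no" values within each group.
# Note: .loc raises KeyError if the value never occurs in a group.
y = lambda x: x.value_counts(normalize=True).loc["yes"]
n = lambda x: x.value_counts(normalize=True).loc["no"]
df.groupby(["cat2"]).agg(
    {
        "cat1": [("yes", y), ("no", n)],
        "cat3": [("yes", y), ("no", n)],
        "quant": ["min", "max", "mean"],
    }
)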

Result:

          cat1                cat3           quant
           yes        no       yes        no   min max      mean
cat2
a     0.666667  0.333333  0.333333  0.666667     1   5  2.666667
b     0.500000  0.500000  0.500000  0.500000     3   4  3.500000
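As a cross-check on the frequencies above, here is a minimal sketch using pd.crosstab with normalize="index". This is a different technique from the groupby/agg approach in the answer, not part of it, and the name cat1_freq is only illustrative. It handles one column at a time and yields 0 for a value that does not occur in a group.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "cat1": ["yes", "no", "yes", "no", "yes"],
        "cat2": ["a", "a", "b", "b", "a"],
        "cat3": ["yes", "no", "no", "yes", "no"],
        "quant": [1, 2, 3, 4, 5],
    }
)

# Share of each cat1 value within each cat2 group; every row sums to 1.
cat1_freq = pd.crosstab(df["cat2"], df["cat1"], normalize="index")
print(cat1_freq)
```

The "yes" and "no" columns of cat1_freq should agree with the cat1 columns in the result table above.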

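Here is a slightly more robust version, which returns 0 instead of raising a KeyError when a value does not occur in a group:

from functools import partial

def agg_func(s: pd.Series, name: str):
    # Fraction of entries in s equal to name; 0 if name never occurs.
    try:
        return s.value_counts(normalize=True).loc[name]
    except KeyError:
        return 0


# (output column name, aggregation function) pairs, reused for each yes/no column
yes_no_agg = [
    ("yes", partial(agg_func, name="yes")),
    ("no", partial(agg_func, name="no")),
]

df.groupby(["cat2"]).agg(
    {
        "cat1": yes_no_agg,
        "cat3": yes_no_agg,
        "quant": ["min", "max", "mean"],
    }
)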
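To illustrate why the KeyError handling matters, here is a small sketch on hypothetical data (df2 is made up for this example and is not from the question) in which group "b" contains no "no" value at all. The plain lambda version would fail on that group, while agg_func falls back to 0.

```python
from functools import partial

import pandas as pd


def agg_func(s: pd.Series, name: str):
    # Fraction of entries in s equal to name; 0 if name never occurs.
    try:
        return s.value_counts(normalize=True).loc[name]
    except KeyError:
        return 0


# Hypothetical data: group "b" only ever has "yes" in cat1.
df2 = pd.DataFrame(
    {
        "cat1": ["yes", "no", "yes", "yes"],
        "cat2": ["a", "a", "b", "b"],
    }
)

yes_no_agg = [
    ("yes", partial(agg_func, name="yes")),
    ("no", partial(agg_func, name="no")),
]

result = df2.groupby("cat2").agg({"cat1": yes_no_agg})
print(result)
```

Here group "b" gets a "no" frequency of 0 rather than an exception.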

Regarding python - Pandas: frequency of unique values in the other columns, grouped by the categories of one column, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/72506083/
