python - Frequency and percentage for uneven groups in sns barplot

Reposted by: 太空狗. Updated: 2023-10-30 01:12:09

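I am trying to show relative percentages by group, as well as total frequencies, in an sns barplot. The two groups I am comparing differ greatly in size, which is why the function below shows percentages by group.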

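Here is the code for a sample data frame I created. Its relative group sizes ("groups") within the target categorical variable ("item") are similar to those in my data. "rand" is just a variable I use to build the df.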

# import pandas, seaborn and numpy
import pandas as pd
import seaborn as sns
import numpy as np

# create dataframe; 'rand' only serves to derive the groups
foobar = pd.DataFrame({'rand': np.random.randn(100)})

# get relative group sizes: 'A' where rand > -1.2, otherwise 'B'
foobar['groups'] = np.where(foobar['rand'] > -1.2, 'A', 'B')

# assign the categories that I am comparing graphically, 20 rows each
foobar['item'] = np.repeat(['Z', 'Y', 'X', 'W', 'V'], 20)
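
For reference, the -1.2 cutoff on a standard normal draw is what makes the two groups so uneven. The sketch below is an editorial illustration using only the standard library; `share_above` is a hypothetical helper name, not part of the original post. It computes the expected share of rows that land in group A.

```python
import math

def share_above(threshold):
    """Probability that a standard normal draw exceeds `threshold`."""
    # P(Z > t) = 1 - Phi(t) = 0.5 * (1 - erf(t / sqrt(2)))
    return 0.5 * (1.0 - math.erf(threshold / math.sqrt(2.0)))

# expected fraction of rows with rand > -1.2, i.e. group 'A'
expected_a = share_above(-1.2)
print('expected share of A: {:.1%}'.format(expected_a))
```

With 100 rows that is roughly 88 to 89 rows in A on average, which is in line with the 89/11 split mentioned in the question. Any single random draw will vary.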

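Here is the function I wrote to compare relative frequencies by group. It has some default variables, but I have reassigned them for this question.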

def percent_categorical(item, df, grouper='Active Status'):
    # plot categorical responses to an item ('column name')
    # by percent by group ('diff column name w categorical data')
    # df is the data frame to plot (my own default, IA, is not defined here)
    # 'Active Status' is default grouper

    # create df of item grouped by status
    grouped = (df.groupby(grouper)[item]
                 # convert to percentage by group rather than total count
                 .value_counts(normalize=True)
                 # rename column
                 .rename('percentage')
                 # multiply by 100 for easier interpretation
                 .mul(100)
                 # turn the index levels back into columns
                 .reset_index()
                 # change order from value to name
                 .sort_values(item))

    # create plot
    PercPlot = sns.barplot(x=item,
                           y='percentage',
                           hue=grouper,
                           data=grouped,
                           palette='RdBu')
    # rotate the existing tick labels; relabelling them from value_counts()
    # could put the wrong name under a bar
    PercPlot.tick_params(axis='x', labelrotation=90)
    # return the Axes holding the plot
    return PercPlot

The function call and the resulting plot are as follows:

percent_categorical('item', df=foobar, grouper='groups')

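This is great because it lets me show relative percentages by group. However, I would also like to display the absolute number for each group, preferably in the legend. In this case, I would like it to show that group A has 89 members in total and group B has 11.

Thanks in advance for your help.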

Best answer

I solved this by splitting the groupby operation in two: one to get the percentages and one to count the number of objects.
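
The split can be seen without any plotting. A minimal sketch on a small hypothetical data frame (the toy values are an assumption, not the asker's data), reusing one groupby object for both operations:

```python
import pandas as pd

# toy data (hypothetical): group A has 4 rows, group B has 2
df = pd.DataFrame({
    'groups': ['A', 'A', 'A', 'A', 'B', 'B'],
    'item':   ['X', 'X', 'X', 'Y', 'X', 'Y'],
})

# shared starting point for both operations
groupbase = df.groupby('groups')['item']

# operation 1: absolute number of rows per group
groupcount = groupbase.count()

# operation 2: share of each item within its own group, in percent
groupper = (groupbase.value_counts(normalize=True)
            .rename('percentage')
            .mul(100)
            .reset_index())

print(groupcount.to_dict())
print(groupper)
```

The percentages add up to 100 within each group, while `groupcount` keeps the group sizes that the normalisation discards.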

I adjusted your percent_categorical function as follows:

import matplotlib.pyplot as plt

def percent_categorical(item, df, grouper='Active Status'):
    # plot categorical responses to an item ('column name')
    # by percent by group ('diff column name w categorical data')
    # df is the data frame to plot (the asker's default, IA, is not defined here)
    # 'Active Status' is default grouper

    # create groupby of item grouped by status
    groupbase = df.groupby(grouper)[item]
    # count the number of occurrences per group
    groupcount = groupbase.count()
    # convert to percentage by group rather than total count
    groupper = (groupbase.value_counts(normalize=True)
                # rename column
                .rename('percentage')
                # multiply by 100 for easier interpretation
                .mul(100)
                # turn the index levels back into columns
                .reset_index()
                # change order from value to name
                .sort_values(item))

    # create plot
    fig, ax = plt.subplots()
    brplt = sns.barplot(x=item,
                        y='percentage',
                        hue=grouper,
                        data=groupper,
                        palette='RdBu',
                        ax=ax)
    # rotate the existing tick labels instead of relabelling them
    ax.tick_params(axis='x', labelrotation=90)
    # get the handles and the labels of the legend
    # these are the bars and the corresponding text in the legend
    thehandles, thelabels = ax.get_legend_handles_labels()
    # for each label, add the total number of occurrences
    # you can get this from groupcount, as the labels in the figure have
    # the same names as the values in the grouper column of your df
    for counter, label in enumerate(thelabels):
        # the new label looks like this (dummy name and value)
        # 'XYZ (42)'
        thelabels[counter] = label + ' ({})'.format(groupcount[label])
    # add the new legend to the figure
    ax.legend(thehandles, thelabels)
    # return the figure, the axes and the barplot
    return fig, ax, brplt

To get your figure:

fig, ax, brplt = percent_categorical('item', df=foobar, grouper='groups')

The resulting figure looks like this:

You can change the look of this legend as needed; I just added parentheses as an example.
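
As one possible variation, the label text can be built by a small helper so the format lives in one place. This is only a sketch: `label_with_count` and its `template` parameter are hypothetical names, and the counts are the group sizes quoted in the question.

```python
def label_with_count(labels, counts, template='{label} (n={count})'):
    """Return new legend labels with each group's size filled into `template`."""
    return [template.format(label=label, count=counts[label]) for label in labels]

# group sizes quoted in the question; inside the function they would come from groupcount
counts = {'A': 89, 'B': 11}
new_labels = label_with_count(['A', 'B'], counts)
print(new_labels)  # ['A (n=89)', 'B (n=11)']
```

Inside the function, the result could be passed on as `ax.legend(thehandles, new_labels)`, with `groupcount` used in place of the hard-coded dictionary.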

Regarding python - frequency and percentage for uneven groups in sns barplot, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44763643/
