Python - Pandas - value_counts of all columns in a grouped DataFrame

Reposted. Author: 行者123. Updated: 2023-12-01 03:01:07

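I have a survey dataset in which every question is answered on a 7-point scale. I want the value_counts of the values shared by all columns, with the DataFrame grouped by two columns. Here is a sample dataset, followed by the result I am aiming for.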

| col1          | col2          | col3          | Building      | Levels_Name            |
|---------------|---------------|---------------|---------------|------------------------|
| Not Satisfied | Not Satisfied | Not Satisfied | San Francisco | Individual Contributor |
| Satisfied | Satisfied | NA | Basingstoke | Individual Contributor |
| Not Satisfied | Satisfied | Not Satisfied | San Francisco | Middle Management |
| Not Satisfied | Satisfied | Not Satisfied | Miami | Senior Leadership |
| Not Satisfied | Not Satisfied | Not Satisfied | Foster City | Senior Leadership |
| NA | NA | NA | Foster City | Other |
| Not Satisfied | Not Satisfied | NA | Foster City | Senior Leadership |
| Not Satisfied | Satisfied | Not Satisfied | Austin | Middle Management |
| Satisfied | Satisfied | Satisfied | San Francisco | Senior Leadership |
| Not Satisfied | Not Satisfied | Not Satisfied | Foster City | Individual Contributor |
| Satisfied | Satisfied | NA | Miami | Middle Management |

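Now I want to group this dataset by "Building" and "Levels_Name", add a new grouping level for "Satisfied", "Not Satisfied", and "NA", and get the count of each value in every column.

So the result should look like this: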

| Building      | Levels_Name            | Sentiment     | col1 | col2 | col3 |
|---------------|------------------------|---------------|------|------|------|
| Foster City | Individual Contributor | Not Satisfied | 1 | 1 | 1 |
| Foster City | Individual Contributor | NA | 0 | 0 | 0 |
| Foster City | Individual Contributor | Satisfied | 0 | 0 | 0 |
| Foster City | Senior Leadership | Not Satisfied | 2 | 2 | 0 |
| Foster City | Senior Leadership | NA | 0 | 0 | 1 |
| Foster City | Senior Leadership | Satisfied | 0 | 0 | 0 |
| San Francisco | Individual Contributor | Not Satisfied | 1 | 1 | 1 |
| San Francisco | Individual Contributor | NA | 0 | 0 | 0 |
| San Francisco | Individual Contributor | Satisfied | 0 | 0 | 0 |

Thanks!

Best answer

First melt the DataFrame into long form, then group it and count.

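    import numpy as np
    import pandas as pd

    # Melt the survey columns into long form; the two grouping columns stay as identifiers.
    d1 = pd.melt(
        df, ['Building', 'Levels_Name'], value_name='Sentiment'
    ).replace(np.nan, 'NaN')

    # Count rows per (Building, Levels_Name, variable, Sentiment), then pivot 'variable' into columns.
    d1.groupby(
        d1.columns.tolist()
    ).size().unstack('variable', fill_value=0).reset_index()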

    variable  Building       Levels_Name             Sentiment      col1  col2  col3
    0         Austin         Middle Management       Not Satisfied  1     0     1
    1         Austin         Middle Management       Satisfied      0     1     0
    2         Basingstoke    Individual Contributor  NaN            0     0     1
    3         Basingstoke    Individual Contributor  Satisfied      1     1     0
    4         Foster City    Individual Contributor  Not Satisfied  1     1     1
    5         Foster City    Other                   NaN            1     1     1
    6         Foster City    Senior Leadership       NaN            0     0     1
    7         Foster City    Senior Leadership       Not Satisfied  2     2     1
    8         Miami          Middle Management       NaN            0     0     1
    9         Miami          Middle Management       Satisfied      1     1     0
    10        Miami          Senior Leadership       Not Satisfied  1     0     1
    11        Miami          Senior Leadership       Satisfied      0     1     0
    12        San Francisco  Individual Contributor  Not Satisfied  1     1     1
    13        San Francisco  Middle Management       Not Satisfied  1     0     1
    14        San Francisco  Middle Management       Satisfied      0     1     0
    15        San Francisco  Senior Leadership       Satisfied      1     1     1
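
The output above lists only the sentiment values that actually occur in each group, whereas the table in the question also shows rows whose counts are all zero. One possible way to get those rows is to reindex the counts against every observed (Building, Levels_Name) pair crossed with every sentiment label. The following is a minimal sketch, not part of the original answer. It assumes that "NA" in the sample table stands for a missing value, and the names `sentiments`, `full_index`, and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data transcribed from the question's table.
# Assumption: "NA" in the table means a missing value (np.nan).
rows = [
    ("Not Satisfied", "Not Satisfied", "Not Satisfied", "San Francisco", "Individual Contributor"),
    ("Satisfied", "Satisfied", np.nan, "Basingstoke", "Individual Contributor"),
    ("Not Satisfied", "Satisfied", "Not Satisfied", "San Francisco", "Middle Management"),
    ("Not Satisfied", "Satisfied", "Not Satisfied", "Miami", "Senior Leadership"),
    ("Not Satisfied", "Not Satisfied", "Not Satisfied", "Foster City", "Senior Leadership"),
    (np.nan, np.nan, np.nan, "Foster City", "Other"),
    ("Not Satisfied", "Not Satisfied", np.nan, "Foster City", "Senior Leadership"),
    ("Not Satisfied", "Satisfied", "Not Satisfied", "Austin", "Middle Management"),
    ("Satisfied", "Satisfied", "Satisfied", "San Francisco", "Senior Leadership"),
    ("Not Satisfied", "Not Satisfied", "Not Satisfied", "Foster City", "Individual Contributor"),
    ("Satisfied", "Satisfied", np.nan, "Miami", "Middle Management"),
]
df = pd.DataFrame(rows, columns=["col1", "col2", "col3", "Building", "Levels_Name"])

id_cols = ["Building", "Levels_Name"]

# Same melt + groupby idea as the answer, with missing values labelled "NA".
long_df = df.melt(id_vars=id_cols, value_name="Sentiment").fillna({"Sentiment": "NA"})
counts = (
    long_df.groupby(id_cols + ["Sentiment", "variable"])
    .size()
    .unstack("variable", fill_value=0)
)

# Cross every (Building, Levels_Name) pair that occurs with every sentiment label.
sentiments = ["Not Satisfied", "NA", "Satisfied"]
pairs = df[id_cols].drop_duplicates().sort_values(id_cols)
full_index = pd.MultiIndex.from_tuples(
    [(bld, lvl, s) for bld, lvl in pairs.itertuples(index=False) for s in sentiments],
    names=id_cols + ["Sentiment"],
)

# Combinations that never occur get a count of 0 in every column.
result = counts.reindex(full_index, fill_value=0).reset_index()
result.columns.name = None  # drop the leftover 'variable' label
print(result)
```

With the sample data this gives three rows per group. Note that the Foster City / Senior Leadership group counts one "Not Satisfied" in col3, in line with the answer's output rather than the 0 shown in the question's expected table.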

For more on value_counts of all columns in a grouped Pandas DataFrame, see the similar question on Stack Overflow: https://stackoverflow.com/questions/43829096/
