python - Pandas: group by some columns

Reposted. Author: 行者123. Updated: 2023-12-01 01:07:22

Description

How can I use Pandas groupby to group by some columns but not by others?

Current progress

import pandas as pd

table_D = pd.DataFrame({
    'Geo_ID': [1, 1, 1, 1, 2, 3, 4, 4, 5],
    'A_Code': [12, 12, 12, 65, 65, 65, 65, 98, 98],
    'A_Cost': [2, 9, 1, 10, 6, 7, 7, 6, 2],
}, columns=['Geo_ID', 'A_Code', 'A_Cost'])
# One-hot encode A_Code into one indicator column per code
table_D_dummies = pd.get_dummies(data=table_D, columns=["A_Code"])
# Sum every remaining column within each Geo_ID
table_D_dummies_grouped = table_D_dummies.groupby(by=["Geo_ID"]).sum()

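Problem

As shown below, this correctly sums the cost by Geo_ID. Unfortunately, the cost is also summed across all A_Code values.

A_Code_12, A_Code_65 and A_Code_98 should each be grouped separately. Also, the real dataset has more than 100 distinct A_Code values.

Data

table_D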

+--------+--------+--------+
| Geo_ID | A_Code | A_Cost |
+--------+--------+--------+
| 1 | 12 | 2 |
| 1 | 12 | 9 |
| 1 | 12 | 1 |
| 1 | 65 | 10 |
| 2 | 65 | 6 |
| 3 | 65 | 7 |
| 4 | 65 | 7 |
| 4 | 98 | 6 |
| 5 | 98 | 2 |
+--------+--------+--------+

table_D_dummies

+---+--------+--------+-----------+-----------+-----------+
| | Geo_ID | A_Cost | A_Code_12 | A_Code_65 | A_Code_98 |
+---+--------+--------+-----------+-----------+-----------+
| 0 | 1 | 2 | 1 | 0 | 0 |
| 1 | 1 | 9 | 1 | 0 | 0 |
| 2 | 1 | 1 | 1 | 0 | 0 |
| 3 | 1 | 10 | 0 | 1 | 0 |
| 4 | 2 | 6 | 0 | 1 | 0 |
| 5 | 3 | 7 | 0 | 1 | 0 |
| 6 | 4 | 7 | 0 | 1 | 0 |
| 7 | 4 | 6 | 0 | 0 | 1 |
| 8 | 5 | 2 | 0 | 0 | 1 |
+---+--------+--------+-----------+-----------+-----------+

table_D_dummies_grouped

+--------+--------+-----------+-----------+-----------+
| Geo_ID | A_Cost | A_Code_12 | A_Code_65 | A_Code_98 |
+--------+--------+-----------+-----------+-----------+
| 1 | 22 | 3 | 1 | 0 |
| 2 | 6 | 0 | 1 | 0 |
| 3 | 7 | 0 | 1 | 0 |
| 4 | 13 | 0 | 1 | 1 |
| 5 | 2 | 0 | 0 | 1 |
+--------+--------+-----------+-----------+-----------+
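The A_Code_* columns in the grouped table are row counts per code, not costs. A minimal sketch that checks this against pd.crosstab follows; the names dummies, grouped, counts and same are illustrative, and dtype=int is an assumption added so the indicators stay 0/1 on recent pandas versions, where get_dummies returns booleans by default.

```python
import pandas as pd

table_D = pd.DataFrame({
    'Geo_ID': [1, 1, 1, 1, 2, 3, 4, 4, 5],
    'A_Code': [12, 12, 12, 65, 65, 65, 65, 98, 98],
    'A_Cost': [2, 9, 1, 10, 6, 7, 7, 6, 2],
})

# dtype=int keeps the indicator columns as 0/1 integers (assumption, see above).
dummies = pd.get_dummies(table_D, columns=['A_Code'], dtype=int)
grouped = dummies.groupby('Geo_ID').sum()

# Row counts per (Geo_ID, A_Code) pair, computed without any dummy columns.
counts = pd.crosstab(table_D['Geo_ID'], table_D['A_Code'])

# The summed indicators and the cross-tabulated counts hold the same numbers.
code_cols = ['A_Code_12', 'A_Code_65', 'A_Code_98']
same = bool((grouped[code_cols].to_numpy() == counts.to_numpy()).all())
print(same)
```

So only A_Cost carries cost information after this groupby, and it is totalled per Geo_ID regardless of code.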

Best answer

You are not using the dummy table; you grouped the original dataframe instead:

table_D_dummies = pd.get_dummies(data = table_D, columns = ["A_Code"])
table_D_dummies_grouped = table_D.groupby(by = ["Geo_ID"]).sum()

You want to group table_D_dummies here:

>>> table_D_dummies
   Geo_ID  A_Cost  A_Code_12  A_Code_65  A_Code_98
0       1       2          1          0          0
1       1       9          1          0          0
2       1       1          1          0          0
3       1      10          0          1          0
4       2       6          0          1          0
5       3       7          0          1          0
6       4       7          0          1          0
7       4       6          0          0          1
8       5       2          0          0          1
>>> table_D_dummies.groupby(by = ["Geo_ID"]).sum()
        A_Cost  A_Code_12  A_Code_65  A_Code_98
Geo_ID
1           22          3          1          0
2            6          0          1          0
3            7          0          1          0
4           13          0          1          1
5            2          0          0          1
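The difference between the two groupby calls can be reproduced with a short script. This is a sketch only: the names wrong and right are illustrative, and dtype=int is an assumption added so the dummy columns are integers on recent pandas versions.

```python
import pandas as pd

table_D = pd.DataFrame({
    'Geo_ID': [1, 1, 1, 1, 2, 3, 4, 4, 5],
    'A_Code': [12, 12, 12, 65, 65, 65, 65, 98, 98],
    'A_Cost': [2, 9, 1, 10, 6, 7, 7, 6, 2],
})
table_D_dummies = pd.get_dummies(table_D, columns=['A_Code'], dtype=int)

# Grouping the original frame adds the raw A_Code values together,
# which produces a number with no meaning (12 + 12 + 12 + 65 for Geo_ID 1).
wrong = table_D.groupby('Geo_ID').sum()

# Grouping the dummy frame sums A_Cost and counts the rows of each code.
right = table_D_dummies.groupby('Geo_ID').sum()

print(wrong)
print(right)
```

Both results agree on A_Cost; they differ only in what happens to the code information.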

If you need the cost sum for each dummy column, add the dummy columns to the grouping keys:

>>> table_D_dummies.groupby(by = [
...     "Geo_ID",
...     *(c for c in table_D_dummies.columns if c.startswith('A_Code_'))
... ]).sum()
                                      A_Cost
Geo_ID A_Code_12 A_Code_65 A_Code_98
1      0         1         0              10
       1         0         0              12
2      0         1         0               6
3      0         1         0               7
4      0         0         1               6
                 1         0               7
5      0         0         1               2
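With more than 100 codes, grouping by every dummy column gives a very deep index. One possible alternative, which is not part of the original answer, is to group the original table_D by Geo_ID and A_Code directly and, if a wide layout is wanted, unstack the codes into columns. The names cost_long and cost_wide below are illustrative.

```python
import pandas as pd

table_D = pd.DataFrame({
    'Geo_ID': [1, 1, 1, 1, 2, 3, 4, 4, 5],
    'A_Code': [12, 12, 12, 65, 65, 65, 65, 98, 98],
    'A_Cost': [2, 9, 1, 10, 6, 7, 7, 6, 2],
})

# Sum A_Cost for every (Geo_ID, A_Code) pair that occurs in the data.
cost_long = table_D.groupby(['Geo_ID', 'A_Code'])['A_Cost'].sum()

# Optional wide layout: one column per code, 0 where a pair never occurs.
cost_wide = cost_long.unstack(fill_value=0).add_prefix('A_Code_')

print(cost_long)
print(cost_wide)
```

The totals match the grouped output above (for example 12 for Geo_ID 1 with code 12, and 7 for Geo_ID 4 with code 65), and no get_dummies step is needed.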

Regarding python - Pandas: group by some columns, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55208331/
