gpt4 book ai didi

python - 带有 Pandas pivot_table 的嵌套小计 'All' 行

转载 作者:太空宇宙 更新时间:2023-11-03 15:34:56 25 4
gpt4 key购买 nike

我有一些看起来像这样的长格式数据(请参阅下文重新创建):

>>> df
section subsection name topic score
0 A W zwphf a 0.802427
1 A W jcyyc a 0.404077
2 A W kucem a 0.367319
3 A X ldbxz a 0.554260
4 A X vkcqh a 0.265864
5 A X cvksn a 0.548099
6 B Y spghx a 0.472612
7 B Y cqokn a 0.577504
8 B Y wjsxg a 0.815309
9 B Z holoo a 0.459850
10 B Z lnihf a 0.667877
11 B Z wirhq a 0.138879
12 A W zwphf b 0.673711
13 A W jcyyc b 0.507962
14 A W kucem b 0.546055
15 A X ldbxz b 0.148214
16 A X vkcqh b 0.773320
17 A X cvksn b 0.791990
18 B Y spghx b 0.487480
19 B Y cqokn b 0.252534
20 B Y wjsxg b 0.237767
21 B Z holoo b 0.432981
22 B Z lnihf b 0.317932
23 B Z wirhq b 0.614401

我想在 section + subsection + name + topic 加上 unstack 在 topic,但也显示断断续续的嵌套“全部”小计行:

>>> result                                                                                                                                         
section subsection name a b
0 A All All 0.490341 0.573542
1 A W All 0.524608 0.575909
2 A W jcyyc 0.404077 0.507962
3 A W kucem 0.367319 0.546055
4 A W zwphf 0.802427 0.673711
5 A X All 0.456074 0.571174
6 A X cvksn 0.548099 0.791990
7 A X ldbxz 0.554260 0.148214
8 A X vkcqh 0.265864 0.773320
9 B All All 0.522005 0.390516
10 B Y All 0.621808 0.325927
11 B Y cqokn 0.577504 0.252534
12 B Y spghx 0.472612 0.487480
13 B Y wjsxg 0.815309 0.237767
14 B Z All 0.422202 0.455104
15 B Z holoo 0.459850 0.432981
16 B Z lnihf 0.667877 0.317932
17 B Z wirhq 0.138879 0.614401

通过突出显示新行,这可能更容易可视化:

enter image description here

最初的 groupby 本身,没有小计,看起来像:

>>> df.groupby(['section', 'subsection', 'name', 'topic'])['score'].mean().unstack('topic')
topic a b
section subsection name
A W jcyyc 0.404077 0.507962
kucem 0.367319 0.546055
zwphf 0.802427 0.673711
X cvksn 0.548099 0.791990
ldbxz 0.554260 0.148214
vkcqh 0.265864 0.773320
B Y cqokn 0.577504 0.252534
spghx 0.472612 0.487480
wjsxg 0.815309 0.237767
Z holoo 0.459850 0.432981
lnihf 0.667877 0.317932
wirhq 0.138879 0.614401

但我不确定如何使用 margins['section', 'topic'][' 上的 groupby 操作获取小计节”、“小节”、“主题”]


重新创建 df:

import pandas as pd
data = [['A', 'W', 'zwphf', 'a', 0.80242702],
['A', 'W', 'jcyyc', 'a', 0.40407741],
['A', 'W', 'kucem', 'a', 0.36731944],
['A', 'X', 'ldbxz', 'a', 0.55426007],
['A', 'X', 'vkcqh', 'a', 0.26586396],
['A', 'X', 'cvksn', 'a', 0.54809939],
['B', 'Y', 'spghx', 'a', 0.47261223],
['B', 'Y', 'cqokn', 'a', 0.57750357],
['B', 'Y', 'wjsxg', 'a', 0.81530899],
['B', 'Z', 'holoo', 'a', 0.45985020],
['B', 'Z', 'lnihf', 'a', 0.66787651],
['B', 'Z', 'wirhq', 'a', 0.13887864],
['A', 'W', 'zwphf', 'b', 0.67371101],
['A', 'W', 'jcyyc', 'b', 0.50796174],
['A', 'W', 'kucem', 'b', 0.54605544],
['A', 'X', 'ldbxz', 'b', 0.14821402],
['A', 'X', 'vkcqh', 'b', 0.77331968],
['A', 'X', 'cvksn', 'b', 0.79198960],
['B', 'Y', 'spghx', 'b', 0.48747995],
['B', 'Y', 'cqokn', 'b', 0.25253355],
['B', 'Y', 'wjsxg', 'b', 0.23776694],
['B', 'Z', 'holoo', 'b', 0.43298050],
['B', 'Z', 'lnihf', 'b', 0.31793156],
['B', 'Z', 'wirhq', 'b', 0.61440056]]
df = pd.DataFrame(data,
columns=['section', 'subsection', 'name', 'topic', 'score'])

重新创建预期结果:

import numpy as np

result = np.array([['A', 'All', 'All', 0.490341219, 0.573541919],
['A', 'W', 'All', 0.52460796, 0.5759094],
['A', 'W', 'jcyyc', 0.404077415, 0.5079617479999999],
['A', 'W', 'kucem', 0.36731944, 0.546055442],
['A', 'W', 'zwphf', 0.8024270240000001, 0.673711011],
['A', 'X', 'All', 0.45607447700000003, 0.571174437],
['A', 'X', 'cvksn', 0.548099391, 0.791989603],
['A', 'X', 'ldbxz', 0.554260074, 0.148214029],
['A', 'X', 'vkcqh', 0.265863967, 0.77331968],
['B', 'All', 'All', 0.5220050279999999, 0.390515513],
['B', 'Y', 'All', 0.621808268, 0.325926816],
['B', 'Y', 'cqokn', 0.577503576, 0.252533557],
['B', 'Y', 'spghx', 0.472612233, 0.487479951],
['B', 'Y', 'wjsxg', 0.815308995, 0.237766941],
['B', 'Z', 'All', 0.42220178799999997, 0.455104209],
['B', 'Z', 'holoo', 0.459850205, 0.43298050200000004],
['B', 'Z', 'lnihf', 0.667876511, 0.317931565],
['B', 'Z', 'wirhq', 0.13887864800000002, 0.61440056]], dtype=object)
result = pd.DataFrame(result, columns=['section', 'subsection', 'name', 'a', 'b'])

最佳答案

你需要:

s = df.groupby(['section', 'subsection', 'name', 'topic'])['score'].mean().unstack('topic')

s1 = (s.mean(level=0)
.assign(subsection = 'All', name='All')
.set_index(['subsection','name'], append=True))
s2 = (s.mean(level=[0, 1])
.assign(name='All')
.set_index(['name'], append=True))

s = pd.concat([s, s1, s2]).sort_index()

但是如果需要 submeans 不确定上面的解决方案是否正确(平均均值),更好的是:

s1 = df.groupby(['section','topic'])['score'].mean().unstack('topic').assign(subsection = 'All', name='All').set_index(['subsection','name'], append=True)
s2 = df.groupby(['section','subsection','topic'])['score'].mean().unstack('topic').assign(name='All').set_index(['name'], append=True)

s = pd.concat([s, s1, s2]).sort_index()
print (s)
topic a b
section subsection name
A All All 0.490341 0.573542
W All 0.524608 0.575909
jcyyc 0.404077 0.507962
kucem 0.367319 0.546055
zwphf 0.802427 0.673711
X All 0.456074 0.571174
cvksn 0.548099 0.791990
ldbxz 0.554260 0.148214
vkcqh 0.265864 0.773320
B All All 0.522005 0.390516
Y All 0.621808 0.325927
cqokn 0.577504 0.252534
spghx 0.472612 0.487480
wjsxg 0.815309 0.237767
Z All 0.422202 0.455104
holoo 0.459850 0.432980
lnihf 0.667877 0.317932
wirhq 0.138879 0.614401

编辑:

如果需要排序 - 这里 tot instaed All 可以使用 ordered categoricals:

cat1 = ['tot'] + df['subsection'].unique().tolist()
cat2 = ['tot'] + df['name'].unique().tolist()

df['subsection'] = pd.Categorical(df['subsection'], categories=cat1, ordered=True)
df['name'] = pd.Categorical(df['name'], categories=cat2, ordered=True)

s = df.groupby(['section', 'subsection', 'name', 'topic'])['score'].mean().unstack('topic')
s1 = (df.groupby(['section','topic'])['score'].mean()
.unstack('topic').assign(subsection = 'tot', name='tot')
.set_index(['subsection','name'], append=True))

s2 = (df.groupby(['section','subsection','topic'])['score'].mean()
.unstack('topic')
.assign(name='tot')
.set_index(['name'], append=True))

s = pd.concat([s, s1, s2]).sort_index()

print (s)
topic a b
section subsection name
A tot tot 0.490341 0.573542
W tot 0.524608 0.575909
zwphf 0.802427 0.673711
jcyyc 0.404077 0.507962
kucem 0.367319 0.546055
X tot 0.456074 0.571174
ldbxz 0.554260 0.148214
vkcqh 0.265864 0.773320
cvksn 0.548099 0.791990
B tot tot 0.522005 0.390516
Y tot 0.621808 0.325927
spghx 0.472612 0.487480
cqokn 0.577504 0.252534
wjsxg 0.815309 0.237767
Z tot 0.422202 0.455104
holoo 0.459850 0.432980
lnihf 0.667877 0.317932
wirhq 0.138879 0.614401

关于python - 带有 Pandas pivot_table 的嵌套小计 'All' 行,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/55440716/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com