gpt4 book ai didi

python - pandas.crosstab 切片加上总计

转载 作者:太空宇宙 更新时间:2023-11-04 02:15:48 24 4
gpt4 key购买 nike

我目前真的在为数据框而苦苦挣扎。通过运行代码:(res_sum = 数据框的名称)

summary_table = pd.crosstab(index=[res_sum["Type"],res_sum["Size"]],
columns=res_sum["Found"],margins=True)
summary_table = summary_table.div(summary_table["All"] / 100, axis=0)

结果:

Found                 Exact   Near     No    All
Type Size
X 10 0.0 0.0 100.0 100.0
100 0.0 100.0 0.0 100.0
500 0.0 100.0 0.0 100.0
1000 0.0 100.0 0.0 100.0
5000 0.0 100.0 0.0 100.0
Y 10 0.0 100.0 0.0 100.0
100 0.0 0.0 100.0 100.0
500 0.0 100.0 0.0 100.0
1000 0.0 100.0 0.0 100.0
5000 0.0 100.0 0.0 100.0
....... (more)
All 5.0 65.0 30.0 100.0

我想要这样的东西:

Found                 Exact   Near     No    All
Type Size
X 10 0.0 0.0 100.0 100.0
100 0.0 100.0 0.0 100.0
500 0.0 100.0 0.0 100.0
1000 0.0 100.0 0.0 100.0
5000 0.0 100.0 0.0 100.0
Total X 0.0 80.0 20.0
Y 10 0.0 100.0 0.0 100.0
100 0.0 0.0 100.0 100.0
500 0.0 100.0 0.0 100.0
1000 0.0 100.0 0.0 100.0
5000 0.0 100.0 0.0 100.0
Total Y 0.0 80.0 20.0
.......(more)
All 5.0 65.0 30.0 100.0

这在 pd.crosstab 中似乎是不可能的,所以我尝试制作每种类型的子集,然后再次将数据帧粘贴在一起。它有点工作,但它删除了所有总数。例如代码:

x5 = summary_table.loc(axis=0)[['X'], slice(None)]
x6 = summary_table.loc(axis=0)[['Y'], slice(None)]

frames = [x5, x6]
result = pd.concat(frames)

结果完全忽略了 pd.crosstable 中的“margins=True”。将“margins=True”添加到 dataframe.loc 不起作用。

Found                 Exact   Near     No    All
Type Size
X 10 0.0 0.0 100.0 100.0
100 0.0 0.0 100.0 100.0
500 100.0 0.0 0.0 100.0
1000 0.0 100.0 0.0 100.0
5000 0.0 100.0 0.0 100.0
Y 10 0.0 0.0 100.0 100.0
100 0.0 100.0 0.0 100.0
500 0.0 100.0 0.0 100.0
1000 0.0 100.0 0.0 100.0
5000 0.0 100.0 0.0 100.0

我需要说明一下为什么我需要这个,我需要每个类型的摘要,并且将来每个尺寸都会有更多的值,在一个类型中(所以不再是 100.0%)。谁能帮我组织这些数据框? (此外,如果删除每行末尾的“全部”,我会非常高兴。似乎我只能添加两个“全部”列,即使我只需要列。)

按要求编辑:

我使用的数据片段(我在问题中将类型更改为 X、Y、Z,但这些也能正常工作:

Found   Size    Type
Exact 500 INV
Near 100 DEL
Near 500 DEL
Near 1000 DEL
Near 5000 DEL
Near 100 INS
Near 500 INS
Near 1000 INS
Near 1000 INV
Near 5000 INV
Near 10 DUP
Near 500 DUP
Near 1000 DUP
Near 5000 DUP
No 10 DEL
No 10 INS
No 5000 INS
No 10 INV
No 100 INV
No 100 DUP

最佳答案

您可以使用 subtotals = df.groupby(level=['Type']).mean()计算小计。然后

label_order = ['{}{}'.format(pre,label) for label in subtotals.index 
for pre in ['', 'Total_']] + ['All']

生成所需的标签顺序。最后,df = df.loc[label_order] 重新排序行:

import pandas as pd
import numpy as np
nan = np.nan
df = pd.DataFrame({'All': [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, nan], 'Exact': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 65.0], 'Near': [0.0, 100.0, 100.0, 100.0, 100.0, 100.0, 0.0, 100.0, 100.0, 100.0, 30.0], 'No': [100.0, 0.0, 0.0, 0.0, 0.0, 0.0, 100.0, 0.0, 0.0, 0.0, 100.0], 'Size': [10.0, 100.0, 500.0, 1000.0, 5000.0, 10.0, 100.0, 500.0, 1000.0, 5000.0, 5.0], 'Type': ['X', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y', 'Y', 'All']})

df = df.set_index(['Type','Size'])
df.columns.name = 'Found'

subtotals = df.groupby(level=['Type']).mean()
subtotals = subtotals.loc[subtotals.index != 'All']
label_order = ['{}{}'.format(pre,label) for label in subtotals.index for pre in ['', 'Total_']] + ['All']
subtotals.index = ['Total_{}'.format(label) for label in subtotals.index]
subtotals['Size'] = ''

df = pd.concat([df.reset_index('Size'), subtotals], axis=0, sort=False)
df = df.loc[label_order]
df = df.set_index('Size', append=True)

产量

                  All  Exact   Near     No
Size
X 10.0 100.0 0.0 0.0 100.0
100.0 100.0 0.0 100.0 0.0
500.0 100.0 0.0 100.0 0.0
1000.0 100.0 0.0 100.0 0.0
5000.0 100.0 0.0 100.0 0.0
Total_X 100.0 0.0 80.0 20.0
Y 10.0 100.0 0.0 100.0 0.0
100.0 100.0 0.0 0.0 100.0
500.0 100.0 0.0 100.0 0.0
1000.0 100.0 0.0 100.0 0.0
5000.0 100.0 0.0 100.0 0.0
Total_Y 100.0 0.0 80.0 20.0
All 5.0 NaN 65.0 30.0 100.0

关于python - pandas.crosstab 切片加上总计,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/52704580/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com