
python - Neat inter-row calculations in a pandas DataFrame with MultiIndex

Reposted. Author: 太空宇宙. Updated: 2023-11-04 01:20:30

I have the classic UCB admissions dataset as a pandas DataFrame with a MultiIndex:

                       value
Dept Gender Admit
A    Male   Admitted     512
            Rejected     313
     Female Admitted      89
            Rejected      19

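and so on for the other departments ("A" through "F").

I would like to create a table showing the ratio of admitted to rejected students, grouped by department and gender.

My current way of doing it is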

ucbA.groupby(level=['Dept', 'Gender']).apply(lambda x: x.xs('Admitted', level=2).iloc[0] / x.xs('Rejected', level=2).iloc[0]).unstack().value

which is awful. The alternative is:

admitted = ucbA.unstack('Admit')
DataFrame({'Proportion Accepted': admitted.value.Admitted / admitted.value.Rejected}).unstack(1)

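I suppose this is fine, but I feel it should be possible as a one-liner without unstacking.

Is there a really neat way to do this kind of thing? I am imagining a one-liner that stays within the MultiIndex context.

Edit: the full frame: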

DataFrame({'Admit': {0: 'Admitted',  1: 'Rejected',  2: 'Admitted',  3: 'Rejected',  4: 'Admitted',  5: 'Rejected',  6: 'Admitted',  7: 'Rejected',  8: 'Admitted',  9: 'Rejected',  10: 'Admitted',  11: 'Rejected',  12: 'Admitted',  13: 'Rejected',  14: 'Admitted',  15: 'Rejected',  16: 'Admitted',  17: 'Rejected',  18: 'Admitted',  19: 'Rejected',  20: 'Admitted',  21: 'Rejected',  22: 'Admitted',  23: 'Rejected'}, 'Dept': {0: 'A',  1: 'A',  2: 'A',  3: 'A',  4: 'B',  5: 'B',  6: 'B',  7: 'B',  8: 'C',  9: 'C',  10: 'C',  11: 'C',  12: 'D',  13: 'D',  14: 'D',  15: 'D',  16: 'E',  17: 'E',  18: 'E',  19: 'E',  20: 'F',  21: 'F',  22: 'F',  23: 'F'}, 'Gender': {0: 'Male',  1: 'Male',  2: 'Female',  3: 'Female',  4: 'Male',  5: 'Male',  6: 'Female',  7: 'Female',  8: 'Male',  9: 'Male',  10: 'Female',  11: 'Female',  12: 'Male',  13: 'Male',  14: 'Female',  15: 'Female',  16: 'Male',  17: 'Male',  18: 'Female',  19: 'Female',  20: 'Male',  21: 'Male',  22: 'Female',  23: 'Female'}, 'value': {0: 512,  1: 313,  2: 89,  3: 19,  4: 353,  5: 207,  6: 17,  7: 8,  8: 120,  9: 205,  10: 202,  11: 391,  12: 138,  13: 279,  14: 131,  15: 244,  16: 53,  17: 138,  18: 94,  19: 299,  20: 22,  21: 351,  22: 24,  23: 317}}).set_index(['Dept', 'Gender', 'Admit']).astype(float).astype(int)

Or, if you have rpy:

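# Note: the pandas.rpy module was removed in later pandas releases,
# so this snippet only runs on old pandas versions.
import pandas.rpy.common as com
ucbA = com.load_data('UCBAdmissions').set_index(['Dept', 'Gender', 'Admit']).astype(float).astype(int)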

Best answer

Here you go:

import pandas as pd

# Department A only, to keep the example small
df = pd.DataFrame({'Dept': ['A', 'A', 'A', 'A'],
                   'Gender': ['Male', 'Male', 'Female', 'Female'],
                   'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected'],
                   'value': [512, 313, 89, 19]})
df = df.set_index(['Dept', 'Gender', 'Admit'])


# Proportions accepted and rejected:
# divide each count by the total of its (Dept, Gender) group
df / df.groupby(level=['Dept', 'Gender']).transform('sum')
#                         value
# Dept Gender Admit
# A    Female Admitted  0.824074
#             Rejected  0.175926
#      Male   Admitted  0.620606
#             Rejected  0.379394
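
The division above works because `transform('sum')` returns a frame with the same index as `df`, in which every row holds the total of its (Dept, Gender) group. The following is a minimal sketch of that step, reusing the department A counts from the question; the names `totals`, `proportions` and `group_sums` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Dept': ['A', 'A', 'A', 'A'],
                   'Gender': ['Male', 'Male', 'Female', 'Female'],
                   'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected'],
                   'value': [512, 313, 89, 19]})
df = df.set_index(['Dept', 'Gender', 'Admit'])

# Each row receives the total of its (Dept, Gender) group; the index is unchanged.
totals = df.groupby(level=['Dept', 'Gender']).transform('sum')

# Both sides share the same index, so the division is row by row.
proportions = df / totals

# Sanity check: the proportions inside each (Dept, Gender) group add up to 1.
group_sums = proportions.groupby(level=['Dept', 'Gender'])['value'].sum()
```

Because the result keeps the full three-level index, it stays within the MultiIndex context the question asks for.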

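# If you really want admitted as fraction of rejected:
# move 'Admit' to the outermost level, then select each label with .loc
# (.ix was removed from pandas; .loc is the replacement)
df2 = df.swaplevel(1, 2).swaplevel(0, 1)
df2.loc['Admitted'] / df2.loc['Rejected']
#                 value
# Dept Gender
# A    Male    1.635783
#      Female  4.684211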
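
The same ratio can probably be had without reordering the index levels. The sketch below is an alternative to the `swaplevel` approach and is not part of the original answer: `DataFrame.xs` selects one label of the `Admit` level and drops that level, so both cross-sections are indexed by (Dept, Gender) and divide by label. The names `admitted`, `rejected`, `ratio` and `table` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Dept': ['A', 'A', 'A', 'A'],
                   'Gender': ['Male', 'Male', 'Female', 'Female'],
                   'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected'],
                   'value': [512, 313, 89, 19]})
df = df.set_index(['Dept', 'Gender', 'Admit'])

# Cross-sections on the 'Admit' level; each result is indexed by (Dept, Gender).
admitted = df.xs('Admitted', level='Admit')
rejected = df.xs('Rejected', level='Admit')

# Admitted count divided by rejected count, aligned on (Dept, Gender).
ratio = admitted / rejected

# Optional: pivot Gender into columns to get a Dept-by-Gender table.
table = ratio['value'].unstack('Gender')
```

The two `xs` calls and the division fit on one line if a one-liner is preferred, and they should apply unchanged to the full six-department frame from the question.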

Regarding python - Neat inter-row calculations in a pandas DataFrame with MultiIndex, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21580877/
