
python - Pandas: stack multiple columns into one column

Reposted. Author: 行者123. Updated: 2023-12-04 07:20:34

I have the following DataFrame:

                    ETHNIC                       RACE        AGE       TRT01A
0   NOT HISPANIC OR LATINO                      WHITE  31.824778  Treatment B
1   NOT HISPANIC OR LATINO                      WHITE  31.381246      Placebo
2       HISPANIC OR LATINO                      WHITE  45.522245  Treatment A
3       HISPANIC OR LATINO  BLACK OR AFRICAN AMERICAN  42.910335  Treatment B
4   NOT HISPANIC OR LATINO                      WHITE  31.381246      Placebo
5   NOT HISPANIC OR LATINO                      WHITE  38.045175  Treatment B
6       HISPANIC OR LATINO                      WHITE  39.337440      Placebo
7   NOT HISPANIC OR LATINO                      WHITE  47.121150      Placebo
8   NOT HISPANIC OR LATINO                      WHITE  38.203970  Treatment A
9   NOT HISPANIC OR LATINO  BLACK OR AFRICAN AMERICAN  22.926762      Placebo
10      HISPANIC OR LATINO                      WHITE  45.226557  Treatment B
11      HISPANIC OR LATINO                      WHITE  32.112252      Placebo
Just copy the DataFrame above to the clipboard and run df = pd.read_clipboard(sep=r'\s\s+') to load it into a variable.
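If the clipboard is not available (in a script or on a headless machine, for instance), the same frame can be built from a string. This is only a sketch and not part of the original post: it assumes the columns are separated by two or more spaces, it leaves out the row-number column, and the name `raw` is made up for illustration.

```python
import io

import pandas as pd

# Same 12 rows as the table above, without the row-number column.
# Columns are separated by at least two spaces so that labels such as
# "NOT HISPANIC OR LATINO" (single spaces inside) stay in one field.
raw = """\
ETHNIC                  RACE                       AGE        TRT01A
NOT HISPANIC OR LATINO  WHITE                      31.824778  Treatment B
NOT HISPANIC OR LATINO  WHITE                      31.381246  Placebo
HISPANIC OR LATINO      WHITE                      45.522245  Treatment A
HISPANIC OR LATINO      BLACK OR AFRICAN AMERICAN  42.910335  Treatment B
NOT HISPANIC OR LATINO  WHITE                      31.381246  Placebo
NOT HISPANIC OR LATINO  WHITE                      38.045175  Treatment B
HISPANIC OR LATINO      WHITE                      39.337440  Placebo
NOT HISPANIC OR LATINO  WHITE                      47.121150  Placebo
NOT HISPANIC OR LATINO  WHITE                      38.203970  Treatment A
NOT HISPANIC OR LATINO  BLACK OR AFRICAN AMERICAN  22.926762  Placebo
HISPANIC OR LATINO      WHITE                      45.226557  Treatment B
HISPANIC OR LATINO      WHITE                      32.112252  Placebo
"""

# A regex separator needs the Python parsing engine.
df = pd.read_csv(io.StringIO(raw), sep=r"\s\s+", engine="python")
print(df.shape)  # (12, 4)
```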
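import numpy as np
import pandas as pd

out = (df.groupby(['TRT01A', 'ETHNIC', 'RACE'])['AGE']
         .agg(mean='mean',
              n='count',
              deviation='std',
              # Note: np.percentile takes q on a 0-100 scale, so 0.25 is the
              # 0.25th percentile; use 25 for the true first quartile.
              # The output shown below was produced with 0.25.
              Q1=lambda x: np.percentile(x, 0.25))
         .T.unstack().unstack(0)
      )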
I ran some aggregations on the DataFrame above, transposed the result, and unstacked it twice to get the following:
TRT01A                                                        Placebo  Treatment A  Treatment B
ETHNIC                 RACE
HISPANIC OR LATINO     BLACK OR AFRICAN AMERICAN mean             NaN          NaN    42.910335
                                                 n                NaN          NaN     1.000000
                                                 deviation        NaN          NaN          NaN
                                                 Q1               NaN          NaN    42.910335
                       WHITE                     mean       35.724846    45.522245    45.226557
                                                 n           2.000000     1.000000     1.000000
                                                 deviation   5.108979          NaN          NaN
                                                 Q1         32.130315    45.522245    45.226557
NOT HISPANIC OR LATINO BLACK OR AFRICAN AMERICAN mean       22.926762          NaN          NaN
                                                 n           1.000000          NaN          NaN
                                                 deviation        NaN          NaN          NaN
                                                 Q1         22.926762          NaN          NaN
                       WHITE                     mean       36.627881    38.203970    34.934976
                                                 n           3.000000     1.000000     2.000000
                                                 deviation   9.087438          NaN     4.398485
                                                 Q1         31.381246    38.203970    31.840329
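A side note on the Q1 row: the question's lambda calls np.percentile(x, 0.25), and np.percentile expects q on a 0-100 scale, so the value reported is the 0.25th percentile rather than the first quartile. The short check below uses the two Placebo ages of the HISPANIC OR LATINO / WHITE group from the table to illustrate the difference; the variable names are made up for this sketch.

```python
import numpy as np

# Ages of the two HISPANIC OR LATINO / WHITE / Placebo rows in the question.
ages = np.array([39.337440, 32.112252])

q_as_written = np.percentile(ages, 0.25)   # 0.25th percentile (what the question computes)
q_first_quartile = np.percentile(ages, 25)  # 25th percentile, i.e. the first quartile
q_via_quantile = np.quantile(ages, 0.25)    # same quartile, expressed on a 0-1 scale

print(round(q_as_written, 6))      # 32.130315, matching the Q1 row in the table
print(round(q_first_quartile, 6))  # 33.918549
```

If the first quartile is what is wanted, passing 25 to np.percentile (or 0.25 to np.quantile) gives it.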
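Now I want to flatten the whole index to get the following structure (i.e. insert an all-NaN row for each label of the first and second index levels, plus a Level column that records the index level):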
                             Placebo  Treatment A  Treatment B  Level
HISPANIC OR LATINO               NaN          NaN          NaN      0  <---
BLACK OR AFRICAN AMERICAN        NaN          NaN          NaN      1  <---
mean                             NaN          NaN    42.910335      2
n                                NaN          NaN     1.000000      2
deviation                        NaN          NaN          NaN      2
Q1                               NaN          NaN    42.910335      2
WHITE                            NaN          NaN          NaN      1  <---
mean                       35.724846    45.522245    45.226557      2
n                           2.000000     1.000000     1.000000      2
deviation                   5.108979          NaN          NaN      2
Q1                         32.130315    45.522245    45.226557      2
NOT HISPANIC OR LATINO           NaN          NaN          NaN      0  <---
BLACK OR AFRICAN AMERICAN        NaN          NaN          NaN      1  <---
mean                       22.926762          NaN          NaN      2
n                           1.000000          NaN          NaN      2
deviation                        NaN          NaN          NaN      2
Q1                         22.926762          NaN          NaN      2
WHITE                            NaN          NaN          NaN      1  <---
mean                       36.627881    38.203970    34.934976      2
n                           3.000000     1.000000     2.000000      2
deviation                   9.087438          NaN     4.398485      2
Q1                         31.381246    38.203970    31.840329      2
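This is the same as a previous question that I asked, but the catch is that there can be 1 to 4 index columns after aggregation (i.e. the aggregation may be applied to 1 to 5 columns), and the earlier solution is hard to reuse in that case.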

Best answer

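Use a custom function that first builds a helper DataFrame, filled with NaN by default, for the header rows and then appends the group's own rows to it. The original answer used DataFrame.append, which was removed in pandas 2.0; pd.concat does the same job:

def f(x):
    # x.name is the (ETHNIC, RACE) group key; its two labels become header rows
    names = pd.DataFrame(index=list(x.name), columns=x.columns, dtype=float).assign(Level=[0, 1])
    # drop the two outer index levels so only the statistic name is left as label
    body = x.reset_index(level=[0, 1], drop=True).assign(Level=2)
    return pd.concat([names, body])

out = out.groupby(level=[0, 1], group_keys=False).apply(f)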
Then drop the repeated level-0 header rows:
out = out[~out.index.duplicated() | out['Level'].isin([1,2])]
print (out)
TRT01A                       Placebo  Treatment A  Treatment B  Level
HISPANIC OR LATINO               NaN          NaN          NaN      0
BLACK OR AFRICAN AMERICAN        NaN          NaN          NaN      1
mean                             NaN          NaN    42.910335      2
n                                NaN          NaN     1.000000      2
deviation                        NaN          NaN          NaN      2
Q1                               NaN          NaN    42.910335      2
WHITE                            NaN          NaN          NaN      1
mean                       35.724846    45.522245    45.226557      2
n                           2.000000     1.000000     1.000000      2
deviation                   5.108979          NaN          NaN      2
Q1                         32.130315    45.522245    45.226557      2
NOT HISPANIC OR LATINO           NaN          NaN          NaN      0
BLACK OR AFRICAN AMERICAN        NaN          NaN          NaN      1
mean                       22.926762          NaN          NaN      2
n                           1.000000          NaN          NaN      2
deviation                        NaN          NaN          NaN      2
Q1                         22.926762          NaN          NaN      2
WHITE                            NaN          NaN          NaN      1
mean                       36.627881    38.203970    34.934976      2
n                           3.000000     1.000000     2.000000      2
deviation                   9.087438          NaN     4.398485      2
Q1                         31.381246    38.203970    31.840329      2
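The answer above is written for exactly two outer index levels (level=[0, 1] and Level=[0, 1] are hard-coded), while the question mentions anywhere from 1 to 4 index columns. Below is a minimal sketch of a level-agnostic variant. It is not from the original answer: `flatten_index` and the small `demo` frame are hypothetical names used only for illustration, and the sketch assumes that rows sharing an outer label are already adjacent, as they are after groupby and unstack.

```python
import numpy as np
import pandas as pd


def flatten_index(frame: pd.DataFrame) -> pd.DataFrame:
    """Turn a (Multi)Index into single labels, with an all-NaN header row for
    every outer label and a Level column holding each row's index depth."""
    n_levels = frame.index.nlevels
    blank = [np.nan] * frame.shape[1]
    labels, levels, values = [], [], []
    prev = None
    for key, row in zip(frame.index, frame.to_numpy()):
        key = key if isinstance(key, tuple) else (key,)
        # Emit a header row for each outer level whose prefix has changed.
        for i in range(n_levels - 1):
            if prev is None or key[: i + 1] != prev[: i + 1]:
                labels.append(key[i])
                levels.append(i)
                values.append(blank)
        # The innermost label carries the actual data.
        labels.append(key[-1])
        levels.append(n_levels - 1)
        values.append(list(row))
        prev = key
    result = pd.DataFrame(values, index=labels, columns=frame.columns)
    result["Level"] = levels
    return result


# Tiny made-up frame with three index levels, just to exercise the function.
demo_index = pd.MultiIndex.from_tuples([
    ("A", "x", "mean"), ("A", "x", "n"),
    ("A", "y", "mean"), ("A", "y", "n"),
    ("B", "x", "mean"), ("B", "x", "n"),
])
demo = pd.DataFrame({"Placebo": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=demo_index)
flat = flatten_index(demo)
print(flat)
```

Because header rows are emitted only when an outer prefix changes, no duplicated level-0 rows are produced and the separate de-duplication step is not needed in this variant.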

For "python - Pandas: stack multiple columns into one column", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68528535/
