python - Splitting a data frame with groupby and merging the subsets into columns

Reposted. Author: 太空宇宙. Updated: 2023-11-04 01:15:55

I have a large pandas.DataFrame that looks like this:

import numpy
import pandas

test = pandas.DataFrame({"score": numpy.random.randn(10)})
test["name"] = ["A"] * 3 + ["B"] * 3 + ["C"] * 4
test.index = list(range(3)) + list(range(3)) + list(range(4))  # list() is needed on Python 3

id  score       name
0   -0.652909   A
1   0.100885    A
2   0.410907    A
0   0.304012    B
1   -0.198157   B
2   -0.054764   B
0   0.358484    C
1   0.616415    C
2   0.389018    C
3   1.164172    C

So the index is non-unique but is unique if I group by the column name. I would like to split the data frame into subsections by name and then assemble (by means of an outer join) the score columns into one big new data frame and change the column names of the scores to the respective group key. What I have at the moment is:

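df = pandas.DataFrame()
for (key, sub) in test.groupby("name"):
    # outer-join this group's scores onto the result
    df = df.join(sub["score"], how="outer")
    # rename the column that was just joined to the group key
    df.columns.values[-1] = key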

This produces the expected result:

id  A           B           C
0   -0.652909   0.304012    0.358484
1   0.100885    -0.198157   0.616415
2   0.410907    -0.054764   0.389018
3   NaN         NaN         1.164172

but seems not very pandas-ic. Is there a better way?

Edit: Based on the answers I ran some simple timings.

%%timeit
df = pandas.DataFrame()
for (key, sub) in test.groupby("name"):
    df = df.join(sub["score"], how="outer")
    df.columns.values[-1] = key
100 loops, best of 3: 2.46 ms per loop

%%timeit
test.set_index([test.index, "name"]).unstack()
1000 loops, best of 3: 1.04 ms per loop

%%timeit
test.pivot_table("score", test.index, "name")
100 loops, best of 3: 2.54 ms per loop

So unstack seems to be the preferred method.
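As a sanity check on the three timed variants, here is a minimal sketch that builds the table each way and confirms the results agree. It assumes Python 3 and a reasonably recent pandas. The helper names join_loop, via_unstack and via_pivot are made up for illustration, and the loop uses rename() to label each column instead of editing df.columns.values in place. It does not reproduce the timings, which depend on the machine and the pandas version.

```python
import numpy as np
import pandas as pd

# Same shape as the question's frame; a fixed seed keeps runs repeatable.
rng = np.random.default_rng(0)
test = pd.DataFrame({"score": rng.standard_normal(10)})
test["name"] = ["A"] * 3 + ["B"] * 3 + ["C"] * 4
test.index = list(range(3)) + list(range(3)) + list(range(4))


def join_loop(frame):
    """Outer-join each group's scores, one column per group key."""
    out = pd.DataFrame()
    for key, sub in frame.groupby("name"):
        # rename() labels the column up front (a substitute for the
        # in-place edit of df.columns.values used in the question)
        out = out.join(sub["score"].rename(key), how="outer")
    return out


def via_unstack(frame):
    """Append 'name' as the last index level, then unstack it."""
    return frame.set_index([frame.index, "name"])["score"].unstack()


def via_pivot(frame):
    """Pivot 'score' with the existing index as rows and 'name' as columns."""
    return frame.pivot_table(values="score", index=frame.index, columns="name")


results = [join_loop(test), via_unstack(test), via_pivot(test)]
reference = results[1].to_numpy(dtype=float)
for res in results:
    assert list(res.columns) == ["A", "B", "C"]
    assert np.allclose(res.to_numpy(dtype=float), reference, equal_nan=True)
```

Only values and column labels are compared here, since the index and column metadata (names, dtypes) can differ slightly between the three approaches.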

Best answer

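The function you are looking for is unstack. For pandas to know what to unstack on, we first create a MultiIndex in which the name column is added as the last index level. unstack() then unstacks on the last index level (by default), so we get what you want: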

In[152]: test = pandas.DataFrame({"score": numpy.random.randn(10)})
         test["name"] = ["A"] * 3 + ["B"] * 3 + ["C"] * 4
         test.index = list(range(3)) + list(range(3)) + list(range(4))

In[153]: test
Out[153]:
      score name
0 -0.208392    A
1 -0.103659    A
2  1.645287    A
0  0.119709    B
1 -0.047639    B
2 -0.479155    B
0 -0.415372    C
1 -1.390416    C
2 -0.384158    C
3 -1.328278    C

In[154]: test.set_index([test.index, 'name'], inplace=True)
         test.unstack()
Out[154]:
         score
name         A         B         C
0    -0.208392  0.119709 -0.415372
1    -0.103659 -0.047639 -1.390416
2     1.645287 -0.479155 -0.384158
3          NaN       NaN -1.328278
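Calling unstack() on the whole frame leaves a two-level column header (score over A, B, C), whereas the question's expected result has flat columns. A minimal sketch of one way to get the flat form, assuming Python 3 and a recent pandas: select the score column first so that unstack() runs on a Series. The scores are copied from the question's sample table, and the variable name wide is chosen here for illustration.

```python
import pandas as pd

# Scores copied from the sample table in the question.
scores = [-0.652909, 0.100885, 0.410907,
          0.304012, -0.198157, -0.054764,
          0.358484, 0.616415, 0.389018, 1.164172]
test = pd.DataFrame({"score": scores})
test["name"] = ["A"] * 3 + ["B"] * 3 + ["C"] * 4
test.index = list(range(3)) + list(range(3)) + list(range(4))

# Build the (row id, name) MultiIndex, keep only the score Series,
# and unstack the last level so each name becomes its own column.
wide = test.set_index([test.index, "name"])["score"].unstack()

# Optional: drop the "name" label that unstack leaves on the columns.
wide.columns.name = None

print(wide)
```

set_index is used without inplace=True here, so test itself is left unchanged and can be reused.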

For "python - Splitting a data frame with groupby and merging the subsets into columns", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24759397/
