I have a large pandas.DataFrame that looks like this:
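import numpy
import pandas

test = pandas.DataFrame({"score": numpy.random.randn(10)})
test["name"] = ["A"] * 3 + ["B"] * 3 + ["C"] * 4
# range objects cannot be concatenated with + in Python 3, so build lists first
test.index = list(range(3)) + list(range(3)) + list(range(4))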
id     score name
0  -0.652909    A
1   0.100885    A
2   0.410907    A
0   0.304012    B
1  -0.198157    B
2  -0.054764    B
0   0.358484    C
1   0.616415    C
2   0.389018    C
3   1.164172    C
So the index is non-unique, but it becomes unique once I group by the column name. I would like to split the data frame into subsections by name, assemble the score columns into one big new data frame by means of an outer join, and rename each score column to its group key. What I have at the moment is:
df = pandas.DataFrame()
for (key, sub) in test.groupby("name"):
    df = df.join(sub["score"], how="outer")
    df.columns.values[-1] = key
This produces the desired result:
id         A         B         C
0  -0.652909  0.304012  0.358484
1   0.100885 -0.198157  0.616415
2   0.410907 -0.054764  0.389018
3        NaN       NaN  1.164172
but it does not seem very pandas-ic. Is there a better way?
Edit: Based on the answers I ran some simple timings.
%%timeit
df = pandas.DataFrame()
for (key, sub) in test.groupby("name"):
    df = df.join(sub["score"], how="outer")
    df.columns.values[-1] = key
100 loops, best of 3: 2.46 ms per loop
%%timeit
test.set_index([test.index, "name"]).unstack()
1000 loops, best of 3: 1.04 ms per loop
%%timeit
test.pivot_table("score", test.index, "name")
100 loops, best of 3: 2.54 ms per loop
So unstack seems to be the preferred method.
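The function you are looking for is unstack. So that pandas knows what to unstack on, we first create a MultiIndex in which the name column is added as the last index level. unstack() then unstacks on the last index level (by default), so we get what you want: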
In[152]: test = pandas.DataFrame({"score": numpy.random.randn(10)})
test["name"] = ["A"] * 3 + ["B"] * 3 + ["C"] * 4
test.index = list(range(3)) + list(range(3)) + list(range(4))
In[153]: test
Out[153]:
score name
0 -0.208392 A
1 -0.103659 A
2 1.645287 A
0 0.119709 B
1 -0.047639 B
2 -0.479155 B
0 -0.415372 C
1 -1.390416 C
2 -0.384158 C
3 -1.328278 C
In[154]: test.set_index([test.index, 'name'], inplace=True)
test.unstack()
Out[154]:
score
name A B C
0 -0.208392 0.119709 -0.415372
1 -0.103659 -0.047639 -1.390416
2 1.645287 -0.479155 -0.384158
3 NaN NaN -1.328278
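Note that the result above still carries score as a top column level, whereas the question asked for plain A, B, C columns. A minimal sketch of one way to get there, assuming a Python 3 environment and using fixed scores (my own placeholder values, not the random data above) so the outcome is reproducible: select the score Series before unstacking.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; the scores 0.0 .. 9.0 are placeholder values
# chosen for reproducibility, not the random data from the question.
test = pd.DataFrame({
    "score": np.arange(10, dtype=float),
    "name": ["A"] * 3 + ["B"] * 3 + ["C"] * 4,
})
test.index = list(range(3)) + list(range(3)) + list(range(4))

# Append "name" as the last index level, keep only the score Series,
# then unstack that last level so each name becomes its own column.
wide = test.set_index([test.index, "name"])["score"].unstack()

print(wide)
```

Because a Series is unstacked here rather than a one-column DataFrame, the columns come out as a flat index of group keys, and positions that a group does not have (index 3 for A and B) are filled with NaN, as in the outer join.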