I have a large DataFrame with an ID column and several value columns; different rows can share the same ID. I would like to create a new DataFrame in which each ID appears in only one row, with the values from the specified columns appended next to it. The DataFrame also has other columns whose values are identical for rows with the same ID, and I would like to keep those.
ID | type1 | type2 | value1 | value2 | value3
1 | dog | yellow | 1 | 2 | 3
1 | dog | yellow | 5 | 6 | 7
2 | cat | brown | 1 | 1 | 1
3 | mouse | blue | 1 | 1 | 1
1 | dog | yellow | 1 | 2 | 3
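To reproduce this, the sample table above can be built as a DataFrame roughly like this (a minimal sketch; the variable name `df` is an assumption chosen to match the answer below):

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 1],
    "type1": ["dog", "dog", "cat", "mouse", "dog"],
    "type2": ["yellow", "yellow", "brown", "blue", "yellow"],
    "value1": [1, 5, 1, 1, 1],
    "value2": [2, 6, 1, 1, 2],
    "value3": [3, 7, 1, 1, 3],
})
print(df)
```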
Expected output:
ID | type1 | type2 | value
1 | dog | yellow | 1 2 3 5 6 7 1 2 3
2 | cat | brown | 1 1 1
3 | mouse | blue | 1 1 1
I have been exploring groupby, but I can't get it to produce this kind of output.
Recommended answer
You can use melt and groupby.agg:
group = ['ID', 'type1', 'type2']
# melt the remaining columns into one 'value' column, then collect the values per group
out = df.melt(group).groupby(group, as_index=False)['value'].agg(list)
Output:
   ID  type1   type2                        value
0   1    dog  yellow  [1, 5, 1, 2, 6, 2, 3, 7, 3]
1   2    cat   brown                    [1, 1, 1]
2   3  mouse    blue                    [1, 1, 1]
If order matters:
out = (df.set_index(group).stack().groupby(group).agg(list)
.reset_index(name='value')
)
Output:
   ID  type1   type2                        value
0   1    dog  yellow  [1, 2, 3, 5, 6, 7, 1, 2, 3]
1   2    cat   brown                    [1, 1, 1]
2   3  mouse    blue                    [1, 1, 1]
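If you want the space-separated string from the question's expected output rather than a list, one option is to join the values inside the aggregation. This is only a sketch built on the order-preserving variant above; it assumes the value columns convert cleanly to text with `astype(str)`, and the sample `df` is rebuilt here so the snippet runs on its own:

```python
import pandas as pd

# Sample data from the question (rebuilt so this snippet is self-contained).
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 1],
    "type1": ["dog", "dog", "cat", "mouse", "dog"],
    "type2": ["yellow", "yellow", "brown", "blue", "yellow"],
    "value1": [1, 5, 1, 1, 1],
    "value2": [2, 6, 1, 1, 2],
    "value3": [3, 7, 1, 1, 3],
})

group = ["ID", "type1", "type2"]

out = (
    df.set_index(group)   # keep the grouping columns out of the way
      .stack()            # row by row: value1, value2, value3
      .groupby(group)     # group on the index levels named in `group`
      .agg(lambda s: " ".join(s.astype(str)))  # join each group's values into one string
      .reset_index(name="value")
)
print(out)
```

Swapping the lambda back to `list` gives the list-valued result shown earlier.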