
python - Merging DataFrames via aggregation

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:15:15

I want to aggregate a DataFrame: take the first row of each group while also concatenating the values in the "upc" column:

import pandas as pd

df = pd.DataFrame({
    'id1': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    'id2': [11, 22, 11, 11, 22, 33, 33, 33, 33, 44, 44, 55, 66, 66, 22, 77, 77],
    'value1': ["1first", "1second", "1third",
               "2first", "2second",
               "3first", "3second", "3third", "3fourth",
               "4first", "4second",
               "5first",
               "6first", "6second", "6third",
               "7first", "7second"],
    'upc': [str(x) for x in range(100, 117)]
})
# First row of each (id1, id2) group; the group keys become the index
firsts_df = df.groupby(['id1', 'id2']).first()
# Join the upc values of each group with '|'; this returns a Series, not a DataFrame
concat_upcs_df = df[['id1', 'id2', 'upc']].groupby(['id1', 'id2']).apply(lambda x: '|'.join(x.upc))
# This merge is the call that fails
firsts_df.merge(concat_upcs_df, how='inner', left_on=['id1', 'id2'], right_on=['id1', 'id2'])

This raises the following error:

ValueError: can not merge DataFrame with instance of type class 'pandas.core.series.Series'

How can I merge the aggregation result with the DataFrame? And can I get the same result with a cheaper operation?

Best answer

I think you need as_index=False in the groupby used with first(), and reset_index() on concat_upcs_df so that it becomes a DataFrame:

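# as_index=False keeps id1 and id2 as ordinary columns
firsts_df = df.groupby(['id1', 'id2'], as_index=False).first()
# reset_index(name='val') turns the Series into a DataFrame with columns id1, id2, val
concat_upcs_df = df[['id1', 'id2', 'upc']].groupby(['id1', 'id2']).apply(lambda x: '|'.join(x.upc)).reset_index(name='val')
merged = firsts_df.merge(concat_upcs_df, how='inner', left_on=['id1', 'id2'], right_on=['id1', 'id2'])
print(merged)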
   id1  id2  upc   value1              val
0    1   11  100   1first          100|102
1    1   22  101  1second              101
2    2   11  103   2first              103
3    2   22  104  2second              104
4    3   33  105   3first  105|106|107|108
5    4   44  109   4first          109|110
6    5   55  111   5first              111
7    6   22  114   6third              114
8    6   66  112   6first          112|113
9    7   77  115   7first          115|116
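
The merge in the question fails because the groupby/apply step returns a Series rather than a DataFrame. The following is a minimal sketch of that difference; the three-row frame is made up for illustration and is not the data from the question.

```python
import pandas as pd

# Illustrative data only (not from the question)
small = pd.DataFrame({
    'id1': [1, 1, 2],
    'id2': [11, 11, 22],
    'upc': ['100', '101', '102'],
})

# Aggregating a single column yields a Series indexed by (id1, id2)
joined = small.groupby(['id1', 'id2'])['upc'].apply('|'.join)
print(type(joined).__name__)

# reset_index(name=...) moves the keys back into columns and names the values,
# producing a DataFrame that merge() accepts
as_frame = joined.reset_index(name='val')
print(type(as_frame).__name__)
print(list(as_frame.columns))
```

Once the keys are ordinary columns on both sides, the merge can match on them directly.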

You can also use drop_duplicates instead of first, and apply without a lambda. Likewise, merge can take on, because the join columns are the same on the left and right:

firsts_df = df.drop_duplicates(['id1', 'id2'])
concat_upcs_df = df.groupby(['id1', 'id2'])['upc'].apply('|'.join).reset_index(name='val')
df = firsts_df.merge(concat_upcs_df, on=['id1', 'id2'])
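
On the question of a cheaper operation, one option that may avoid the merge entirely is a single groupby with named aggregation. This is a sketch rather than part of the original answer, and it assumes pandas 0.25 or later, where the keyword form of agg is available. The name result is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id1': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    'id2': [11, 22, 11, 11, 22, 33, 33, 33, 33, 44, 44, 55, 66, 66, 22, 77, 77],
    'value1': ["1first", "1second", "1third",
               "2first", "2second",
               "3first", "3second", "3third", "3fourth",
               "4first", "4second",
               "5first",
               "6first", "6second", "6third",
               "7first", "7second"],
    'upc': [str(x) for x in range(100, 117)]
})

# One pass over the groups: take the first value1 and upc,
# and join all upc values of the group with '|'
result = df.groupby(['id1', 'id2'], as_index=False).agg(
    value1=('value1', 'first'),
    upc=('upc', 'first'),
    val=('upc', '|'.join),
)
print(result)
```

Each keyword names an output column and pairs a source column with an aggregation, so the first-row values and the concatenated string come out of the same groupby without an intermediate frame.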

This post on python - merging DataFrames via aggregation is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/48249559/
