
python - Combine/merge two datasets with duplicate names

Reposted. Author: 行者123. Updated: 2023-12-01 07:05:58

I am trying to merge two datasets (DataFrames) that look like this:

import pandas as pd

D1 = pd.DataFrame({'Village':['Ampil','Ampil','Ampil','Bachey','Bachey','Center','Center','Center','Center'], 'Code':[123,324,190,453,321,786,456,234,987]})

D2 = pd.DataFrame({'Village':['Ampil','Ampil','Bachey','Bachey','Center','Center'],'Lat':[11.563,13.278,12.637,11.356,12.736,13.456], 'Long':[102.234,103.432,105.673,103.539,103.873,102.983]})

I want to merge the two on the Village column. The output I expect looks like this:

D3 = pd.DataFrame({'Village': ['Ampil','Ampil','Bachey','Bachey','Center','Center'],'Code':[123,324,453,321,786,456],'Lat':[11.563,13.278,12.637,11.356,12.736,13.456], 'Long':[102.234,103.432,105.673,103.539,103.873,102.983]})

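I have tried concat, merge, and join, but none of them does what I need. I need code that also works on much larger data. I would really appreciate any help.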

Best answer

One approach is to first create a running cumcount per Village on each of the original DataFrames, then merge on both Village and count:

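# Number the rows within each Village (0, 1, 2, ...) in both frames
D1['count'] = D1.groupby('Village').cumcount()
D2['count'] = D2.groupby('Village').cumcount()

# Left-merge on Village plus the running count, then drop the helper column
print(D2.merge(D1, on=['Village', 'count'], how='left').drop('count', axis=1))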

# Output:
  Village     Lat     Long  Code
0   Ampil  11.563  102.234   123
1   Ampil  13.278  103.432   324
2  Bachey  12.637  105.673   453
3  Bachey  11.356  103.539   321
4  Center  12.736  103.873   786
5  Center  13.456  102.983   456
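
The cumcount idea above can also be wrapped in a small helper that leaves the input frames untouched. This is only a sketch: the function name merge_by_occurrence and the helper column name _occurrence are made up for illustration and are not part of pandas or of the original answer. It assumes neither frame already has a column called _occurrence. For contrast, the snippet also shows why a plain merge on Village alone does not work here.

```python
import pandas as pd


def merge_by_occurrence(left, right, key, how="left"):
    """Pair the n-th row of each key in `left` with the n-th row of that key in `right`.

    Hypothetical helper for illustration; not a pandas API.
    """
    helper = "_occurrence"  # assumed not to clash with an existing column
    # assign() returns copies, so the caller's frames are not modified
    left_numbered = left.assign(**{helper: left.groupby(key).cumcount()})
    right_numbered = right.assign(**{helper: right.groupby(key).cumcount()})
    merged = left_numbered.merge(right_numbered, on=[key, helper], how=how)
    return merged.drop(columns=helper)


D1 = pd.DataFrame({
    "Village": ["Ampil", "Ampil", "Ampil", "Bachey", "Bachey",
                "Center", "Center", "Center", "Center"],
    "Code": [123, 324, 190, 453, 321, 786, 456, 234, 987],
})
D2 = pd.DataFrame({
    "Village": ["Ampil", "Ampil", "Bachey", "Bachey", "Center", "Center"],
    "Lat": [11.563, 13.278, 12.637, 11.356, 12.736, 13.456],
    "Long": [102.234, 103.432, 105.673, 103.539, 103.873, 102.983],
})

# A plain merge on Village pairs every row with every same-named row:
# 3*2 (Ampil) + 2*2 (Bachey) + 4*2 (Center) = 18 rows.
plain = D1.merge(D2, on="Village")

# Merging on Village plus the occurrence number gives one row per D2 row.
# The column selection only reorders the result to match the layout asked for.
D3 = merge_by_occurrence(D2, D1, "Village")[["Village", "Code", "Lat", "Long"]]
print(D3)
```

With D2 on the left and how="left", rows of D1 that have no partner in D2 (the third Ampil row and the third and fourth Center rows) are dropped. Passing how="outer" instead would keep them, with NaN in Lat and Long.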

Regarding "python - Combine/merge two datasets with duplicate names", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58443205/
