
python - Pandas - Merge two DataFrames with partial matching

Reposted. Author: 行者123. Updated: 2023-12-03 23:01:58

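Given the DataFrames df1 and df2 built in the code below, I want to merge first on ['A','B','C'] and ['X','Y','Z']. After that, I want to look for matches on progressively fewer columns: first ['A','B'] with ['X','Y'], then ['A'] with ['X']. The result must not contain duplicated rows. In the example below, a,y,y,v3 is excluded because a,d,d has already been matched.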
So far, my code matches on all 3 columns:

import pandas as pd

df1 = pd.DataFrame({"A": ['a', 'b', 'c'], "B": ['d', 'e', 'f'], "C": ['d', 'e', 'f']})
df2 = pd.DataFrame({"X": ['a', 'b', 'a', 'c'], "Y": ['d', 'e', 'y', 'z'], "Z": ['d', 'x', 'y', 'z'], "V": ['v1', 'v2', 'v3', 'v4']})

# Exact match on all three key columns; unmatched df1 rows get NaN in X, Y, Z, V
merged = pd.merge(df1, df2, left_on=['A', 'B', 'C'], right_on=['X', 'Y', 'Z'], how='left')
merged = merged.drop_duplicates(['A', 'B', 'C'])
merged.head()
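Passing `indicator=True` to the same `pd.merge` call shows which df1 rows the three-column merge leaves without a partner. With that option, pandas adds a `_merge` column whose values are `'left_only'`, `'right_only'` or `'both'`. The following is a minimal sketch on the question's data. The variable name `unmatched` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ['a', 'b', 'c'], "B": ['d', 'e', 'f'], "C": ['d', 'e', 'f']})
df2 = pd.DataFrame({"X": ['a', 'b', 'a', 'c'], "Y": ['d', 'e', 'y', 'z'],
                    "Z": ['d', 'x', 'y', 'z'], "V": ['v1', 'v2', 'v3', 'v4']})

# indicator=True appends a categorical "_merge" column to the result
merged = pd.merge(df1, df2, left_on=['A', 'B', 'C'], right_on=['X', 'Y', 'Z'],
                  how='left', indicator=True)

# df1 rows with no exact match on all three keys: candidates for a shorter-key pass
unmatched = merged.loc[merged['_merge'] == 'left_only', ['A', 'B', 'C']]
print(unmatched['A'].tolist())  # ['b', 'c']
```

Only these rows still need a pass on ['A','B'] / ['X','Y'], and after that on ['A'] / ['X'].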

How can I achieve this?
Update: expected output [image not preserved]

Best answer

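One idea is to run several merges in a loop. In each iteration, DataFrame.drop_duplicates is applied to the second DataFrame first, which prevents duplicated rows in the final DataFrame: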

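import pandas as pd
from functools import reduce

dfs = []
# L[0] holds the left (df1) keys, L[1] the matching right (df2) keys
L = [['A', 'B', 'C'], ['X', 'Y', 'Z']]

# i = 3, 2, 1: merge on progressively shorter key prefixes
for i in range(len(L[0]), 0, -1):
    # Keep only the first df2 row per key prefix so df1 rows are never duplicated
    df22 = df2.drop_duplicates(L[1][:i])
    df = pd.merge(df1, df22, left_on=L[0][:i], right_on=L[1][:i], how='left')
    dfs.append(df)

# Earlier (longer-key) merges win; later ones only fill the remaining NaN cells
df = reduce(lambda l, r: l.fillna(r), dfs)
print(df)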
A B C X Y Z V
0 a d d a d d v1
1 b e e b e x v2
2 c f f c z z v4
It works like this, with the loop unrolled:
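# Pass 1: exact match on all three keys
merged1 = pd.merge(df1, df2.drop_duplicates(['X', 'Y', 'Z']), left_on=['A', 'B', 'C'], right_on=['X', 'Y', 'Z'], how='left')
# Pass 2: match on the first two keys
merged2 = pd.merge(df1, df2.drop_duplicates(['X', 'Y']), left_on=['A', 'B'], right_on=['X', 'Y'], how='left')
# Pass 3: match on the first key only
merged3 = pd.merge(df1, df2.drop_duplicates('X'), left_on=['A'], right_on=['X'], how='left')

# Fill cells still missing after pass 1 from pass 2, then from pass 3
df = merged1.fillna(merged2).fillna(merged3)
print(df)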
A B C X Y Z V
0 a d d a d d v1
1 b e e b e x v2
2 c f f c z z v4
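In both versions, drop_duplicates (default `keep='first'`) keeps each merge at one row per df1 row. Without it, the one-key merge would match df1 row a against both a,d,d,v1 and a,y,y,v3 and duplicate that row. Note also that `fillna` works cell by cell. If df2 itself contains NaN values, a row matched on all three keys could have its missing cells filled from a *different* df2 row found by a shorter-key merge. Suppose instead that each output row should come entirely from the longest key prefix that matched. One option, sketched below under that assumption, is to merge only the still-unmatched rows at each pass. `merge_first_matching_tier` and the helper column `_row` are hypothetical names, not part of the original answer or of pandas.

```python
import pandas as pd


def merge_first_matching_tier(left, right, left_on, right_on):
    """Hypothetical helper: left-merge on progressively shorter key prefixes,
    taking each left row's right-hand values from a single right row (the one
    found by the longest prefix that matched).
    Assumes left has a unique index and left/right share no column names."""
    if len(left_on) != len(right_on):
        raise ValueError("left_on and right_on must have the same length")
    remaining = left  # left rows not matched by any longer prefix yet
    pieces = []
    for i in range(len(left_on), 0, -1):
        # One right row per key prefix (keep='first'), so no left row is duplicated
        right_unique = right.drop_duplicates(subset=right_on[:i], keep='first')
        # Carry the original row labels through the merge in the "_row" column
        matched = (remaining.rename_axis('_row').reset_index()
                   .merge(right_unique, left_on=left_on[:i],
                          right_on=right_on[:i], how='inner')
                   .set_index('_row'))
        if not matched.empty:
            pieces.append(matched)
            remaining = remaining.drop(index=matched.index)
    # Unmatched rows get NaN in the right-hand columns after concat; also append
    # when nothing matched at all, so concat always receives at least one frame
    if not remaining.empty or not pieces:
        pieces.append(remaining)
    result = pd.concat(pieces).sort_index()
    result.index.name = left.index.name
    # Restore a left-merge column layout (also covers the "nothing matched" case)
    return result.reindex(columns=list(left.columns) + list(right.columns))


df1 = pd.DataFrame({"A": ['a', 'b', 'c'], "B": ['d', 'e', 'f'], "C": ['d', 'e', 'f']})
df2 = pd.DataFrame({"X": ['a', 'b', 'a', 'c'], "Y": ['d', 'e', 'y', 'z'],
                    "Z": ['d', 'x', 'y', 'z'], "V": ['v1', 'v2', 'v3', 'v4']})

result = merge_first_matching_tier(df1, df2, ['A', 'B', 'C'], ['X', 'Y', 'Z'])
print(result['V'].tolist())  # ['v1', 'v2', 'v4']
```

With the question's data, this should return the same rows as the fillna version. The two approaches can differ when df2 contains missing values.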

Regarding python - Pandas - Merge two DataFrames with partial matching, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65090530/
