
python - Concatenate two DataFrames and drop duplicate rows based on a column value

Reposted. Author: 行者123. Updated: 2023-12-01 06:37:34

I have two DataFrames.

df1:

   Name Symbol       ID
0   Jay    N/A  372Y105
1   Ray    N/A  4446100
2  Faye    N/A  484MAA4
3  Maye    N/A  504W308
4   Kay    N/A  782L107
5  Trey    FFF  782L111

df2:

   Name Symbol       ID
0   Jay    AAA  372Y105
1  Faye    CCC  484MAA4
2   Kay    EEE  782L107

If an ID matches between df1 and df2, I want to replace the Symbol in df1 with the Symbol from df2, so that the output looks like this:

   Name Symbol       ID
0   Jay    AAA  372Y105
1   Ray    N/A  4446100
2  Faye    CCC  484MAA4
3  Maye    N/A  504W308
4   Kay    EEE  782L107
5  Trey    FFF  782L111

It sounds like I should first concatenate the two DataFrames and then somehow drop the duplicates, for example:

df3 = pd.concat([df1, df2])
df3 = df3.drop_duplicates(subset='ID', keep='last')

But I don't want to simply keep the first or the last duplicate. Instead, I only want to drop the rows where Symbol is N/A.
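The filtering described above can be sketched roughly as follows. This is only an illustration, assuming the N/A entries are stored as NaN and that the sample frames are built by hand; the names `combined`, `drop_mask` and `result` are made up for the example. A row is dropped only when its Symbol is missing and its ID also appears in another row.

```python
import numpy as np
import pandas as pd

# Sample data from the question; N/A is assumed to be stored as NaN.
df1 = pd.DataFrame({
    "Name": ["Jay", "Ray", "Faye", "Maye", "Kay", "Trey"],
    "Symbol": [np.nan, np.nan, np.nan, np.nan, np.nan, "FFF"],
    "ID": ["372Y105", "4446100", "484MAA4", "504W308", "782L107", "782L111"],
})
df2 = pd.DataFrame({
    "Name": ["Jay", "Faye", "Kay"],
    "Symbol": ["AAA", "CCC", "EEE"],
    "ID": ["372Y105", "484MAA4", "782L107"],
})

combined = pd.concat([df1, df2], ignore_index=True)

# Mark rows whose Symbol is missing AND whose ID occurs more than once.
drop_mask = combined["Symbol"].isna() & combined["ID"].duplicated(keep=False)

# Sorting by ID happens to restore df1's row order for this sample data.
result = combined[~drop_mask].sort_values("ID").reset_index(drop=True)
print(result)
```

Note that if both frames carried a non-missing Symbol for the same ID, this sketch would keep both rows, so it is not a full replacement for the merge-based answer.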

Best answer

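Use merge with a left join first, then fill the missing values in the Symbol column from the Symbol_ column (the N/A entries from the question appear as NaN here):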

print(df1.merge(df2, on=['Name','ID'], how='left', suffixes=('', '_')))
   Name Symbol       ID Symbol_
0   Jay    NaN  372Y105     AAA
1   Ray    NaN  4446100     NaN
2  Faye    NaN  484MAA4     CCC
3  Maye    NaN  504W308     NaN
4   Kay    NaN  782L107     EEE
5  Trey    FFF  782L111     NaN

df = (df1.merge(df2, on=['Name','ID'], how='left', suffixes=('', '_'))
         .assign(Symbol=lambda x: x['Symbol'].fillna(x.pop('Symbol_'))))
print(df)
   Name Symbol       ID
0   Jay    AAA  372Y105
1   Ray    NaN  4446100
2  Faye    CCC  484MAA4
3  Maye    NaN  504W308
4   Kay    EEE  782L107
5  Trey    FFF  782L111
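The merge above only fills symbols that are missing in df1. If the goal is to let df2's symbol win whenever the ID matches, as the question words it, one possible variant is a lookup on ID alone with Series.map. This is a sketch, not part of the original answer: it assumes the IDs in df2 are unique, and the names `lookup` and `result` are invented for the example.

```python
import numpy as np
import pandas as pd

# Sample data from the question; N/A is assumed to be stored as NaN.
df1 = pd.DataFrame({
    "Name": ["Jay", "Ray", "Faye", "Maye", "Kay", "Trey"],
    "Symbol": [np.nan, np.nan, np.nan, np.nan, np.nan, "FFF"],
    "ID": ["372Y105", "4446100", "484MAA4", "504W308", "782L107", "782L111"],
})
df2 = pd.DataFrame({
    "Name": ["Jay", "Faye", "Kay"],
    "Symbol": ["AAA", "CCC", "EEE"],
    "ID": ["372Y105", "484MAA4", "782L107"],
})

# ID -> Symbol lookup built from df2 (IDs in df2 must be unique for map to work).
lookup = df2.set_index("ID")["Symbol"]

# Take df2's symbol where the ID matches, otherwise keep df1's own symbol.
result = df1.assign(Symbol=df1["ID"].map(lookup).fillna(df1["Symbol"]))
print(result)
```

For the sample data both approaches give the same result; they differ only when df1 already holds a non-missing Symbol for an ID that also appears in df2.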

Another solution uses DataFrame.update:

df1 = df1.set_index(['Name','ID'])
df2 = df2.set_index(['Name','ID'])
df1.update(df2)
df1 = df1.reset_index()
print(df1)
   Name       ID Symbol
0   Jay  372Y105    AAA
1   Ray  4446100    NaN
2  Faye  484MAA4    CCC
3  Maye  504W308    NaN
4   Kay  782L107    EEE
5  Trey  782L111    FFF
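Two details of DataFrame.update are easy to miss: it modifies the calling frame in place and returns None, and it only copies non-missing values from the other frame. The sketch below illustrates this with the sample data, keyed on ID alone rather than on Name and ID; that keying and the names `left`, `right`, `returned` and `result` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Sample data from the question; N/A is assumed to be stored as NaN.
df1 = pd.DataFrame({
    "Name": ["Jay", "Ray", "Faye", "Maye", "Kay", "Trey"],
    "Symbol": [np.nan, np.nan, np.nan, np.nan, np.nan, "FFF"],
    "ID": ["372Y105", "4446100", "484MAA4", "504W308", "782L107", "782L111"],
})
df2 = pd.DataFrame({
    "Name": ["Jay", "Faye", "Kay"],
    "Symbol": ["AAA", "CCC", "EEE"],
    "ID": ["372Y105", "484MAA4", "782L107"],
})

left = df1.set_index("ID")
right = df2.set_index("ID")

# update() aligns on the index, works in place and returns None.
# Only the Symbol column is passed so that Name is left untouched.
returned = left.update(right[["Symbol"]])

# Restore ID as a column and put the columns back in df1's order.
result = left.reset_index()[df1.columns]
print(result)
```

Because the call returns None, writing `df1 = df1.update(df2)` would discard the frame; the result has to be read from the object that update was called on.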

Regarding "python - Concatenate two DataFrames and drop duplicate rows based on a column value", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59601541/
