
python - Filter duplicated rows based on selected columns and compare with another dataframe in Pandas

Reposted. Author: 行者123. Updated: 2023-12-05 08:37:26

Given two dataframes as follows:

import pandas as pd

# Creating the two DataFrame objects
df1 = pd.DataFrame([('Stuti', 28, 'Varanasi'),
                    ('Saumya', 32, 'Delhi'),
                    ('Aaditya', 25, 'Mumbai'),
                    ('Saumya', 32, 'Delhi')],
                   columns=['Name', 'Score', 'City'])

df2 = pd.DataFrame([('Saumya', 32, 'Delhi'),
                    ('Saumya', 32, 'Mumbai'),
                    ('Aaditya', 40, 'Mumbai'),
                    ('Seema', 32, 'Delhi')],
                   columns=['Name', 'Score', 'City'])

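How can I create a mask for df2 that flags duplicated rows based on df1 and the columns Name and City? If the same (Name, City) pair exists in df1, the Check column should be Duplicated; otherwise it should be None.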

The expected result looks like this:

      Name  Score    City       Check
0   Saumya     32   Delhi  Duplicated
1   Saumya     32  Mumbai        None
2  Aaditya     40  Mumbai  Duplicated
3    Seema     32   Delhi        None

Updated code:

df = pd.concat([df1, df2])

df[df.duplicated(['Name', 'City'])]

Output:

      Name  Score    City
3   Saumya     32   Delhi
0   Saumya     32   Delhi
2  Aaditya     40  Mumbai
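
One reason the attempt above does not yet give a mask for df2: duplicated runs over the whole concatenation, so it also flags the repeated row inside df1 (index 3), and the output no longer says which frame a row came from. A minimal sketch of one way to keep that information, assuming the labels 'df1' and 'df2' passed to keys (they are illustrative names, not part of the original question):

```python
import pandas as pd

df1 = pd.DataFrame([('Stuti', 28, 'Varanasi'),
                    ('Saumya', 32, 'Delhi'),
                    ('Aaditya', 25, 'Mumbai'),
                    ('Saumya', 32, 'Delhi')],
                   columns=['Name', 'Score', 'City'])

df2 = pd.DataFrame([('Saumya', 32, 'Delhi'),
                    ('Saumya', 32, 'Mumbai'),
                    ('Aaditya', 40, 'Mumbai'),
                    ('Seema', 32, 'Delhi')],
                   columns=['Name', 'Score', 'City'])

# Label each row with the frame it came from (labels are illustrative).
combined = pd.concat([df1, df2], keys=['df1', 'df2'])

# True for every row whose (Name, City) pair already appeared earlier
# in the concatenation.
dup = combined.duplicated(['Name', 'City'])

# Keep only the part of the mask that belongs to df2; its index matches df2.
mask_df2 = dup.loc['df2']
print(df2[mask_df2])
```

Note that this variant would also flag a pair that repeats within df2 itself, because df2's own earlier rows count as "seen". The sample df2 has no such repeats, so here the mask reflects matches against df1 only.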

Best answer

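Left-merge df2 with the de-duplicated (Name, City) pairs from df1, then turn the merge indicator into the Check column (this assumes numpy has been imported as np):

In [65]: (df2.merge(df1[['Name', 'City']].drop_duplicates(), how='left', indicator='Check')
    ...:     .assign(Check=lambda x: np.where(x['Check'] == 'both', 'Duplicated', None)))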
Out[65]:
      Name  Score    City       Check
0   Saumya     32   Delhi  Duplicated
1   Saumya     32  Mumbai        None
2  Aaditya     40  Mumbai  Duplicated
3    Seema     32   Delhi        None
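
Since the question asks for a mask, an alternative that builds the boolean array directly may also be useful. The following is a sketch, not the answer's method: it treats each (Name, City) pair as one key through MultiIndex.from_frame and tests membership with MultiIndex.isin. The variable names (keys, pairs_df1, pairs_df2, mask, result) are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([('Stuti', 28, 'Varanasi'),
                    ('Saumya', 32, 'Delhi'),
                    ('Aaditya', 25, 'Mumbai'),
                    ('Saumya', 32, 'Delhi')],
                   columns=['Name', 'Score', 'City'])

df2 = pd.DataFrame([('Saumya', 32, 'Delhi'),
                    ('Saumya', 32, 'Mumbai'),
                    ('Aaditya', 40, 'Mumbai'),
                    ('Seema', 32, 'Delhi')],
                   columns=['Name', 'Score', 'City'])

keys = ['Name', 'City']

# Treat each (Name, City) pair as a single composite key.
pairs_df1 = pd.MultiIndex.from_frame(df1[keys])
pairs_df2 = pd.MultiIndex.from_frame(df2[keys])

# Boolean NumPy array, one entry per row of df2, in df2's row order.
mask = pairs_df2.isin(pairs_df1)

# Label matching rows; non-matching rows get a missing value.
result = df2.assign(Check=['Duplicated' if m else None for m in mask])
print(result)
```

Unlike the merge, this keeps df2's original index and never changes the row count, so mask can be used directly as df2[mask]. How the missing value is displayed (None or NaN) may depend on the pandas version.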

Regarding "python - Filter duplicated rows based on selected columns and compare with another dataframe in Pandas", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65559950/
