
python - Pandas `drop_duplicates` does not keep the first row

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:25:35

I created a DataFrame that contains duplicate rows, like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({"Order Date": ["January 1, 2017", "March 15, 2017", "April 20, 2017", "June 23, 2017", "December 12, 2017", None, "April 20, 2017", "April 20, 2017"],
"Sales Person": ["John", "John", "Rick", "Mary", "Mary", "Rick", "Rick", "Rick"],
"Items Sold": [4, -999, 1, np.nan, 7, 3, 1, 1],
"Item Price": [4.99, 1.99, 9.99, 19.99, 0.99, 2.99, 9.99, 9.99]})

In Jupyter it looks like this: [screenshot: Dataframe]

If I select the duplicates, it correctly shows the two duplicated rows.

df[df.duplicated()]

[screenshot: Duplicates]

Then I call drop_duplicates to remove the second duplicate and keep the first.

df.drop_duplicates()

[screenshot: Dropped]

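However, it looks like it removed both rows instead of keeping the first one. Am I missing something in the drop_duplicates method? The docstring says the keep parameter defaults to first, and this still happens even when I pass that argument explicitly.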

Best answer

Your example contains three duplicated rows, not two. Use keep=False to see all of them:

df[df.duplicated(keep=False)]
Out[661]:
   Item Price  Items Sold      Order Date  Sales Person
2        9.99         1.0  April 20, 2017          Rick
6        9.99         1.0  April 20, 2017          Rick
7        9.99         1.0  April 20, 2017          Rick
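
The gap between "two rows" and "three rows" comes from the keep argument of duplicated. The following is a minimal sketch that rebuilds the DataFrame from the question and compares the three settings; the variable names first_idx, last_idx, and all_idx are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "Order Date": ["January 1, 2017", "March 15, 2017", "April 20, 2017",
                   "June 23, 2017", "December 12, 2017", None,
                   "April 20, 2017", "April 20, 2017"],
    "Sales Person": ["John", "John", "Rick", "Mary", "Mary", "Rick", "Rick", "Rick"],
    "Items Sold": [4, -999, 1, np.nan, 7, 3, 1, 1],
    "Item Price": [4.99, 1.99, 9.99, 19.99, 0.99, 2.99, 9.99, 9.99],
})

# keep="first" (the default): the first occurrence is NOT flagged.
first_idx = df.index[df.duplicated()].tolist()

# keep="last": the last occurrence is NOT flagged.
last_idx = df.index[df.duplicated(keep="last")].tolist()

# keep=False: every member of a duplicate group is flagged.
all_idx = df.index[df.duplicated(keep=False)].tolist()

print(first_idx)  # [6, 7]
print(last_idx)   # [2, 6]
print(all_idx)    # [2, 6, 7]
```

With the default setting, row 2 is treated as the original and is never listed, so df[df.duplicated()] shows only rows 6 and 7.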

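Your drop_duplicates call then keeps only the first of them, the third row at index 2. Because df.duplicated() defaults to keep='first', it flags only rows 6 and 7, which is why the earlier output looked like the complete set of duplicates.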

df.drop_duplicates()
Out[659]:
   Item Price  Items Sold         Order Date  Sales Person
0        4.99         4.0    January 1, 2017          John
1        1.99      -999.0     March 15, 2017          John
2        9.99         1.0     April 20, 2017          Rick
3       19.99         NaN      June 23, 2017          Mary
4        0.99         7.0  December 12, 2017          Mary
5        2.99         3.0               None          Rick
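
To check that the first occurrence really survives, one option is to compare the result of drop_duplicates with a manual filter on the inverted duplicated mask. This is a sketch under the assumption that the data matches the question; deduped and manual are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "Order Date": ["January 1, 2017", "March 15, 2017", "April 20, 2017",
                   "June 23, 2017", "December 12, 2017", None,
                   "April 20, 2017", "April 20, 2017"],
    "Sales Person": ["John", "John", "Rick", "Mary", "Mary", "Rick", "Rick", "Rick"],
    "Items Sold": [4, -999, 1, np.nan, 7, 3, 1, 1],
    "Item Price": [4.99, 1.99, 9.99, 19.99, 0.99, 2.99, 9.99, 9.99],
})

# drop_duplicates returns a new DataFrame; df itself is left unchanged.
deduped = df.drop_duplicates()

# Equivalent manual filter: keep every row that is not flagged as a repeat.
manual = df[~df.duplicated()]

kept = deduped.index.tolist()
dropped = sorted(set(df.index) - set(deduped.index))

print(kept)     # [0, 1, 2, 3, 4, 5]
print(dropped)  # [6, 7]
print(deduped.index.equals(manual.index))  # True
```

Index 2 is still present in the result, so the first of the three identical rows is kept and only the two later copies are removed.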

Regarding python - Pandas `drop_duplicates` does not keep the first row, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47659385/
