python - Drop consecutive duplicate rows that have the same content across columns in a DataFrame

Reposted. Author: 行者123. Updated: 2023-12-01 06:26:38

In the DataFrame below, I want to drop consecutive duplicate rows when 'People', 'Year' and 'Project' are all the same.

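Given the original DataFrame below, any row that has the same 'People', 'Year' and 'Project' as the row directly above it should be removed.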

import pandas as pd

data = {'People' : ["David","David","David","David","John","John","John"],
'Year': ["2016","2016","2017","2016","2016","2017","2017"],
'Project' : ["TN","TN","TN","TN","DJ","DM","DM"],
'Earning' : [878,682,767,620,964,610,772]}
df = pd.DataFrame(data)
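The notion of "consecutive duplicates" can be illustrated without pandas: itertools.groupby only groups adjacent items with equal keys, so taking the first item of each group keeps exactly one row per run, while a repeat that appears later (after a different key) starts a new group. A minimal sketch on the question's data; the names rows, key and kept are illustrative only:

```python
from itertools import groupby

data = {'People': ["David", "David", "David", "David", "John", "John", "John"],
        'Year': ["2016", "2016", "2017", "2016", "2016", "2017", "2017"],
        'Project': ["TN", "TN", "TN", "TN", "DJ", "DM", "DM"],
        'Earning': [878, 682, 767, 620, 964, 610, 772]}

# Turn the column-oriented dict into a list of row tuples
rows = list(zip(data['People'], data['Year'], data['Project'], data['Earning']))

def key(row):
    # Compare rows on (People, Year, Project) only, ignoring Earning
    return row[:3]

# groupby groups only *adjacent* rows with the same key;
# next(group) takes the first row of each run
kept = [next(group) for _, group in groupby(rows, key=key)]
for row in kept:
    print(row)
```

The non-consecutive ('David', '2016', 'TN', 620) row survives because a different key sits between it and the first David/2016/TN run.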

I tried this, but it does not work:

df_1 = df.loc[(df['People', 'Year', 'Project'].shift() != df['People', 'Year', 'Project'])]
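A likely reason this attempt fails: df['People', 'Year', 'Project'] is a single tuple key, so pandas looks for one column literally named ('People', 'Year', 'Project') and raises a KeyError. Selecting several columns needs a list (double brackets), and the != comparison then produces one boolean per cell rather than one per row, so it still has to be reduced per row before it can filter rows. A small sketch showing both points:

```python
import pandas as pd

df = pd.DataFrame({'People': ["David", "David", "David", "David", "John", "John", "John"],
                   'Year': ["2016", "2016", "2017", "2016", "2016", "2017", "2017"],
                   'Project': ["TN", "TN", "TN", "TN", "DJ", "DM", "DM"],
                   'Earning': [878, 682, 767, 620, 964, 610, 772]})

# A tuple key looks up ONE column named ('People', 'Year', 'Project')
tuple_key_failed = False
try:
    df['People', 'Year', 'Project']
except KeyError as exc:
    tuple_key_failed = True
    print("KeyError:", exc)

# A list of names selects the three columns
cols = ['People', 'Year', 'Project']
changed = df[cols] != df[cols].shift()
# The result is a 7 x 3 DataFrame of booleans (one per cell, not per row)
print(changed.shape)
```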

Another attempt: this line also removes "David, 2016, TN, 620", even though that row is not a consecutive duplicate:

df_1 = df.drop_duplicates(subset=['People','Year','Project'])
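drop_duplicates compares each row against all earlier rows, not only the one directly above it, which is why the later "David, 2016, TN, 620" row is removed. DataFrame.duplicated with the same subset shows which rows it flags (with the default keep='first'):

```python
import pandas as pd

df = pd.DataFrame({'People': ["David", "David", "David", "David", "John", "John", "John"],
                   'Year': ["2016", "2016", "2017", "2016", "2016", "2017", "2017"],
                   'Project': ["TN", "TN", "TN", "TN", "DJ", "DM", "DM"],
                   'Earning': [878, 682, 767, 620, 964, 610, 772]})

subset = ['People', 'Year', 'Project']
# True for every row whose key already appeared anywhere earlier in the frame
flags = df.duplicated(subset=subset)
print(flags.tolist())  # [False, True, False, True, False, False, True]

# drop_duplicates removes exactly the flagged rows, including row 3
print(df.drop_duplicates(subset=subset).index.tolist())  # [0, 2, 4, 5]
```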


After changing it to the following, it keeps all rows:

df_1 = df.drop_duplicates(subset=['People','Year','Project', 'Earning'])
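Adding 'Earning' to the subset changes what counts as a duplicate: every Earning value in the sample data is different, so no two rows match on all four columns and nothing is dropped. A quick check on the same data:

```python
import pandas as pd

df = pd.DataFrame({'People': ["David", "David", "David", "David", "John", "John", "John"],
                   'Year': ["2016", "2016", "2017", "2016", "2016", "2017", "2017"],
                   'Project': ["TN", "TN", "TN", "TN", "DJ", "DM", "DM"],
                   'Earning': [878, 682, 767, 620, 964, 610, 772]})

subset = ['People', 'Year', 'Project', 'Earning']
# No row repeats all four values, so no row is flagged
print(df.duplicated(subset=subset).any())  # False
print(len(df.drop_duplicates(subset=subset)) == len(df))  # True
```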

What is the correct way to do this? Thanks!

Best answer

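You can compare the columns with their DataFrame.shift-ed values for inequality (DataFrame.ne), then use DataFrame.any to test whether each row has at least one True, and keep those rows with boolean indexing: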

cols = ['People','Year','Project']
df_1 = df[df[cols].ne(df[cols].shift()).any(axis=1)]
print (df_1)
  People  Year Project  Earning
0  David  2016      TN      878
2  David  2017      TN      767
3  David  2016      TN      620
4   John  2016      DJ      964
5   John  2017      DM      610
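The one-liner can be wrapped in a small reusable helper. The function name drop_consecutive_duplicates is hypothetical (it is not a pandas API); it simply applies the shift / ne / any(axis=1) mask from the answer to whichever columns are passed in:

```python
import pandas as pd

def drop_consecutive_duplicates(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Keep the first row of every run of adjacent rows that share `cols`."""
    # True where at least one of `cols` differs from the row directly above;
    # the first row is compared with the NaN that shift() inserts, so it is kept
    changed = df[cols].ne(df[cols].shift()).any(axis=1)
    return df[changed]

df = pd.DataFrame({'People': ["David", "David", "David", "David", "John", "John", "John"],
                   'Year': ["2016", "2016", "2017", "2016", "2016", "2017", "2017"],
                   'Project': ["TN", "TN", "TN", "TN", "DJ", "DM", "DM"],
                   'Earning': [878, 682, 767, 620, 964, 610, 772]})

result = drop_consecutive_duplicates(df, ['People', 'Year', 'Project'])
print(result)
```

One thing to keep in mind: ne treats missing values as unequal, so if the key columns contain NaN, two adjacent rows with NaN in the same column are not treated as duplicates by this mask.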

Details:

print (df[cols].ne(df[cols].shift()))
   People   Year  Project
0    True   True     True
1   False  False    False
2   False   True    False
3   False   True    False
4    True  False     True
5   False   True     True
6   False  False    False

print (df[cols].ne(df[cols].shift()).any(axis=1))
0     True
1    False
2     True
3     True
4     True
5     True
6    False
dtype: bool
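Because this mask is True exactly where a new run of identical People/Year/Project values starts, its cumulative sum numbers the runs. A possible extension (not part of the original answer) uses that run id with groupby, for example to keep the last row of each run instead of the first, or to total Earning per run:

```python
import pandas as pd

df = pd.DataFrame({'People': ["David", "David", "David", "David", "John", "John", "John"],
                   'Year': ["2016", "2016", "2017", "2016", "2016", "2017", "2017"],
                   'Project': ["TN", "TN", "TN", "TN", "DJ", "DM", "DM"],
                   'Earning': [878, 682, 767, 620, 964, 610, 772]})

cols = ['People', 'Year', 'Project']
# True at the first row of each run of identical `cols` values
starts = df[cols].ne(df[cols].shift()).any(axis=1)
# Cumulative sum turns run starts into run ids: 1, 1, 2, 3, 4, 5, 5
run_id = starts.cumsum()

# Keep the last row of each run (original row order is preserved)
last_of_each_run = df.groupby(run_id).tail(1)

# Or collapse each run into one row and add up its Earning
earning_per_run = df.groupby(run_id).agg(
    People=('People', 'first'),
    Year=('Year', 'first'),
    Project=('Project', 'first'),
    Earning=('Earning', 'sum'),
)
print(last_of_each_run)
print(earning_per_run)
```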

For "python - Drop consecutive duplicate rows that have the same content across columns in a DataFrame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60111199/
