
python - How to perform drop_duplicates with multiple conditions in a pandas DataFrame

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:13:35

I have a DataFrame, df:

    Sr.No  Name  Class  Data
0   1      Sri   1      sri is a good player
1   ''     Sri   2      sri is good in cricket
2   ''     Sri   3      sri went out
3   2      Ram   1      Ram is a good player
4   ''     Ram   2      sri is good in cricket
5   ''     Ram   3      Ram went out
6   3      Sri   1      sri is a good player
7   ''     Sri   2      sri is good in cricket
8   ''     Sri   3      sri went out
9   4      Sri   1      sri is a good player
10  ''     Sri   2      sri is good in cricket
11  ''     Sri   3      sri went out
12  ''     Sri   4      sri came back

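I am trying to drop duplicates based on ["Name", "Class", "Data"]. The goal is to drop the rows of a Sr.No only when all of its sentences, taken together, duplicate an earlier Sr.No, not whenever a single row repeats.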

My expected output is:

out_df


    Sr.No  Name  Class  Data
0   1      Sri   1      sri is a good player
1          Sri   2      sri is good in cricket
2          Sri   3      sri went out
3   2      Ram   1      Ram is a good player
4          Ram   2      sri is good in cricket
5          Ram   3      Ram went out
9   4      Sri   1      sri is a good player
10         Sri   2      sri is good in cricket
11         Sri   3      sri went out
12         Sri   4      sri came back

Best answer

Use a groupby + transform operation to create a dummy column.

v = df.groupby(df['Class'].diff().le(0).cumsum())['Data'].transform(' '.join)

Alternatively,

v = df['Data'].groupby(df['Class'].diff().le(0).cumsum()).transform(' '.join) 

This dummy column then becomes one of the keys that decide which rows to drop.

m = df.assign(Foo=v).duplicated(["Name", "Class", "Data", "Foo"])    
df[~m]

    Class  Data                    Name  Sr.No
0   1      sri is a good player    Sri   1
1   2      sri is good in cricket  Sri
2   3      sri went out            Sri
3   1      Ram is a good player    Ram   2
4   2      sri is good in cricket  Ram
5   3      Ram went out            Ram
9   1      sri is a good player    Sri   4
10  2      sri is good in cricket  Sri
11  3      sri went out            Sri
12  4      sri came back           Sri
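For anyone who wants to run the answer end to end, here is a minimal sketch that rebuilds the question's frame and applies the same three steps. The empty strings standing in for the blank Sr.No cells and the names `block_id`, `block_text` and `mask` are assumptions made for illustration; they are not in the original post.

```python
import pandas as pd

# Rebuild the question's frame; '' stands in for the blank Sr.No cells (an assumption).
df = pd.DataFrame({
    "Sr.No": [1, "", "", 2, "", "", 3, "", "", 4, "", "", ""],
    "Name": ["Sri"] * 3 + ["Ram"] * 3 + ["Sri"] * 7,
    "Class": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 4],
    "Data": [
        "sri is a good player", "sri is good in cricket", "sri went out",
        "Ram is a good player", "sri is good in cricket", "Ram went out",
        "sri is a good player", "sri is good in cricket", "sri went out",
        "sri is a good player", "sri is good in cricket", "sri went out",
        "sri came back",
    ],
})

# A new block starts wherever Class stops increasing.
block_id = df["Class"].diff().le(0).cumsum()
# Every row carries the joined text of its whole block.
block_text = df.groupby(block_id)["Data"].transform(" ".join)
# A row counts as a duplicate only if its own fields AND its block's text were seen before.
mask = df.assign(Foo=block_text).duplicated(["Name", "Class", "Data", "Foo"])

out_df = df[~mask]
print(out_df.index.tolist())  # prints [0, 1, 2, 3, 4, 5, 9, 10, 11, 12]
```

Rows 6 to 8 go because their block matches rows 0 to 2 in full. Rows 9 to 11 stay because their block also contains "sri came back", so its joined text differs.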

Details

Form groups from runs of monotonically increasing Class values:

i = df['Class'].diff().le(0).cumsum()
i

0     0
1     0
2     0
3     1
4     1
5     1
6     2
7     2
8     2
9     3
10    3
11    3
12    3
Name: Class, dtype: int64
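The grouping key chains three calls, so it can help to look at each intermediate result. This is a small sketch on a made-up Class column; the names `cls`, `step`, `restart` and `block_id` are illustrative only.

```python
import pandas as pd

cls = pd.Series([1, 2, 3, 1, 2, 3, 1, 2, 3, 4], name="Class")

step = cls.diff()            # NaN for the first row, then the change from the previous row
restart = step.le(0)         # True where Class did not increase; NaN compares as False
block_id = restart.cumsum()  # running count of restarts, used as the block label

print(pd.DataFrame({"Class": cls, "diff": step, "restart": restart, "block": block_id}))
print(block_id.tolist())  # prints [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
```

Because the first `diff()` value is NaN and `NaN <= 0` is False, the first row always lands in block 0.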

Use it as the grouping key, and transform Data with a str.join operation:

v = df.groupby(i)['Data'].transform(' '.join)
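To see what the transform produces, here is a sketch on a tiny made-up frame (the data is hypothetical and not taken from the question). Each row receives the joined text of its whole block, so the result has the same length and index as the frame.

```python
import pandas as pd

df = pd.DataFrame({"Class": [1, 2, 1, 2, 3], "Data": ["a", "b", "a", "b", "c"]})

i = df["Class"].diff().le(0).cumsum()          # block labels: 0, 0, 1, 1, 1
v = df.groupby(i)["Data"].transform(" ".join)  # joined block text, broadcast to every row

print(v.tolist())  # prints ['a b', 'a b', 'a b c', 'a b c', 'a b c']
```

The two blocks share their first two sentences, but the joined strings differ, and that difference is what later keeps the longer block from being flagged as a duplicate.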

This is just a column of concatenated strings. Finally, assign the dummy column and call duplicated:

m = df.assign(Foo=v).duplicated(["Name", "Class", "Data", "Foo"]) 
m

0     False
1     False
2     False
3     False
4     False
5     False
6      True
7      True
8      True
9     False
10    False
11    False
12    False
dtype: bool
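Since the question asks about drop_duplicates specifically, the sketch below checks that the mask route and a direct `drop_duplicates` call give the same rows. It uses a small made-up frame, and `keys`, `via_mask` and `via_drop` are illustrative names rather than anything from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Sri", "Sri", "Ram", "Ram", "Sri", "Sri"],
    "Class": [1, 2, 1, 2, 1, 2],
    "Data": ["p", "q", "r", "q", "p", "q"],
})
v = df.groupby(df["Class"].diff().le(0).cumsum())["Data"].transform(" ".join)
keys = ["Name", "Class", "Data", "Foo"]

# Route 1: boolean mask from duplicated(), as in the answer.
m = df.assign(Foo=v).duplicated(keys)
via_mask = df[~m]

# Route 2: drop_duplicates on the same keys, then discard the helper column.
via_drop = df.assign(Foo=v).drop_duplicates(keys).drop(columns="Foo")

print(via_drop.equals(via_mask))  # prints True
```

Both calls keep the first occurrence by default, which is why they agree. The mask route has the small advantage of never adding the helper column to the result.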

Regarding "python - How to perform drop_duplicates with multiple conditions in a pandas DataFrame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48335265/
