
python - Remove duplicate words from a pandas column

Reposted. Author: 太空宇宙. Updated: 2023-11-04 07:49:39

I have a DataFrame in which the information is stored in a single column:

>>> Results.Category[:5]
0 issue delivery wrong master account
1 data wrong master account batch
2 order delivery wrong data account
3 issue delivery wrong master account
4 delivery wrong master account batch
Name: Category, dtype: object

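Now I want each word in the Category column to be kept only in the first row where it appears. For example, the word "wrong" appears in the first row, so I want to keep it there and remove it from all the remaining rows. Likewise, the word "data" first appears in the second row, so I want to keep it only in the second row and remove it from every later row.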

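I found that duplicates within a single row can be removed with the line below, but I need to remove repeated words across the whole column. Can anyone help?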

AFResults['FinalCategoryN'] = AFResults['FinalCategory'].apply(lambda x: remove_dup(x))
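
The question does not show the body of `remove_dup`. A minimal sketch of what such a per-row helper might look like, assuming it takes one string and keeps the first occurrence of each whitespace-separated word (this definition is an assumption, not the asker's actual code):

```python
def remove_dup(text: str) -> str:
    """Drop repeated words inside one string, keeping first occurrences in order."""
    # dict.fromkeys keeps insertion order (Python 3.7+), so it removes
    # duplicates while preserving the original word order.
    return " ".join(dict.fromkeys(text.split()))


print(remove_dup("wrong master wrong account master"))  # wrong master account
```

A helper like this only looks at one row at a time, which is why it cannot solve the column-wide problem described above.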

Best answer

It looks like you want something like this:

out = []
seen = set()  # every word encountered in earlier rows
for c in df['Category']:
    words = c.split()
    # keep only the words that no earlier row contained
    out.append(' '.join([w for w in words if w not in seen]))
    seen.update(words)

df['FinalCategoryN'] = out
df

                              Category                       FinalCategoryN
0  issue delivery wrong master account  issue delivery wrong master account
1      data wrong master account batch                           data batch
2    order delivery wrong data account                                order
3  issue delivery wrong master account
4  delivery wrong master account batch
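
For reuse, the loop from the answer can be wrapped in a function that accepts any iterable of strings. The sketch below is one possible packaging, not part of the original answer; the name `keep_first_occurrence` is hypothetical. Unlike the loop above, it also drops a word that repeats inside the same row, which makes no difference for the sample data.

```python
from typing import Iterable, List


def keep_first_occurrence(rows: Iterable[str]) -> List[str]:
    """For each row, keep only the words that have not appeared earlier."""
    seen = set()  # words already emitted, in any earlier row or this one
    out = []
    for row in rows:
        fresh = []
        for word in row.split():
            if word not in seen:
                seen.add(word)
                fresh.append(word)
        out.append(" ".join(fresh))
    return out


# Sample rows copied from the question.
rows = [
    "issue delivery wrong master account",
    "data wrong master account batch",
    "order delivery wrong data account",
    "issue delivery wrong master account",
    "delivery wrong master account batch",
]
result = keep_first_occurrence(rows)
print(result)
```

Because a pandas Series is iterable, the same function could be applied as `df['FinalCategoryN'] = keep_first_occurrence(df['Category'])`, assuming the column contains only strings.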

If you don't care about word order, you can use set logic:

u = df['Category'].apply(str.split)
# shift down one row, replace the leading NaN with [], then accumulate all earlier words
v = u.shift().map(lambda x: [] if x != x else x).cumsum().map(set)
(u.map(set) - v).str.join(' ')

0 account delivery issue master wrong
1 batch data
2 order
3
4
Name: Category, dtype: object
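
The shift/cumsum chain is compact but can be hard to follow. As a rough illustration of the same set logic, here is a plain-loop sketch; the function name `new_words_per_row` is hypothetical, and the words are sorted only to make the output deterministic, since sets carry no order.

```python
def new_words_per_row(rows):
    """Return, per row, the words not present in any earlier row (order ignored)."""
    seen = set()  # union of the words from all earlier rows
    out = []
    for row in rows:
        words = set(row.split())
        # set difference: words in this row minus everything seen so far
        out.append(" ".join(sorted(words - seen)))
        seen |= words
    return out


# Sample rows copied from the question.
rows = [
    "issue delivery wrong master account",
    "data wrong master account batch",
    "order delivery wrong data account",
    "issue delivery wrong master account",
    "delivery wrong master account batch",
]
unique_words = new_words_per_row(rows)
print(unique_words)
```

Here `seen` plays the role of `v` in the one-liner: at each row it holds every word from the rows before it.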

Regarding "python - Remove duplicate words from a pandas column", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56853948/
