
database - How to remove words that occur only once

Reposted. Author: 搜寻专家. Updated: 2023-10-30 21:55:55

This is my dataset:

Id Text
1. Dear Mr. John, your bag order is delivered
2. Dear Mr. Brick, your ball order is delivered
3. Dear Mrs. Blue, your ball purchase is delivered

What I need is:

Id  Text
1. Dear Mr. your order is delivered
2. Dear Mr. your ball order is delivered
3. Dear your ball is delivered

So every word that occurs only once across the whole Text column is removed.

Best answer

Use:

# split each text into words and stack them into one Series
# (MultiIndex: original row, word position)
all_val = df['Text'].str.split(expand=True).stack()
# keep only words that occur more than once overall, then join them back per original row
df['Text'] = all_val[all_val.duplicated(keep=False)].groupby(level=0).apply(' '.join)
print(df)
Id Text
0 1.0 Dear Mr. your order is delivered
1 2.0 Dear Mr. your ball order is delivered
2 3.0 Dear your ball is delivered
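
The same idea can be written as a self-contained sketch. It rebuilds the sample data from the question. As assumptions of this sketch rather than part of the answer, it uses `str.split().explode()` in place of `split(expand=True).stack()`, and it adds a `reindex` so that a row with no repeated words comes back as an empty string instead of NaN. The names `words` and `kept` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Text": [
        "Dear Mr. John, your bag order is delivered",
        "Dear Mr. Brick, your ball order is delivered",
        "Dear Mrs. Blue, your ball purchase is delivered",
    ],
})

# One word per row; explode() repeats the original row label for each word
words = df["Text"].str.split().explode()

# duplicated(keep=False) marks every occurrence of a word seen more than once
kept = words[words.duplicated(keep=False)]

# Join the surviving words per original row; rows that lost every word
# are restored as empty strings by reindex (an addition in this sketch)
df["Text"] = kept.groupby(level=0).agg(" ".join).reindex(df.index, fill_value="")

print(df)
```

Words are compared as whitespace-separated tokens, so `John,` (with the comma) and `John` would count as different words.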

Or:

# join all texts together and split on whitespace
all_val = ' '.join(df['Text']).split()
# collect the words that occur exactly once
once = [x for x in all_val if all_val.count(x) == 1]

# remove those words from each text with a nested list comprehension
df['Text'] = [' '.join([y for y in x.split() if y not in once]) for x in df['Text']]
# alternative using apply
#df['Text'] = df['Text'].apply(lambda x: ' '.join([y for y in x.split() if y not in once]))
print(df)
Id Text
0 1.0 Dear Mr. your order is delivered
1 2.0 Dear Mr. your ball order is delivered
2 3.0 Dear your ball is delivered
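
Calling `list.count` inside a comprehension rescans the whole word list for every word, which can get slow on large columns. A possible variation, sketched here with `collections.Counter` from the standard library, counts every word in one pass. The function name `drop_singleton_words` is hypothetical and not part of the original answer.

```python
from collections import Counter


def drop_singleton_words(texts):
    """Return the texts with every word that occurs only once overall removed."""
    # Count each whitespace-separated word across all texts in a single pass
    counts = Counter(word for text in texts for word in text.split())
    # Keep only the words whose overall count is greater than one
    return [" ".join(w for w in text.split() if counts[w] > 1) for text in texts]


texts = [
    "Dear Mr. John, your bag order is delivered",
    "Dear Mr. Brick, your ball order is delivered",
    "Dear Mrs. Blue, your ball purchase is delivered",
]
cleaned = drop_singleton_words(texts)
for line in cleaned:
    print(line)
```

With the DataFrame from the question, this could be applied as `df['Text'] = drop_singleton_words(df['Text'])`, since iterating a Series yields its values.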

Regarding database - How to remove words that occur only once, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51075221/
