python - Remove all words except those in a list

Reposted. Author: 太空宇宙. Updated: 2023-11-04 09:32:32

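I have a pandas DataFrame like the one below, with a column of sentences. I also have a list called vocab, and I want to remove from each sentence every word that is not in that list.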

Example df:

                                 sentence
0 packag come differ what about tomorrow
1 Hello dear truth is hard to tell

Example vocab:

['packag', 'differ', 'tomorrow', 'dear', 'truth', 'hard', 'tell']

Expected output:

                                 sentence                     res
0  packag come differ what about tomorrow  packag differ tomorrow
1        Hello dear truth is hard to tell    dear truth hard tell

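I first tried using .str.replace to strip all the vocabulary words out of each sentence and stored the result in t1. I then did the same thing again with t1 and sentence, expecting to be left with the output I want. But it does not work as expected.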

My attempt:

vocab_lis=['packag', 'differ', 'tomorrow', 'dear', 'truth', 'hard', 'tell']
vocab_regex = ' '+' | '.join(vocab_lis)+' '
df=pd.DataFrame()
s = pd.Series(["packag come differ what about tomorrow", "Hello dear truth is hard to tell"])
df['sentence']=s
df['sentence']= ' '+df['sentence']+' '

df['t1'] = df['sentence'].str.replace(vocab_regex, ' ')
df['t2'] = df.apply(lambda x: pd.Series(x['sentence']).str.replace(' | '.join(x['t1'].split()), ' '), axis=1)

Is there a simple way to accomplish this? I know my code fails because of the spaces. How can I fix it?

Best answer

Use a nested list comprehension, splitting each sentence on whitespace:

df['res'] = [' '.join(y for y in x.split() if y in vocab_lis) for x in df['sentence']]
print(df)
                                 sentence                     res
0  packag come differ what about tomorrow  packag differ tomorrow
1        Hello dear truth is hard to tell    dear truth hard tell
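
The list comprehension above can also be written as a small helper, sketched here as a self-contained script. The names keep_vocab and vocab_set are hypothetical and do not appear in the original answer. Converting the list to a set is an optional tweak that makes each membership test constant time on average, which may matter for a large vocabulary.

```python
import pandas as pd

vocab_lis = ['packag', 'differ', 'tomorrow', 'dear', 'truth', 'hard', 'tell']
# Hypothetical optimization: a set gives average O(1) membership tests.
vocab_set = set(vocab_lis)


def keep_vocab(sentence, vocab):
    """Return the whitespace-separated tokens of `sentence` that are in `vocab`."""
    return ' '.join(word for word in sentence.split() if word in vocab)


df = pd.DataFrame({'sentence': ["packag come differ what about tomorrow",
                                "Hello dear truth is hard to tell"]})

# Apply the helper to every sentence; matching is exact and case-sensitive.
df['res'] = df['sentence'].map(lambda s: keep_vocab(s, vocab_set))
print(df['res'].tolist())
# ['packag differ tomorrow', 'dear truth hard tell']
```

Because split() with no arguments discards leading, trailing and repeated whitespace, this approach does not need the sentences to be padded with spaces.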

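The regex from the question can be fixed by wrapping each word in \b word boundaries instead of spaces. Note that this pattern removes the vocabulary words, which is the t1 step of the attempt:

vocab_regex = '|'.join(r"\b{}\b".format(x) for x in vocab_lis)
# regex=True is needed in pandas 2.0+, where str.replace treats the pattern as a literal string by default
df['t1'] = df['sentence'].str.replace(vocab_regex, '', regex=True)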
print(df)
                                 sentence               t1
0  packag come differ what about tomorrow  come what about
1        Hello dear truth is hard to tell      Hello is to
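
To illustrate why the space-padded pattern from the question misses some words while the \b version does not, here is a minimal sketch using only the standard re module on one padded sentence. The variable names are hypothetical, and re.escape is added as a precaution that the original answer does not use.

```python
import re

vocab_lis = ['packag', 'differ', 'tomorrow', 'dear', 'truth', 'hard', 'tell']
sentence = ' Hello dear truth is hard to tell '  # padded as in the question

# Pattern from the question: every alternative needs a space on both sides.
space_regex = ' ' + ' | '.join(vocab_lis) + ' '
# Pattern from the answer: \b matches a position and consumes no characters.
boundary_regex = '|'.join(r'\b{}\b'.format(re.escape(w)) for w in vocab_lis)

# ' dear ' consumes the space that ' truth ' would need, so 'truth' survives.
removed_with_spaces = re.sub(space_regex, ' ', sentence)
print(repr(removed_with_spaces))  # ' Hello truth is to '

# With word boundaries, adjacent vocabulary words are all removed.
removed_with_boundaries = re.sub(boundary_regex, '', sentence)
print(' '.join(removed_with_boundaries.split()))  # Hello is to

# The same pattern can also keep the vocabulary words instead of removing them.
kept = ' '.join(re.findall(boundary_regex, sentence))
print(kept)  # dear truth hard tell
```

In pandas, the last step should correspond to df['sentence'].str.findall(vocab_regex).str.join(' '), although the list comprehension is likely simpler when the words are plain whitespace-separated tokens.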

For "python - Remove all words except those in a list", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55138991/
