python - Removing words from a list of text strings

Reposted. Author: 太空宇宙. Last updated: 2023-11-03 15:05:29

I am trying to remove certain words (in addition to the usual stop words) from a list of text strings, but for some reason it is not working:

documents = ["Human machine interface for lab abc computer applications",
"A survey of user opinion of computer system response time",
"The EPS user interface management system",
"System and human system engineering testing of EPS",
"Relation of user perceived response time to error measurement",
"The generation of random binary unordered trees",
"The intersection graph of paths in trees",
"Graph minors IV Widths of trees and well quasi ordering",
"Graph minors A survey"]

exclude = ['am', 'there','here', 'for', 'of', 'user']

new_doc = [word for word in documents if word not in exclude]

print(new_doc)

Output:

['Human machine interface for lab abc computer applications', 'A survey of user opinion of computer system response time', 'The EPS user interface management system', 'System and human system engineering testing of EPS', 'Relation of user perceived response time to error measurement', 'The generation of random binary unordered trees', 'The intersection graph of paths in trees', 'Graph minors IV Widths of trees and well quasi ordering', 'Graph minors A survey']

As you can see, none of the words in exclude were removed from documents ("for" is a prime example).

It does work with this expression:

new_doc = [word for word in str(documents).split() if word not in exclude]

But then how do I get back the original elements of documents, only in their "cleaned" form?

Any help is much appreciated!

Best answer

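You should split each line into words before filtering. Iterating over documents directly yields whole sentences, and a whole sentence is never equal to a single word in exclude, so nothing gets filtered out: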

new_doc = [' '.join([word for word in line.split() if word not in exclude]) for line in documents]
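
The same idea can be written as a small helper, which makes it easier to see that each document stays a separate element of the result. This is only a sketch: the function name remove_words and the ignore_case flag are hypothetical additions, not part of the original answer, and the document list is shortened to three entries.

```python
documents = [
    "Human machine interface for lab abc computer applications",
    "A survey of user opinion of computer system response time",
    "Graph minors A survey",
]

# A set gives constant-time membership tests; a list works too.
exclude = {"am", "there", "here", "for", "of", "user"}


def remove_words(lines, excluded, ignore_case=False):
    """Return a new list in which every excluded word is dropped from each line.

    remove_words and ignore_case are illustrative names (assumptions),
    not something defined in the original question or answer.
    """
    if ignore_case:
        excluded = {w.lower() for w in excluded}
    cleaned = []
    for line in lines:
        # Split the line into words, keep the ones that are not excluded,
        # then join them back into a single string.
        kept = [
            word for word in line.split()
            if (word.lower() if ignore_case else word) not in excluded
        ]
        cleaned.append(" ".join(kept))
    return cleaned


new_doc = remove_words(documents, exclude)
print(new_doc)
```

Note that the comparison is exact by default, so "For" would survive unless ignore_case=True is passed, and a word with punctuation attached (such as "user,") would not match "user" because split() only breaks on whitespace.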

This post on removing words from a list of text strings in Python is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/33240606/
