
python - How to read tokens one by one from a file in Python?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:44:15

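My problem is that my code never gets individual words/tokens to match against the stopwords I want to remove from the original text. Instead I get a whole sentence, which cannot be matched against a stopword. Please show me a way to get individual tokens, match them against the stopwords, and remove them.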

from nltk.corpus import stopwords
import string, os

def remove_stopwords(ifile):
    processed_word_list = []
    stopword = stopwords.words("urdu")
    text = open(ifile, 'r').readlines()
    for word in text:
        print(word)
        if word not in stopword:
            processed_word_list.append('*')
    print(processed_word_list)
    return processed_word_list

if __name__ == "__main__":
    print ("Input file path: ")
    ifile = input()
    remove_stopwords(ifile)

Best answer

Try this:

from nltk.corpus import stopwords
import ast

def remove_stopwords(ifile):
    processed_word_list = []
    stopword = stopwords.words("urdu")
    # Assumes the file holds a Python list literal of tokens, e.g. ['a', 'b'];
    # ast.literal_eval parses that text into a real list (UTF-8 assumed).
    with open(ifile, 'r', encoding='utf-8') as f:
        words = ast.literal_eval(f.read())
    for word in words:
        print(word)
        # Note: this masks non-stopwords with '*' and keeps stopwords;
        # swap the two branches to mask stopwords instead.
        if word not in stopword:
            processed_word_list.append('*')
        else:
            processed_word_list.append(word)
    print(processed_word_list)
    return processed_word_list

if __name__ == "__main__":
    ifile = input("Input file path: ")
    remove_stopwords(ifile)
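
If the input is ordinary running text rather than a Python list literal (which is what `ast.literal_eval` expects), one possible approach is to split each line into tokens yourself. The following is only a minimal sketch under some assumptions: tokens are separated by whitespace, the file is UTF-8 encoded, and the stopword list is passed in by the caller. The names `iter_tokens` and `stopword_list` are made up for illustration and are not part of the original answer.

```python
from typing import Iterable, Iterator, List


def iter_tokens(path: str, encoding: str = "utf-8") -> Iterator[str]:
    """Yield whitespace-separated tokens from a text file, one at a time."""
    with open(path, "r", encoding=encoding) as handle:
        for line in handle:              # iterating a file yields whole lines
            for token in line.split():   # split() breaks a line into tokens
                yield token


def remove_stopwords(path: str, stopword_list: Iterable[str]) -> List[str]:
    """Return the file's tokens, in order, that are not in stopword_list."""
    stopword_set = set(stopword_list)    # set membership tests are fast
    return [token for token in iter_tokens(path) if token not in stopword_set]
```

With NLTK, `stopword_list` could be `stopwords.words("urdu")`, provided such a list is available in the local NLTK data, as the code in the question assumes. Whitespace splitting leaves punctuation attached to words; if that matters, `nltk.tokenize.word_tokenize(line)` could replace `line.split()`.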

Regarding "python - How to read tokens one by one from a file in Python?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45617523/
