So, I am new to Python and NLTK. I have a file called reviews.csv which contains reviews pulled from Amazon. I have tokenized the contents of this csv file and written them to a file called csvfile.csv. Here is the code:
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.stem import PorterStemmer
import csv
from nltk.corpus import stopwords

ps = PorterStemmer()
stop_words = set(stopwords.words("english"))

with open('reviews.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter='.')
    for lines in readCSV:
        word1 = word_tokenize(str(lines))
        print(word1)
        with open('csvfile.csv', 'a') as file:
            for word in word1:
                file.write(word)
                file.write('\n')

with open('csvfile.csv') as csvfile:
    readCSV1 = csv.reader(csvfile)
    for w in readCSV1:
        if w not in stopwords:
            print(w)
I am trying to perform stemming on csvfile.csv, but I get this error:
Traceback (most recent call last):
  File "/home/aarushi/test.py", line 25, in <module>
    if w not in stopwords:
TypeError: argument of type 'WordListCorpusReader' is not iterable
When you do

from nltk.corpus import stopwords

stopwords is a variable that points to a CorpusReader object in nltk.
The actual stop words you are looking for (i.e. the list of stop words) are only instantiated when you do:
stop_words = set(stopwords.words("english"))
So, when checking whether a word from your list of tokens is a stop word, you should do this:
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))

for w in tokenized_sent:
    if w not in stop_words:
        pass  # Do something.
To avoid confusion, I usually name the actual list of stop words stoplist:
from nltk.corpus import stopwords

stoplist = set(stopwords.words("english"))

for w in tokenized_sent:
    if w not in stoplist:
        pass  # Do something.