
python - NLTK accuracy: "ValueError: too many values to unpack"

Reposted. Author: 行者123. Updated: 2023-12-01 04:34:44

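I am trying to do some sentiment analysis on tweets about a new movie using the NLTK toolkit. I followed the NLTK 'movie_reviews' example and built my own CategorizedPlaintextCorpusReader object. The problem arises when I call nltk.classify.util.accuracy(classifier, testfeats). Here is the code: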

import os
import glob
import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews

def word_feats(words):
    return dict([(word, True) for word in words])

negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]

trainfeats = negfeats + posfeats

# Building a custom Corpus Reader
tweets = nltk.corpus.reader.CategorizedPlaintextCorpusReader('./tweets', r'.*\.txt', cat_pattern=r'(.*)\.txt')
tweetsids = tweets.fileids()
testfeats = [(word_feats(tweets.words(fileids=[f]))) for f in tweetsids]

print 'Training the classifier'
classifier = NaiveBayesClassifier.train(trainfeats)

for tweet in tweetsids:
    print tweet + ' : ' + classifier.classify(word_feats(tweets.words(tweetsids)))

classifier.show_most_informative_features()

print 'accuracy:', nltk.classify.util.accuracy(classifier, testfeats)

Everything seems to work fine until the last line. That is where I get the error:

>>> nltk.classify.util.accuracy(classifier, testfeats)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/nltk/classify/util.py", line 87, in accuracy
    results = classifier.classify_many([fs for (fs,l) in gold])
ValueError: too many values to unpack

Does anyone see anything wrong with the code?

Thanks.

Best answer

The error message

  File "/usr/lib/python2.7/dist-packages/nltk/classify/util.py", line 87, in accuracy
    results = classifier.classify_many([fs for (fs,l) in gold])
ValueError: too many values to unpack

is raised because the items in gold cannot be unpacked into a 2-tuple, (fs,l):

[fs for (fs,l) in gold]  # <-- The ValueError is raised here

You would get the same error if gold were equal to [(1,2,3)], since the 3-tuple (1,2,3) cannot be unpacked into a 2-tuple (fs,l):

In [74]: [fs for (fs,l) in [(1,2)]]
Out[74]: [1]
In [73]: [fs for (fs,l) in [(1,2,3)]]
ValueError: too many values to unpack

gold may be buried in the implementation of nltk.classify.util.accuracy, but the error is a hint that one of your inputs, classifier or testfeats, has the wrong "shape".
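
To make the role of gold concrete, here is a simplified sketch of what accuracy does with it. It is an illustration paraphrased from the line shown in the traceback, not the NLTK source; accuracy_sketch and StubClassifier are hypothetical names, and the code is written so that it also runs on Python 3.

```python
def accuracy_sketch(classifier, gold):
    """Simplified stand-in for nltk.classify.util.accuracy (illustration only)."""
    # Every item of gold must unpack into (featureset, label).
    results = classifier.classify_many([fs for (fs, l) in gold])
    correct = [l == r for ((fs, l), r) in zip(gold, results)]
    return sum(correct) / float(len(correct)) if correct else 0.0


class StubClassifier(object):
    """Hypothetical classifier that labels every featureset 'pos'."""

    def classify_many(self, featuresets):
        return ['pos' for _ in featuresets]


labeled = [({'good': True}, 'pos'), ({'bad': True}, 'neg')]
unlabeled = [{'a': True, 'b': True, 'c': True}]  # bare dicts, no labels

# One of the two gold labels matches the stub's constant 'pos' prediction.
score = accuracy_sketch(StubClassifier(), labeled)

try:
    accuracy_sketch(StubClassifier(), unlabeled)
    error = None
except ValueError as exc:
    # Unpacking a dict iterates over its keys; three keys cannot fill (fs, l).
    error = str(exc)
```

Note that a bare dict with exactly two keys would unpack without any error and silently yield a meaningless result, so the exception only appears when the dict has more than two keys.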

The classifier is not the problem, since calling accuracy(classifier, trainfeats) works:

In [61]: print 'accuracy:', nltk.classify.util.accuracy(classifier, trainfeats)
accuracy: 0.9675

So the problem must be in testfeats.
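
One way to confirm this without reading the traceback is to check the shape of every item directly. The helper below is a minimal sketch; find_malformed is a hypothetical name, and the two sample lists are made-up stand-ins for trainfeats and testfeats.

```python
def find_malformed(gold):
    """Return (index, item) pairs that are not (dict, label) 2-tuples."""
    bad = []
    for i, item in enumerate(gold):
        ok = (isinstance(item, tuple) and len(item) == 2
              and isinstance(item[0], dict))
        if not ok:
            bad.append((i, item))
    return bad


# Made-up stand-ins: labeled pairs versus bare feature dicts.
train_like = [({'great': True}, 'pos'), ({'awful': True}, 'neg')]
test_like = [{'great': True, 'movie': True, 'tonight': True}]

bad_train = find_malformed(train_like)  # every item is a (dict, label) pair
bad_test = find_malformed(test_like)    # the bare dict is reported
```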


Compare trainfeats and testfeats. trainfeats[0] is a 2-tuple containing a dict and a classification:

In [63]: trainfeats[0]
Out[63]:
({u'!': True,
  u'"': True,
  u'&': True,
  ...
  u'years': True,
  u'you': True,
  u'your': True},
 'neg') # <--- Notice the classification, 'neg'

But testfeats[0] is just a dict, word_feats(tweets.words(fileids=[f])):

testfeats = [(word_feats(tweets.words(fileids=[f]))) for f in tweetsids]

So to fix this, you need to define testfeats so that it looks more like trainfeats: each dict returned by word_feats must be paired with a classification.
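
A minimal sketch of that pairing follows. It assumes you already have a known label for each tweet file; words_by_file and labels_by_file are hypothetical stand-ins (in the question, the words would come from tweets.words(fileids=[f])), and the sample tweets are invented for illustration.

```python
def word_feats(words):
    # Same bag-of-words featureset as in the question.
    return dict([(word, True) for word in words])


# Hypothetical stand-in for tweets.words(fileids=[f]).
words_by_file = {
    'tweet1.txt': ['loved', 'this', 'movie'],
    'tweet2.txt': ['what', 'a', 'waste', 'of', 'time'],
}

# Assumed hand-annotated labels, one per file id.
labels_by_file = {'tweet1.txt': 'pos', 'tweet2.txt': 'neg'}

# Each item is now a (featureset, label) 2-tuple, like the items in trainfeats.
testfeats = [(word_feats(words_by_file[f]), labels_by_file[f])
             for f in sorted(words_by_file)]
```

With the real corpus reader, the first element of each pair would be word_feats(tweets.words(fileids=[f])). Without a known label for each tweet there is nothing to compare the predictions against, so accuracy cannot be computed; in that case, calling classifier.classify on each featureset is all that is possible.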

Regarding python - NLTK accuracy: "ValueError: too many values to unpack", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31920199/
