
python - Wrong prediction from the SVC classifier in scikit-learn?

Reposted. Author: 行者123. Updated: 2023-11-30 09:56:17

I built my own corpus and split off a training text file that looks like this:

POS|This film was awesome, highly recommended
NEG|I did not like this film
NEU|I went to the movies
POS|this film is very interesting, i liked a lot
NEG|the film was very boring i did not like it
NEU|the cinema is big
NEU|the cinema was dark

For testing, I have another text file with an unlabeled review:

I did not like this film

Then I run the following:

import pandas as pd
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.svm import SVC

# Training file: one "LABEL|review text" pair per line
trainingdata = pd.read_csv('/Users/user/Desktop/training.txt',
                           header=None, sep='|', names=['labels', 'movies_reviews'])

# Hash word bigrams into only 7 feature columns
vect = HashingVectorizer(analyzer='word', ngram_range=(2, 2), lowercase=True, n_features=7)
X = vect.fit_transform(trainingdata['movies_reviews'])
y = trainingdata['labels']

# Test file: one unlabeled review per line
TestText = pd.read_csv('/Users/user/Desktop/testing.txt',
                       header=None, names=['test_opinions'])
test = vect.transform(TestText['test_opinions'])

# Train an SVC with default parameters and predict the test review
svm = SVC()
svm.fit(X, y)

prediction = svm.predict(test)
print(prediction)

The prediction is:

['NEU']

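So why is this prediction wrong? Is it a problem with the code, the features, or the classification algorithm? While trying to work this out, I removed the last review from the training file and realized that the classifier always predicts the label of the last element in that file. Any idea how to fix this?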
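One way to look into the "features" part of this question is to inspect what the vectorizer actually produces. The sketch below is only a diagnostic under stated assumptions: it inlines the seven training reviews instead of reading the file, and the names `narrow`, `wide`, and `counter`, as well as the value 2**10, are illustrative choices that are not part of the original code. It compares the number of distinct word bigrams in the corpus with the 7 hash buckets requested by n_features=7. When there are more distinct bigrams than buckets, some of them must share a column. This does not by itself prove that collisions are the only cause of the wrong prediction.

```python
from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer

# The seven training reviews from the question, inlined for a self-contained example
reviews = [
    "This film was awesome, highly recommended",
    "I did not like this film",
    "I went to the movies",
    "this film is very interesting, i liked a lot",
    "the film was very boring i did not like it",
    "the cinema is big",
    "the cinema was dark",
]

# Same settings as the question: word bigrams hashed into 7 columns
narrow = HashingVectorizer(analyzer="word", ngram_range=(2, 2), lowercase=True, n_features=7)
# Hypothetical wider setting, chosen only for comparison
wide = HashingVectorizer(analyzer="word", ngram_range=(2, 2), lowercase=True, n_features=2**10)

# HashingVectorizer is stateless, so transform() works without fitting
X_narrow = narrow.transform(reviews)
X_wide = wide.transform(reviews)

# Count the distinct bigrams with a vocabulary-based vectorizer
counter = CountVectorizer(analyzer="word", ngram_range=(2, 2), lowercase=True)
counter.fit(reviews)
n_bigrams = len(counter.vocabulary_)

print("distinct bigrams:", n_bigrams)
print("narrow matrix shape:", X_narrow.shape)
print("wide matrix shape:", X_wide.shape)
print(X_narrow.toarray().round(2))  # every review squeezed into 7 columns
```

Printing the dense 7-column matrix makes it easy to see how similar the rows for different labels look after hashing.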

Best answer

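SVMs are very sensitive to their parameter settings. You need to run a grid search to find the right values. I tried training two kinds of Naive Bayes on your dataset and got perfect accuracy on the training set: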

from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# trainingdata is the DataFrame loaded in the question

# First option: Gaussian NB
vect = HashingVectorizer(analyzer='word', ngram_range=(2, 2), lowercase=True)
X = vect.fit_transform(trainingdata['movies_reviews'])
y = trainingdata['labels']
nb = GaussianNB().fit(X.toarray(), y)  # GaussianNB needs dense input
nb.predict(X.toarray()) == y

# Second option: MultinomialNB (input must be non-negative, so use CountVectorizer instead)
vect = CountVectorizer(analyzer='word', ngram_range=(2, 2), lowercase=True)
X = vect.fit_transform(trainingdata['movies_reviews'])
y = trainingdata['labels']
nb = MultinomialNB().fit(X, y)
nb.predict(X) == y  # MultinomialNB accepts the sparse matrix directly

The output in both cases is:

Out[33]: 
0 True
1 True
2 True
3 True
4 True
5 True
6 True
Name: labels, dtype: bool
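
The grid search suggested at the start of this answer could look roughly like the sketch below. It is a minimal example under assumptions, not the answerer's own code: the reviews are inlined rather than read from the file, the pipeline uses CountVectorizer with unigrams and bigrams, and the parameter grid (values of C and two kernels) and cv=2 are illustrative choices. With only seven samples, the cross-validation scores are very noisy, so the selected parameters should not be read as meaningful.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Training data from the question, inlined for a self-contained example
reviews = [
    "This film was awesome, highly recommended",
    "I did not like this film",
    "I went to the movies",
    "this film is very interesting, i liked a lot",
    "the film was very boring i did not like it",
    "the cinema is big",
    "the cinema was dark",
]
labels = ["POS", "NEG", "NEU", "POS", "NEG", "NEU", "NEU"]

# Vectorizer and classifier in one pipeline, so each CV fold refits both
pipeline = Pipeline([
    ("vect", CountVectorizer(analyzer="word", ngram_range=(1, 2), lowercase=True)),
    ("svc", SVC()),
])

# Illustrative grid; real ranges depend on the data
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__kernel": ["linear", "rbf"],
}

# cv=2 because the smallest classes (POS, NEG) have only two samples each
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(reviews, labels)

print(search.best_params_)

# The best pipeline is refit on all the data and can predict directly
prediction = search.predict(["I did not like this film"])
print(prediction)
```

Putting the vectorizer inside the pipeline keeps the search from fitting the vocabulary on the held-out fold.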

Regarding "python - Wrong prediction from the SVC classifier in scikit-learn?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27753168/
