machine-learning - CountVectorizer MultinomialNB ValueError : dimension mismatch

Reposted. Author: 行者123. Updated: 2023-11-30 09:08:48

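I am trying to get my MultinomialNB classifier to work. I run CountVectorizer on both the training set and the test set, and the two sets naturally contain different words. So I understand why the error

ValueError: dimension mismatch

occurs, but I don't know how to fix it. I tried CountVectorizer().transform instead of CountVectorizer().fit_transform, as suggested in another post (SciPy and scikit-learn - ValueError: Dimension mismatch), but that only gives me

NotFittedError: CountVectorizer - Vocabulary wasn't fitted.

How do I use CountVectorizer correctly?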

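from sklearn.feature_extraction.text import CountVectorizer
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

df = data  # data: presumably the asker's DataFrame (not shown)
y = df["meal_parent_category"]
X = df['name_cleaned']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
X_train = CountVectorizer().fit_transform(X_train)
# A second vectorizer fitted on the test set learns a different vocabulary,
# so X_test ends up with a different number of columns than X_train.
X_test = CountVectorizer().fit_transform(X_test)
algo = MultinomialNB()
algo.fit(X_train, y_train)
y_pred = algo.predict(X_test)  # this is where the dimension mismatch is raised
print(classification_report(y_test, y_pred))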

Best answer

Okay, I figured it out right after asking the question :) Here is the solution, which reuses the training vocabulary:

df = train
y = df["meal_parent_category_cleaned"]
X = df['name_cleaned']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
# Learn the vocabulary from the training texts only.
vectorizer_train = CountVectorizer()
X_train = vectorizer_train.fit_transform(X_train)
# Reuse the training vocabulary so the test matrix has the same columns.
vectorizer_test = CountVectorizer(vocabulary=vectorizer_train.vocabulary_)
X_test = vectorizer_test.transform(X_test)
algo = MultinomialNB()
algo.fit(X_train, y_train)
y_pred = algo.predict(X_test)
print(classification_report(y_test, y_pred))
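
A common variant of the same idea is to keep a single vectorizer, call fit_transform on the training texts and only transform on the test texts. The sketch below is not the answerer's code. It uses a small made-up list of texts and labels (hypothetical stand-ins for the name_cleaned column and its categories) to show the differing column counts and then the matching ones:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the 'name_cleaned' column and its labels.
train_texts = ["tomato soup", "chicken soup", "chocolate cake", "apple cake"]
train_labels = ["soup", "soup", "dessert", "dessert"]
test_texts = ["tomato cake", "lemon soup"]

# Two independently fitted vectorizers learn two different vocabularies,
# so the resulting matrices have different numbers of columns.
n_train_cols = CountVectorizer().fit_transform(train_texts).shape[1]
n_test_cols = CountVectorizer().fit_transform(test_texts).shape[1]

# Fit once on the training texts, then only transform the test texts.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)  # words unseen in training are ignored

algo = MultinomialNB()
algo.fit(X_train, train_labels)
y_pred = algo.predict(X_test)

print(n_train_cols, n_test_cols)   # column counts of the two separate fits
print(X_train.shape, X_test.shape) # column counts now agree
print(y_pred)
```

Either approach gives the test matrix the training vocabulary's columns. Reusing the fitted object simply avoids constructing a second vectorizer.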

Regarding machine-learning - CountVectorizer MultinomialNB ValueError : dimension mismatch, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45543303/