
python - Updating a scikit multinomial classifier

Reposted. Author: 行者123. Updated: 2023-11-30 09:13:01

I am trying to update a scikit multinomial classifier with new training data. This is what I have tried:

from sklearn.feature_extraction.text import HashingVectorizer
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Training with first training set
targets = ['education','film','sports','laptops','phones']
x = ["football is the sport","gravity is the movie", "education is imporatant","lenovo is a laptop","android phones"]
y = np.array([2,1,0,3,4])
clf = MultinomialNB()
# non_negative=True was removed in scikit-learn 0.21; alternate_sign=False replaces it
vectorizer = HashingVectorizer(stop_words='english', alternate_sign=False,
                               n_features=32*2)
X_train = vectorizer.transform(x)
clf.partial_fit(X_train, y, classes=[0,1,2,3,4])

# Testing with the first training set
test_data = ["android","lenovo","Transformers"]
X_test = vectorizer.transform(test_data)
print("Using Initial classifier")
pred = clf.predict(X_test)
for doc, category in zip(test_data, pred):
    print('%r => %s' % (doc, targets[category]))

# Training with updated training set
x = ["cricket", "Transformers is a film","which college"]
y = np.array([2,1,0])
X_train = vectorizer.transform(x)
clf.partial_fit(X_train, y)

# Testing with the updated training set
test_data = ["android","lenovo","Transformers"]
X_test = vectorizer.transform(test_data)
print("\nUsing Updatable classifiers")
pred = clf.predict(X_test)
for doc, category in zip(test_data, pred):
    print('%r => %s' % (doc, targets[category]))

The output is:

Using Initial classifier
'android' => phones
'lenovo' => laptops
'Transformers' => education

Using Updatable classifiers
'android' => sports
'lenovo' => education
'Transformers' => film

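I have two questions:

1) The category for "lenovo" comes out wrong because the update did not include training data for that category. Is there a way to avoid this? I don't want to supply training data for every category each time I update the classifier, so it should still work even if I provide data for only a single category during an update.

2) How can I add a new category to an existing classifier? For example, I want to add a new category (say "health") to the existing classifier. Is there a way to do that?

Any help is appreciated. Thanks.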

Best answer

Instead of calling fit for the first batch, call partial_fit and pass it a list of all classes in the problem as the classes parameter:

clf.partial_fit(X, y, classes=targets)

(This assumes y actually contains the class labels rather than their indices.)
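As a rough illustration of the call above, here is a minimal sketch in which y holds string labels and classes=targets is passed on the first partial_fit only. The label list, the corrected spelling in the sample sentence, and the larger n_features value are assumptions made for this example, not part of the original answer.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

targets = ['education', 'film', 'sports', 'laptops', 'phones']

# alternate_sign=False keeps hashed values non-negative, which MultinomialNB requires.
# n_features=2**16 is an assumed value chosen to make hash collisions unlikely.
vectorizer = HashingVectorizer(stop_words='english', alternate_sign=False,
                               n_features=2 ** 16)

docs = ["football is the sport", "gravity is the movie",
        "education is important", "lenovo is a laptop", "android phones"]
labels = ['sports', 'film', 'education', 'laptops', 'phones']  # labels, not indices

clf = MultinomialNB()
# First call: declare every class the model will ever see
clf.partial_fit(vectorizer.transform(docs), labels, classes=targets)

# Later batch covering only some classes; classes= can be omitted now
clf.partial_fit(vectorizer.transform(["cricket", "Transformers is a film"]),
                ['sports', 'film'])

predictions = clf.predict(vectorizer.transform(["android", "lenovo"]))
for doc, label in zip(["android", "lenovo"], predictions):
    print('%r => %s' % (doc, label))
```

With string labels the predictions are the class names themselves, so no lookup through targets[category] is needed.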

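After the first call to partial_fit (or fit), you cannot change the number of classes. You either need to know the number of classes beforehand or retrain the entire model.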
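One way to read "know the classes beforehand" for the "health" case from the question is to reserve that label in the very first partial_fit call, even though no health documents exist yet, and feed it data later. The sketch below assumes this approach; the health sentence and the n_features value are made up for illustration.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

# 'health' is reserved up front even though the first batch has no health documents
all_classes = ['education', 'film', 'sports', 'laptops', 'phones', 'health']

# n_features=2**16 is an assumed value, not taken from the question
vectorizer = HashingVectorizer(stop_words='english', alternate_sign=False,
                               n_features=2 ** 16)

first_docs = ["football is the sport", "gravity is the movie",
              "education is important", "lenovo is a laptop", "android phones"]
first_labels = ['sports', 'film', 'education', 'laptops', 'phones']

clf = MultinomialNB()
clf.partial_fit(vectorizer.transform(first_docs), first_labels,
                classes=all_classes)

# Later update containing data for the reserved class only
clf.partial_fit(vectorizer.transform(["vitamins and exercise improve health"]),
                ['health'])

# The model has one probability column per declared class
n_columns = clf.predict_proba(vectorizer.transform(["exercise"])).shape[1]
health_count = clf.class_count_[list(clf.classes_).index('health')]
```

Until the reserved class receives at least one sample its count stays at zero, so the classifier will not predict it. A label that was never declared in the first call cannot be added this way and needs a full retrain.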

Regarding python - Updating a scikit multinomial classifier, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/25179800/
