
python - Same accuracy and F1 score when doing multi-label classification

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:32:27

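I wrote some code based on this site and built several different multi-label classifiers.

I want to evaluate my models by per-class accuracy and per-class F1 measure.

The problem is that I get identical accuracy and F1 values for every model.

I suspect I am doing something wrong. I would like to know under what circumstances this can happen.

The code is exactly the same as on the site, and I compute the F1 measure like this: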

print('Logistic Test accuracy is {} '.format(accuracy_score(test[category], prediction)))
print('Logistic f1 measurement is {} '.format(f1_score(test[category], prediction, average='micro')))

Update 1

Here is the full code:

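import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

df = pd.read_csv("finalupdatedothers.csv")
categories = ['ADR','WD','EF','INF','SSI','DI','others']

train,test = train_test_split(df,random_state=42,test_size=0.3,shuffle=True)
X_train = train.sentences
X_test = test.sentences

# stop_words is not defined in the posted snippet; it is assumed to be set up earlier in the script.
NB_pipeline = Pipeline([('tfidf', TfidfVectorizer(stop_words=stop_words)),
                        ('clf', OneVsRestClassifier(MultinomialNB(fit_prior=True, class_prior=None)))])
# Each iteration fits and scores a separate binary problem for one category.
for category in categories:
    print('processing {} '.format(category))
    NB_pipeline.fit(X_train, train[category])
    prediction = NB_pipeline.predict(X_test)
    print('NB test accuracy is {} '.format(accuracy_score(test[category], prediction)))
    print('NB f1 measurement is {} '.format(f1_score(test[category], prediction, average='micro')))
    print("\n")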

Here is the output:

processing ADR 
NB test accuracy is 0.821963394343
NB f1 measurement is 0.821963394343

This is what my data looks like:

,sentences,ADR,WD,EF,INF,SSI,DI,others
0,"extreme weight gain, short-term memory loss, hair loss.",1,0,0,0,0,0,0
1,I am detoxing from Lexapro now.,0,0,0,0,0,0,1
2,I slowly cut my dosage over several months and took vitamin supplements to help.,0,0,0,0,0,0,1
3,I am now 10 days completely off and OMG is it rough.,0,0,0,0,0,0,1
4,"I have flu-like symptoms, dizziness, major mood swings, lots of anxiety, tiredness.",0,1,0,0,0,0,0
5,I have no idea when this will end.,1,0,0,0,0,0,1

Why am I getting the same numbers?

Thanks.

Best answer

By doing this:

for category in categories:
    ...
    ...

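you are essentially turning the problem from a multi-label one into a binary one. If you want to keep doing it this way, you do not need OneVsRestClassifier; you can use MultinomialNB directly. Alternatively, you can do it directly with OneVsRestClassifier: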

# Send all labels at once.
NB_pipeline.fit(X_train, train[categories])
prediction = NB_pipeline.predict(X_test)
print('NB test accuracy is {} '.format(accuracy_score(test[categories], prediction)))
print('NB f1 measurement is {} '.format(f1_score(test[categories], prediction, average='micro')))

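The OneVsRestClassifier version may emit some warnings about labels that are present in all of the training data, but that is only because the sample data you posted is so small.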
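
A rough sketch of both options side by side may help. The toy sentences and labels below are invented for illustration (they are not the asker's CSV), only two of the categories are used, and `zero_division=0` assumes scikit-learn 0.22 or later. For the per-category loop, the sketch also reports `average='binary'`, which scores only the positive class and is therefore closer to a "per-class F1" than the micro average.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data: one row per sentence, one column per category.
categories = ['ADR', 'WD']
X_train = [
    "severe hair loss and weight gain",
    "weight gain and memory loss",
    "dizzy and anxious after stopping the drug",
    "withdrawal gave me mood swings",
    "hair loss then dizzy after stopping",
    "I take it every morning",
]
Y_train = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]])
X_test = ["memory loss and hair loss", "mood swings after stopping", "every morning"]
Y_test = np.array([[1, 0], [0, 1], [0, 0]])

# Option 1: one binary problem per category, with MultinomialNB used directly.
binary_pipeline = Pipeline([('tfidf', TfidfVectorizer()), ('clf', MultinomialNB())])
per_category = {}
for i, category in enumerate(categories):
    binary_pipeline.fit(X_train, Y_train[:, i])
    pred = binary_pipeline.predict(X_test)
    per_category[category] = {
        'accuracy': accuracy_score(Y_test[:, i], pred),
        # Micro-averaging over both classes (0 and 1) reproduces accuracy.
        'f1_micro': f1_score(Y_test[:, i], pred, average='micro'),
        # F1 of the positive class only.
        'f1_binary': f1_score(Y_test[:, i], pred, average='binary', zero_division=0),
    }
    print(category, per_category[category])

# Option 2: send all label columns at once through OneVsRestClassifier.
ovr_pipeline = Pipeline([('tfidf', TfidfVectorizer()),
                         ('clf', OneVsRestClassifier(MultinomialNB()))])
ovr_pipeline.fit(X_train, Y_train)
Y_pred = ovr_pipeline.predict(X_test)
# For multi-label input, accuracy_score is subset accuracy (every label of a row must match).
subset_accuracy = accuracy_score(Y_test, Y_pred)
micro_f1 = f1_score(Y_test, Y_pred, average='micro', zero_division=0)
print('subset accuracy:', subset_accuracy, 'micro F1:', micro_f1)
```

In the multi-label case the two numbers measure different things, so they are no longer forced to coincide the way they are in the per-category loop.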

@user2906838, you are right about the scores. With average='micro', the resulting values will be equal. This is mentioned in the documentation:

Note that for “micro”-averaging in a multiclass setting with all labels included will produce equal precision, recall and F,

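That note is written about the multiclass case, but I suspect it applies to the binary case as well. See this similar question, where the user computed all the scores by hand: Multi-class Clasification (multiclassification): Micro-Average Accuracy, Precision, Recall and F Score All Equal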
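
The binary case can be checked by hand with a small sketch. The label vectors are made up for illustration, and the helper names are hypothetical. The idea is that when every sample has exactly one true class and one predicted class, each mistake counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so the micro-averaged F1 over both classes reduces to the fraction of correct predictions.

```python
def confusion_counts(y_true, y_pred, label):
    """Return (tp, fp, fn), treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical binary labels for a single category (1 = category present).
y_true = [1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Micro-averaging: add up the counts of BOTH classes before computing F1.
counts = [confusion_counts(y_true, y_pred, label) for label in (0, 1)]
tp = sum(c[0] for c in counts)
fp = sum(c[1] for c in counts)
fn = sum(c[2] for c in counts)
micro_f1 = f1_from_counts(tp, fp, fn)

# F1 of the positive class alone, i.e. what a per-class F1 would report.
positive_f1 = f1_from_counts(*confusion_counts(y_true, y_pred, 1))

print(accuracy)     # 0.7
print(micro_f1)     # 0.7
print(positive_f1)  # 0.4
```

So identical accuracy and micro-F1 values in the per-category loop are expected rather than a bug, while the positive-class F1 can differ substantially.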

Regarding "python - Same accuracy and F1 score when doing multi-label classification", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51815299/
