
machine-learning - Why does multi-label classification (binary relevance) work this way?

Reprinted. Author: 行者123. Updated: 2023-11-30 08:51:27

I am new to multi-label classification with binary relevance, and I am having trouble interpreting the results:

The result is: [[ 0. 0.] [ 2. 2.]]

Does this mean the first case belongs to [0,0] and the second case belongs to [2,2]? That doesn't look right at all. Or am I missing something else?

After the gentlemen's answers, I now get the following error, because of the 0 in the y_train label [2, 0, 3, 4]:

Traceback (most recent call last):
  File "driver.py", line 22, in <module>
    clf_dict[i] = clf.fit(x_train, y_tmp)
  File "C:\Users\BaderEX\Anaconda22\lib\site-packages\sklearn\linear_model\logistic.py", line 1154, in fit
    self.max_iter, self.tol, self.random_state)
  File "C:\Users\BaderEX\Anaconda22\lib\site-packages\sklearn\svm\base.py", line 885, in _fit_liblinear
    " class: %r" % classes_[0])
ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 1
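For context, this error is a consequence of the data itself: label 0 appears in every training sample, so the one-vs-rest binary target built for class 0 contains only ones, and scikit-learn's LogisticRegression refuses to fit a target with a single class. A minimal sketch reproducing the error with the x_train/y_train from the code below:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

x_train = np.array([[1, 2, 3, 4], [0, 1, 2, 1], [1, 2, 0, 3]])
y_train = [[0], [1, 0, 3], [2, 0, 3, 4]]

# Binary-relevance target for class 0: label 0 is present in every
# sample, so the target degenerates to a single class.
y_tmp = [1 if 0 in labels else 0 for labels in y_train]
print(y_tmp)  # [1, 1, 1]

try:
    LogisticRegression().fit(x_train, y_tmp)
except ValueError as err:
    print('fit failed:', err)
```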

Updated code:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import *

numer_classes = 5

x_train = np.array([[1,2,3,4],[0,1,2,1],[1,2,0,3]])
y_train = [[0],[1,0,3],[2,0,3,4]]

x_test = np.array([[1,2,3,4],[0,1,2,1],[1,2,0,3]])
y_test = [[0],[1,0,3],[2,0,3,4]]

clf_dict = {}
for i in range(numer_classes):
    # build the one-vs-rest binary target for class i
    y_tmp = []
    for j in range(len(y_train)):
        if i in y_train[j]:
            y_tmp.append(1)
        else:
            y_tmp.append(0)
    clf = LogisticRegression()
    clf_dict[i] = clf.fit(x_train, y_tmp)

prediction_matrix = np.zeros((len(x_test), numer_classes))
for i in range(numer_classes):
    prediction = clf_dict[i].predict(x_test)
    prediction_matrix[:, i] = prediction

print('Predicted')
print(prediction_matrix)

Thank you

Best answer

For binary relevance you should create an indicator class for each label separately: 0 or 1. scikit-multilearn provides a scikit-compatible implementation of this classifier.
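The indicator encoding that binary relevance needs can also be produced with scikit-learn's MultiLabelBinarizer; a minimal sketch using the y_train from the question:

```python
from sklearn.preprocessing import MultiLabelBinarizer

y_train = [[0], [1, 0, 3], [2, 0, 3, 4]]

# One column per label (0..4); cell (i, j) is 1 iff label j is in sample i.
mlb = MultiLabelBinarizer()
y_indicator = mlb.fit_transform(y_train)
print(mlb.classes_)   # [0 1 2 3 4]
print(y_indicator)
# [[1 0 0 0 0]
#  [1 1 0 1 0]
#  [1 0 1 1 1]]
```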

Setup:

def to_indicator_matrix(y_list):
    # one column per class: largest label index + 1 columns
    n_classes = max(max(y) for y in y_list) + 1
    y_train_matrix = np.zeros(shape=(len(y_list), n_classes), dtype='i8')
    for i, y in enumerate(y_list):
        y_train_matrix[i][y] = 1
    return y_train_matrix

Given your y_train and y_test, run:

y_train = to_indicator_matrix(y_train)
y_test = to_indicator_matrix(y_test)

Your y_train is now:

array([[1, 0, 0, 0, 0],
       [1, 1, 0, 1, 0],
       [1, 0, 1, 1, 1]])

This should fix your problem. It is more comfortable, though, to use scikit-multilearn's BinaryRelevance than to roll your own code. Try it!

Run

pip install scikit-multilearn

and then try

from skmultilearn.problem_transform import BinaryRelevance
from sklearn.linear_model import LogisticRegression
import sklearn.metrics

# assume the data has already been loaded
# and is available in X_train/X_test, y_train/y_test

# initialize Binary Relevance multi-label classifier
# with a logistic regression base classifier
classifier = BinaryRelevance(LogisticRegression(C=40, class_weight='balanced'))

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

# measure
print(sklearn.metrics.hamming_loss(y_test, predictions))
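Hamming loss, used above as the evaluation metric, is the fraction of label slots predicted incorrectly across all samples and labels. A small worked example (values chosen only for illustration):

```python
import numpy as np
from sklearn.metrics import hamming_loss

y_true = np.array([[1, 0, 1], [0, 1, 0]])  # 2 samples x 3 labels
y_pred = np.array([[1, 1, 1], [0, 1, 0]])  # one wrong slot out of 6

# 1 incorrect label slot out of 6 -> 1/6
print(hamming_loss(y_true, y_pred))
```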

Regarding machine-learning - Why does multi-label classification (binary relevance) work this way?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40259135/
