
python - GridSearch for multi-label classification in Scikit-learn

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:43:12

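I am trying to run a GridSearch inside each fold of a ten-fold cross-validation to find the best hyperparameters. This worked well in my previous multi-class classification work, but it does not work this time for a multi-label task.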

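import numpy as np
# Pre-0.18 module paths, matching the traceback below (now sklearn.model_selection)
from sklearn.cross_validation import train_test_split
from sklearn.grid_search import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
clf = OneVsRestClassifier(LinearSVC())

C_range = 10.0 ** np.arange(-2, 9)
# The wrapped LinearSVC is exposed as "estimator", so its C is "estimator__C"
param_grid = dict(estimator__C=C_range)

clf = GridSearchCV(clf, param_grid)
clf.fit(X_train, y_train)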

I get the following error:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-65-dcf9c1d2e19d> in <module>()
6
7 clf = GridSearchCV(clf, param_grid)
----> 8 clf.fit(X_train, y_train)

/usr/local/lib/python2.7/site-packages/sklearn/grid_search.pyc in fit(self, X, y)
595
596 """
--> 597 return self._fit(X, y, ParameterGrid(self.param_grid))
598
599

/usr/local/lib/python2.7/site-packages/sklearn/grid_search.pyc in _fit(self, X, y, parameter_iterable)
357 % (len(y), n_samples))
358 y = np.asarray(y)
--> 359 cv = check_cv(cv, X, y, classifier=is_classifier(estimator))
360
361 if self.verbose > 0:

/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _check_cv(cv, X, y, classifier, warn_mask)
1365 needs_indices = None
1366 if classifier:
-> 1367 cv = StratifiedKFold(y, cv, indices=needs_indices)
1368 else:
1369 if not is_sparse:

/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.pyc in __init__(self, y, n_folds, indices, shuffle, random_state)
427 for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
428 for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 429 label_test_folds = test_folds[y == label]
430 # the test split can be too big because we used
431 # KFold(max(c, self.n_folds), self.n_folds) instead of

ValueError: boolean index array should have 1 dimension

This probably refers to the dimension or format of the label-indicator matrix. Checking the shapes:

print X_train.shape, y_train.shape

gives:

(147, 1024) (147, 6)
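The 2-D shape of y_train is what trips up the failing line test_folds[y == label]. The sketch below mimics that line with NumPy only, using random labels as a hypothetical stand-in for y_train: StratifiedKFold keeps a 1-D array of fold ids, but comparing a 2-D label matrix with one label value produces a 2-D boolean mask, which cannot index a 1-D array.

```python
import numpy as np

# Hypothetical stand-in for y_train: 147 samples x 6 binary labels
rng = np.random.RandomState(0)
y = rng.randint(0, 2, size=(147, 6))

# StratifiedKFold stores one fold id per sample: a 1-D array of length n_samples
test_folds = np.zeros(y.shape[0], dtype=int)

# Comparing the 2-D label matrix with one label value gives a 2-D boolean mask
mask = y == 1
print(mask.shape)  # (147, 6)

# Using that mask on the 1-D fold array fails; current NumPy raises IndexError,
# while the NumPy in the traceback above raised ValueError
try:
    test_folds[mask]
    raised = False
except (IndexError, ValueError) as exc:
    raised = True
    print(type(exc).__name__, exc)
```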

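It seems that GridSearch uses StratifiedKFold on its own: when the estimator is a classifier, check_cv builds a StratifiedKFold from y (see the traceback). The problem lies in applying the stratified K-fold strategy to a multi-label problem.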
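In later scikit-learn releases (0.18 and up) this choice is made by sklearn.model_selection.check_cv, which, as far as current versions go, only picks StratifiedKFold for binary or multiclass targets and falls back to plain KFold for anything else, including a label-indicator matrix. A small sketch, assuming a recent scikit-learn and random labels in place of the question's data:

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv
from sklearn.utils.multiclass import type_of_target

rng = np.random.RandomState(0)
y_multiclass = rng.randint(0, 3, size=147)       # one label per sample
y_multilabel = rng.randint(0, 2, size=(147, 6))  # indicator matrix shaped like y_train

print(type_of_target(y_multiclass))  # multiclass
print(type_of_target(y_multilabel))  # multilabel-indicator

# classifier=True mirrors what GridSearchCV passes for a classifier estimator
cv_multiclass = check_cv(10, y_multiclass, classifier=True)
cv_multilabel = check_cv(10, y_multilabel, classifier=True)
print(type(cv_multiclass).__name__)  # StratifiedKFold
print(type(cv_multilabel).__name__)  # KFold
```

So with current versions the error in the question no longer occurs, but the multi-label folds are simply unstratified.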

StratifiedKFold(y_train, 10)

gives:

ValueError                                Traceback (most recent call last)
<ipython-input-87-884ffeeef781> in <module>()
----> 1 StratifiedKFold(y_train, 10)

/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.pyc in __init__(self, y, n_folds, indices, shuffle, random_state)
427 for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
428 for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 429 label_test_folds = test_folds[y == label]
430 # the test split can be too big because we used
431 # KFold(max(c, self.n_folds), self.n_folds) instead of

ValueError: boolean index array should have 1 dimension

Using the conventional K-fold strategy works fine for now. Is there any way to apply stratified K-fold to multi-label classification?
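The plain K-fold workaround can be sketched by handing GridSearchCV an explicit splitter through its cv argument, so no stratification is attempted. This assumes a recent scikit-learn (sklearn.model_selection), and synthetic data from make_multilabel_classification stands in for the question's X and y; the grid is shortened to keep the example fast.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in for the question's data: 147 samples, 6 labels
X, Y = make_multilabel_classification(
    n_samples=147, n_features=20, n_classes=6, random_state=0
)

# Parameters of the wrapped LinearSVC are addressed as "estimator__<name>"
param_grid = {"estimator__C": 10.0 ** np.arange(-2, 3)}

# An explicit, unstratified KFold sidesteps the stratification step entirely
cv = KFold(n_splits=10, shuffle=True, random_state=0)

search = GridSearchCV(OneVsRestClassifier(LinearSVC(max_iter=10000)), param_grid, cv=cv)
search.fit(X, Y)
print(search.best_params_)
```

With the pre-0.18 modules from the traceback, the rough equivalent would be cv=KFold(len(y_train), n_folds=10) from sklearn.cross_validation. The default score for a multi-label classifier is subset accuracy, so a different scoring argument may be worth considering.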

Best answer

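Grid search performs stratified cross-validation for classification problems, but this is not implemented for multi-label tasks; in fact, multi-label stratification is still an unsolved problem in machine learning. I recently ran into the same issue, and the only literature I could find was the method proposed in this article (whose authors state that they could not find any other attempt to solve this problem either).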
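One heuristic that has been proposed for multi-label data is iterative stratification: labels are processed from rarest to most common, and each sample carrying the current label goes to the fold that still needs the most examples of that label. The sketch below is a simplified NumPy version of that idea for illustration only; the function name iterative_stratification is hypothetical, and it is not claimed to be the exact method of the article mentioned above.

```python
import numpy as np


def iterative_stratification(Y, n_folds, random_state=0):
    """Assign each row of a binary label-indicator matrix Y to a fold id.

    Simplified greedy sketch: rarest labels are distributed first, and each
    sample goes to the fold with the largest remaining demand for that label.
    """
    rng = np.random.RandomState(random_state)
    Y = np.asarray(Y, dtype=bool)
    n_samples, n_labels = Y.shape
    folds = np.full(n_samples, -1, dtype=int)

    # Desired number of samples per fold, overall and per label (equal-sized folds)
    desired = np.full(n_folds, n_samples / n_folds)
    desired_label = np.tile(Y.sum(axis=0) / n_folds, (n_folds, 1))

    remaining = np.ones(n_samples, dtype=bool)
    while remaining.any():
        counts = Y[remaining].sum(axis=0)
        if counts.sum() == 0:
            break  # only samples without any positive label are left
        # Work on the label with the fewest remaining positive samples
        label = np.argmin(np.where(counts > 0, counts, np.inf))
        for i in np.flatnonzero(remaining & Y[:, label]):
            # Candidates: largest demand for this label, then largest overall demand
            col = desired_label[:, label]
            cand = np.flatnonzero(col == col.max())
            cand = cand[desired[cand] == desired[cand].max()]
            f = rng.choice(cand)  # remaining ties are broken at random
            folds[i] = f
            remaining[i] = False
            desired[f] -= 1
            desired_label[f, Y[i]] -= 1  # update demand for every label of sample i

    # Unlabeled samples go to the fold with the largest remaining overall demand
    for i in np.flatnonzero(remaining):
        f = np.argmax(desired)
        folds[i] = f
        desired[f] -= 1
    return folds


# Example: 10 folds for a random 147 x 6 indicator matrix
Y_demo = np.random.RandomState(1).randint(0, 2, size=(147, 6))
folds = iterative_stratification(Y_demo, n_folds=10)
print(np.bincount(folds, minlength=10))  # samples per fold
```

The returned fold ids can be passed as test_fold to sklearn.model_selection.PredefinedSplit, and that splitter given to GridSearchCV through its cv argument.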

A similar question, python - GridSearch for multi-label classification in Scikit-learn, can be found on Stack Overflow: https://stackoverflow.com/questions/26018543/
