
python - scikit.learn cross_val_score raises an error

Reposted. Author: 行者123. Last updated: 2023-11-28 16:35:44

Please refer to the notebook at the following address:

LogisticRegression

This part of the code,

scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=10)
print scores
print scores.mean()

produces the following error on a 64-bit Windows 7 machine:

---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-37-4a10affe67c7> in <module>()
1 # evaluate the model using 10-fold cross-validation
----> 2 scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=10)
3 print scores
4 print scores.mean()

C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, score_func, pre_dispatch)
1140 allow_nans=True, allow_nd=True)
1141
-> 1142 cv = _check_cv(cv, X, y, classifier=is_classifier(estimator))
1143 scorer = check_scoring(estimator, score_func=score_func, scoring=scoring)
1144 # We clone the estimator to make sure that all the folds are

C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in _check_cv(cv, X, y, classifier, warn_mask)
1366 if classifier:
1367 if type_of_target(y) in ['binary', 'multiclass']:
-> 1368 cv = StratifiedKFold(y, cv, indices=needs_indices)
1369 else:
1370 cv = KFold(_num_samples(y), cv, indices=needs_indices)

C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in __init__(self, y, n_folds, indices, shuffle, random_state)
428 for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
429 for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 430 label_test_folds = test_folds[y == label]
431 # the test split can be too big because we used
432 # KFold(max(c, self.n_folds), self.n_folds) instead of

IndexError: too many indices for array

I am using scikit-learn 0.15.2, and it has been suggested here that this may be a problem specific to 64-bit Windows 7 machines.

==============UPDATE==============

I found that the following code does work:

from sklearn.cross_validation import KFold
cv = KFold(X.shape[0], 10, shuffle=True, random_state=33)
scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=cv)
print scores
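
For readers on newer scikit-learn releases: the `sklearn.cross_validation` module was removed in version 0.20, and `KFold` now lives in `sklearn.model_selection` and takes `n_splits` rather than the number of samples. A minimal sketch of the same workaround under that newer API follows. The synthetic `X` and `y` are stand-ins, since the original data is not shown. A plausible reading of the traceback is that passing a ready-made `KFold` object works because it skips the `StratifiedKFold(y, cv, ...)` construction where the error is raised.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (an assumption; the original X and y are not shown).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# In the newer API, KFold takes n_splits instead of the sample count.
cv = KFold(n_splits=10, shuffle=True, random_state=33)
scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=cv)
print(scores)
print(scores.mean())
```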

==============UPDATE 2=============

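It seems that, because of some package updates, I can no longer reproduce this error on my machine. If you run into the same problem on a 64-bit Windows 7 machine, please let me know.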

Best answer

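I found this question while looking for an answer to the same error you ran into.

I was using the same sklearn.cross_validation.cross_val_score (though with a different algorithm) on the same kind of machine, 64-bit Windows 7.

I tried your solution above and it "worked", but it gave me the following warning: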

C:\Users\E245713\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\cross_validation.py:1531: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  estimator.fit(X_train, y_train, **fit_params)

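After reading the warning, I figured the problem had to do with the shape of 'y' (my label column). The keyword to try from the warning was 'ravel()'. So I tried the following: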

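import pandas as pd

# as_matrix was removed in pandas 1.0; label.values is the equivalent there
y_arr = pd.DataFrame.as_matrix(label)
print(y_arr)
print(y_arr.shape)  # shape is an attribute, not a method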

which gave me:

[[1]
 [0]
 [1]
 ...,
 [0]
 [0]
 [1]]

(87939, 1)

When I added 'ravel()':

y_arr = pd.DataFrame.as_matrix(label).ravel()
print(y_arr)
print(y_arr.shape)  # shape is an attribute, not a method

it gave me:

[1 0 1 ..., 0 0 1]

(87939,)
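
On current pandas releases `DataFrame.as_matrix` no longer exists (it was removed in pandas 1.0), so the same flattening would probably be written with `to_numpy()`. A small sketch, assuming `label` is a one-column DataFrame as in the code above; the six-row frame here is made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the real 87939-row label column.
label = pd.DataFrame({"label": [1, 0, 1, 0, 0, 1]})

y_col = label.to_numpy()          # 2-D column vector
y_arr = label.to_numpy().ravel()  # flattened to 1-D

print(y_col.shape)  # (6, 1)
print(y_arr.shape)  # (6,)
```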

The shape of 'y_arr' has to be (87939,) rather than (87939, 1). After that change, my original cross_val_score call worked without adding the KFold code.
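
The shape matters because the failing line in the traceback, `test_folds[y == label]`, indexes a 1-D array with a boolean mask built from `y`. The sketch below reproduces that situation with NumPy alone. The small arrays are made up, and the names `y_col` and `y_flat` are illustrative rather than taken from scikit-learn.

```python
import numpy as np

test_folds = np.zeros(6, dtype=int)                # 1-D, one entry per sample
y_col = np.array([[1], [0], [1], [0], [0], [1]])   # column vector, shape (6, 1)

# A (6, 1) boolean mask cannot index a 1-D array, so NumPy raises IndexError.
try:
    test_folds[y_col == 1]
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)

# After ravel() the mask is 1-D and the same indexing succeeds.
y_flat = y_col.ravel()
selected = test_folds[y_flat == 1]
print(selected.shape)
```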

Hope this helps.

Regarding "python - scikit.learn cross_val_score raises an error", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26504053/
