python - Getting the warning "UserWarning: One or more of the test scores are non-finite" when revising a toy scikit-learn GridSearchCV example


Reposted by: 行者123 | Updated: 2023-12-04 11:13:34

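The code below works as expected, but after I rewrite it into a more concise version (the second snippet below), I get this warning:

UserWarning: One or more of the test scores are non-finite: [nan nan]
  category=UserWarning

Is the output of the one-hot encoder the culprit?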

import pandas as pd
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import RidgeClassifier
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import GridSearchCV

train = pd.read_csv('/train.csv')
test = pd.read_csv('/test.csv')
sparse_features = [col for col in train.columns if col.startswith('cat')]
dense_features = [col for col in train.columns if col not in sparse_features+['target']]
X = train.drop(['target'], axis=1)
y = train['target'].values
skf = StratifiedKFold(n_splits=5)
clf = RidgeClassifier()

full_pipeline = ColumnTransformer(transformers=[
    ('num', StandardScaler(), dense_features),
    ('cat', OneHotEncoder(), sparse_features)
])
X_prepared = full_pipeline.fit_transform(X)
param_grid = {
    'alpha': [0.1],
    'fit_intercept': [False]
}
gs = GridSearchCV(
    estimator=clf,
    param_grid=param_grid,
    scoring='roc_auc',
    n_jobs=-1,
    cv=skf
)
gs.fit(X_prepared, y)

The revised version is shown below.

clf2 = RidgeClassifier()
preprocess_pipeline2 = ColumnTransformer([
    ('num', StandardScaler(), dense_features),
    ('cat', OneHotEncoder(), sparse_features)
])
from sklearn.pipeline import Pipeline
final_pipeline = Pipeline(steps=[
    ('p', preprocess_pipeline2),
    ('c', clf2)
])
param_grid2 = {
    'c__alpha': [0.4, 0.1],
    'c__fit_intercept': [False]
}
gs2 = GridSearchCV(
    estimator=final_pipeline,
    param_grid=param_grid2,
    scoring='roc_auc',
    n_jobs=-1,
    cv=skf
)
gs2.fit(X, y)

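Can anyone point out which part is going wrong?
Edit: after setting error_score to 'raise', I get more feedback about the problem (traceback below). It seems to me that I need to fit the one-hot encoder on a combined dataset that merges the training and test sets. Am I right? But if so, why did the first version not complain about the same problem? Also, does it make sense to pass handle_unknown='ignore' to deal with this?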

ValueError
---------------------------------------------------------------------------
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 431, in _process_worker
    r = call_item()
  File "/opt/conda/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 285, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/opt/conda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/fixes.py", line 222, in __call__
    return self.function(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 620, in _fit_and_score
    test_scores = _score(estimator, X_test, y_test, scorer, error_score)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 200, in __call__
    sample_weight=sample_weight)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 334, in _score
    y_pred = method_caller(clf, "decision_function", X)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 493, in decision_function
    Xt = transform.transform(Xt)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py", line 565, in transform
    Xs = self._fit_transform(X, None, _transform_one, fitted=True)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py", line 444, in _fit_transform
    self._iter(fitted=fitted, replace_strings=True), 1))
  File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 1044, in __call__
    while self.dispatch_one_batch(iterator):
  File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/opt/conda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/opt/conda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/fixes.py", line 222, in __call__
    return self.function(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py", line 733, in _transform_one
    res = transformer.transform(X)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/preprocessing/_encoders.py", line 462, in transform
    force_all_finite='allow-nan')
  File "/opt/conda/lib/python3.7/site-packages/sklearn/preprocessing/_encoders.py", line 136, in _transform
    raise ValueError(msg)
ValueError: Found unknown categories ['MR', 'MW', 'DA'] in column 10 during transform
"""

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-48-b81f3b7b0724> in <module>
     21     cv=skf
     22 )
---> 23 gs2.fit(X, y)

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
    839                 return results
    840 
--> 841             self._run_search(evaluate_candidates)
    842 
    843             # multimetric is determined here because in the case of a callable

/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_search.py in _run_search(self, evaluate_candidates)
   1286     def _run_search(self, evaluate_candidates):
   1287         """Search all candidates in param_grid"""
-> 1288         evaluate_candidates(ParameterGrid(self.param_grid))
   1289 
   1290 

/opt/conda/lib/python3.7/site-packages/sklearn/model_selection/_search.py in evaluate_candidates(candidate_params, cv, more_results)
    807                                    (split_idx, (train, test)) in product(
    808                                    enumerate(candidate_params),
--> 809                                    enumerate(cv.split(X, y, groups))))
    810 
    811                 if len(out) < 1:

/opt/conda/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
   1052 
   1053             with self._backend.retrieval_context():
-> 1054                 self.retrieve()
   1055             # Make sure that we get a last message telling us we are done
   1056             elapsed_time = time.time() - self._start_time

/opt/conda/lib/python3.7/site-packages/joblib/parallel.py in retrieve(self)
    931             try:
    932                 if getattr(self._backend, 'supports_timeout', False):
--> 933                     self._output.extend(job.get(timeout=self.timeout))
    934                 else:
    935                     self._output.extend(job.get())

/opt/conda/lib/python3.7/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
    540         AsyncResults.get from multiprocessing."""
    541         try:
--> 542             return future.result(timeout=timeout)
    543         except CfTimeoutError as e:
    544             raise TimeoutError from e

/opt/conda/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
    433                 raise CancelledError()
    434             elif self._state == FINISHED:
--> 435                 return self.__get_result()
    436             else:
    437                 raise TimeoutError()

/opt/conda/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

ValueError: Found unknown categories ['MR', 'MW', 'DA'] in column 10 during transform
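
A minimal sketch of what the traceback suggests, on synthetic data (the column names cont0 and cat0, the RARE category, and the helper make_search are hypothetical and not taken from the asker's files). In the first version the encoder is fitted once on all of X before cross-validation, so every category is already known when the folds are scored. In the revised version the encoder sits inside the Pipeline and is refitted on each fold's training portion, so a category that appears only in the held-out portion is unknown at transform time. Assuming that is the cause, handle_unknown='ignore' encodes such rows as all zeros instead of raising:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for train.csv (hypothetical data, not the asker's).
rng = np.random.RandomState(0)
n = 100
toy = pd.DataFrame({
    "cont0": rng.normal(size=n),
    "cat0": rng.choice(["A", "B"], size=n),
    "target": np.tile([0, 1], n // 2),
})
# A category that occurs exactly once: the fold that holds this row out
# never sees "RARE" while fitting the encoder.
toy.loc[0, "cat0"] = "RARE"

X_toy = toy.drop(columns="target")
y_toy = toy["target"].values


def make_search(encoder):
    """Build the same kind of pipeline + grid search as the revised snippet."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), ["cont0"]),
        ("cat", encoder, ["cat0"]),
    ])
    pipe = Pipeline([("p", preprocess), ("c", RidgeClassifier())])
    return GridSearchCV(
        pipe,
        param_grid={"c__alpha": [0.4, 0.1]},
        scoring="roc_auc",
        cv=StratifiedKFold(n_splits=5),
        error_score="raise",  # surface the real exception instead of nan
    )


# Default encoder: the unseen category raises during scoring.
try:
    make_search(OneHotEncoder()).fit(X_toy, y_toy)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

# Tolerant encoder: unseen categories become all-zero rows.
tolerant = make_search(OneHotEncoder(handle_unknown="ignore")).fit(X_toy, y_toy)
print(strict_error)
print(tolerant.cv_results_["mean_test_score"])
```

Keeping the encoder inside the pipeline means each fold is scored without the encoder having seen the held-out rows, which is the usual reason to prefer the revised layout over pre-transforming X; fitting on a merged train and test set is not needed for this particular error, since it arises between folds of the training data.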

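Best answer

If the target is multiclass, drop roc_auc. The two do not play well together. Use the default scoring or pick a different metric.

Regarding python - Getting the warning "UserWarning: One or more of the test scores are non-finite" when revising a toy scikit-learn GridSearchCV example, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66620269/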
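
A small sketch of how that advice could be applied. The question does not say whether the target is binary or multiclass, so this checks the label array first; pick_scoring is a hypothetical helper, and passing scoring=None to GridSearchCV falls back to the estimator's own score method (accuracy for classifiers):

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target


def pick_scoring(y):
    """Return 'roc_auc' for binary targets, else None (estimator default)."""
    return "roc_auc" if type_of_target(y) == "binary" else None


# Hypothetical label arrays, for illustration only.
binary_choice = pick_scoring(np.array([0, 1, 1, 0]))
multiclass_choice = pick_scoring(np.array([0, 1, 2, 1]))
print(binary_choice, multiclass_choice)
```

The returned value can be passed directly as the scoring argument of GridSearchCV.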
