
python - Scikit-learn SequentialFeatureSelector: "Input contains NaN, infinity or a value too large for dtype('float64')", even with a pipeline

Reposted. Author: 行者123. Updated: 2023-12-02 02:23:14

I'm trying to use SequentialFeatureSelector and pass a pipeline as its estimator parameter. The pipeline includes a step that imputes missing values:

model = Pipeline(steps=[('preprocessing',
ColumnTransformer(transformers=[('pipeline-1',
Pipeline(steps=[('imputing',
SimpleImputer(fill_value=-1,
strategy='constant')),
('preprocessing',
StandardScaler())]),
<sklearn.compose._column_transformer.make_column_selector object at 0x1300013d0>),
('pipeline-2',
Pipeline(steps=[('imputing',
SimpleImputer(fill_value='missing',
strategy='constant')),
('encoding',
OrdinalEncoder(handle_unknown='ignore'))]),
<sklearn.compose._column_transformer.make_column_selector object at 0x1300015b0>)])),
('model',
LGBMClassifier(class_weight='balanced', random_state=1,
reg_lambda=0.1))])

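Even so, passing it to the selector raises an error. That makes no sense to me, because I have already fitted and evaluated the model on its own and it runs fine.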

fselector = SequentialFeatureSelector(
    estimator=model, scoring="roc_auc", cv=3, n_jobs=-1
).fit(X, target)




_assert_all_finite(X, allow_nan, msg_dtype)
101 not allow_nan and not np.isfinite(X).all()):
102 type_err = 'infinity' if allow_nan else 'NaN, infinity'
--> 103 raise ValueError(
104 msg_err.format
105 (type_err,

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Edit:

Reproducible example:

import numpy as np  # missing from the original snippet
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer

X, y = load_iris(return_X_y=True)
X[:10, 0] = np.nan  # np.NaN was removed in NumPy 2.0; np.nan works everywhere

clf = Pipeline([("preprocessing", SimpleImputer(missing_values=np.nan)),
                ("model", LogisticRegression(random_state=1))])

SequentialFeatureSelector(estimator=clf,
                          scoring="accuracy",
                          cv=3).fit(X, y)

It raises the same error, even though clf itself can be fitted without any problem.
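The claim that clf fits on its own can be checked with a short sketch like the one below. It is an illustration rather than part of the original question, and `max_iter=1000` is an assumed setting added only to avoid convergence warnings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
X[:10, 0] = np.nan

# Count the missing values per column: only column 0 is affected.
nan_per_column = np.isnan(X).sum(axis=0)
print(nan_per_column)

clf = Pipeline([
    ("preprocessing", SimpleImputer(missing_values=np.nan)),
    # max_iter=1000 is an assumption, not in the original question
    ("model", LogisticRegression(random_state=1, max_iter=1000)),
])

# Fitting the pipeline directly works, because the imputer fills the NaNs
# before the classifier ever sees the data.
clf.fit(X, y)

# The output of the fitted imputing step contains no NaN.
imputed = clf.named_steps["preprocessing"].transform(X)
print(np.isnan(imputed).any())
```

So the NaNs are only present in the raw X that is handed to the selector, not in anything the classifier receives.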

Best answer

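The scikit-learn documentation does not say that SequentialFeatureSelector works with Pipeline objects. It only states that the class accepts an unfitted estimator. Given that, you can take the classifier out of your pipeline, preprocess X, and then pass the preprocessed X together with an unfitted classifier to the feature selector, as in the example below.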

import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MaxAbsScaler


X, y = load_iris(return_X_y=True)
X[:10, 0] = np.nan  # np.NaN was removed in NumPy 2.0; np.nan works everywhere

pipe = Pipeline([("preprocessing", SimpleImputer(missing_values=np.nan)),
                 ("scaler", MaxAbsScaler())])


# Preprocess your data
X = pipe.fit_transform(X)

# Run the SequentialFeatureSelector
sfs = SequentialFeatureSelector(estimator=LogisticRegression(),
                                scoring="accuracy",
                                cv=3).fit(X, y)

# Check which features are important and transform X
sfs.get_support()
X = sfs.transform(X)
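A possible variation, not part of the original answer: instead of preprocessing X by hand, the selector can be placed inside a pipeline as a step after the imputer, so that it only ever receives imputed data. This is a minimal sketch; `n_features_to_select=2` and `max_iter=1000` are illustrative assumptions, not values from the question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MaxAbsScaler

X, y = load_iris(return_X_y=True)
X[:10, 0] = np.nan

pipe = Pipeline([
    # The imputer and scaler run first, so later steps never see NaN.
    ("imputer", SimpleImputer(missing_values=np.nan)),
    ("scaler", MaxAbsScaler()),
    # The selector is a transformer, so it can sit in the middle of a pipeline.
    ("selector", SequentialFeatureSelector(
        estimator=LogisticRegression(max_iter=1000),
        n_features_to_select=2,  # illustrative choice
        scoring="accuracy",
        cv=3,
    )),
    ("model", LogisticRegression(max_iter=1000, random_state=1)),
])

pipe.fit(X, y)

# Boolean mask over the original four columns: True marks a selected feature.
mask = pipe.named_steps["selector"].get_support()
print(mask)

# The fitted pipeline accepts raw data that still contains NaN.
predictions = pipe.predict(X)
```

With this layout the imputer and scaler are fitted as part of the pipeline, so the whole object can later be cross-validated or applied to new data in one call.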

A similar question about "python - Scikit-learn SequentialFeatureSelector: Input contains NaN, infinity or a value too large for dtype('float64'), even with a pipeline" can be found on Stack Overflow: https://stackoverflow.com/questions/66106909/
