
scikit-learn - SKLEARN // Combining GridSearchCV with a column transformer and pipelines

Reposted. Author: 行者123. Updated: 2023-12-04 04:06:50

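I am struggling with a machine learning project in which I am trying to combine:

  • a sklearn column transformer, to apply different transformers to my numerical and categorical features
  • a pipeline, to apply my different transformers and an estimator
  • GridSearchCV, to search for the best parameters.

    As long as I fill in the parameters of my different transformers manually in my pipeline, the code works perfectly.
    However, as soon as I try to pass lists of values to compare in my grid search parameters, I get all kinds of invalid parameter error messages.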

    Here is my code:

    First, I split my features into numerical and categorical ones:
    import numpy as np  # needed for np.number below
    from sklearn.compose import make_column_selector
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
    from sklearn.impute import KNNImputer, SimpleImputer
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import OneHotEncoder


    numerical_features = make_column_selector(dtype_include=np.number)
    cat_features = make_column_selector(dtype_exclude=np.number)

    Then I create two separate preprocessing pipelines, one for the numerical features and one for the categorical features:
    numerical_pipeline = make_pipeline(KNNImputer())
    cat_pipeline = make_pipeline(SimpleImputer(strategy='most_frequent'), OneHotEncoder(handle_unknown='ignore'))

    I combine the two into another pipeline, set my parameters, and run my GridSearchCV code:
    # preprocessor is not defined in the question as posted
    model = make_pipeline(preprocessor, LinearRegression())

    params = {
        'columntransformer__numerical_pipeline__knnimputer__n_neighbors': [1, 2, 3, 4, 5, 6, 7]
    }

    grid = GridSearchCV(model, param_grid=params, scoring='r2', cv=10)
    cv = KFold(n_splits=5)
    all_accuracies = cross_val_score(grid, X, y, cv=cv, scoring='r2')

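    I have tried different ways of declaring the parameters but never found the right one. I always get an "invalid parameter" error message.

    Could you help me understand what is going wrong?

    Thank you very much for your support, and take care!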

    Best answer

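    I assume you have defined preprocessor as follows:

    preprocessor = Pipeline([('numerical_pipeline', numerical_pipeline),
                             ('cat_pipeline', cat_pipeline)])

    In that case you need to change your parameter name as follows. make_pipeline names each step after its lowercased class name, so the outer step is called pipeline, not columntransformer:
    pipeline__numerical_pipeline__knnimputer__n_neighbors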
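
One way to check which parameter names a grid search will accept is to list the keys returned by `get_params()` on the model. The sketch below rebuilds the model under the same assumption as the answer (that `preprocessor` is a plain `Pipeline`), so it only illustrates the lookup; it is not necessarily the asker's actual setup.

```python
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder

numerical_pipeline = make_pipeline(KNNImputer())
cat_pipeline = make_pipeline(SimpleImputer(strategy="most_frequent"),
                             OneHotEncoder(handle_unknown="ignore"))

# Assumed definition of preprocessor, mirroring the answer's guess.
preprocessor = Pipeline([("numerical_pipeline", numerical_pipeline),
                         ("cat_pipeline", cat_pipeline)])
model = make_pipeline(preprocessor, LinearRegression())

# Every key of get_params() is a name that param_grid may use.
knn_params = [name for name in model.get_params() if name.endswith("n_neighbors")]
print(knn_params)
```

If a name from `param_grid` does not appear among these keys, GridSearchCV raises the "invalid parameter" error described in the question.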
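    However, the code has several other issues:
  • You do not need to call cross_val_score after GridSearchCV. The fitted GridSearchCV object already holds the cross-validation results for every hyperparameter combination, in its cv_results_ attribute.
  • KNNImputer does not work when the data contains strings. You need to apply cat_pipeline before numerical_pipeline.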
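
To illustrate the first point, here is a minimal sketch of reading the per-candidate scores straight from a fitted GridSearchCV. The data is a made-up numeric array with a few missing values, and the pipeline is deliberately simpler than the one in the question.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 40 rows, 3 numeric features, linear target plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=40)
X[::7, 1] = np.nan  # blank out some entries so the imputer has work to do

model = make_pipeline(KNNImputer(), LinearRegression())
grid = GridSearchCV(model,
                    param_grid={"knnimputer__n_neighbors": [1, 3, 5]},
                    scoring="r2", cv=5)
grid.fit(X, y)

# One entry per hyperparameter combination, averaged over the 5 folds.
for params, score in zip(grid.cv_results_["params"],
                         grid.cv_results_["mean_test_score"]):
    print(params, round(score, 3))
print("best:", grid.best_params_)
```

Wrapping `grid` in `cross_val_score`, as the question does, runs a second, outer cross-validation loop around the search. That is only needed when an estimate of the whole tuning procedure is wanted.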

    Complete example:
    import numpy as np
    import pandas as pd
    from sklearn.impute import KNNImputer, SimpleImputer
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline, make_pipeline
    from sklearn.preprocessing import OneHotEncoder

    # Toy data: one categorical column with a missing value, one numeric column
    X = pd.DataFrame({'city': ['London', 'London', 'Paris', np.nan],
                      'rating': [5, 3, 4, 5]})
    y = [1, 0, 1, 1]

    numerical_pipeline = make_pipeline(KNNImputer())
    # sparse_output replaces the older sparse argument from scikit-learn 1.2 onwards
    cat_pipeline = make_pipeline(SimpleImputer(strategy='most_frequent'),
                                 OneHotEncoder(handle_unknown='ignore', sparse_output=False))
    # Sequential pipeline: encode the strings first, then run KNNImputer
    preprocessor = Pipeline([('cat_pipeline', cat_pipeline),
                             ('numerical_pipeline', numerical_pipeline)])
    model = make_pipeline(preprocessor, LinearRegression())

    # The outer step is named 'pipeline' because preprocessor is a Pipeline
    params = {
        'pipeline__numerical_pipeline__knnimputer__n_neighbors': [1, 2]
    }

    grid = GridSearchCV(model, param_grid=params, scoring='r2', cv=2)
    grid.fit(X, y)
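
The example above chains the two pipelines one after the other, so every column passes through both. If the goal is to route numerical and categorical columns to separate pipelines, as the question describes, a `ColumnTransformer` with explicitly named entries is one option. The following is a sketch under that assumption; the `visits` column and the toy values are invented for illustration. With this definition, the parameter name from the question becomes valid.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data with missing values in every column.
X = pd.DataFrame({
    "city": ["London", "London", "Paris", np.nan, "Paris", "London", "Paris", "London"],
    "rating": [5.0, 3.0, np.nan, 5.0, 4.0, np.nan, 2.0, 4.0],
    "visits": [10.0, 4.0, 7.0, 9.0, np.nan, 3.0, 2.0, 8.0],
})
y = [1.0, 0.2, 0.7, 0.9, 0.6, 0.3, 0.1, 0.8]

numerical_pipeline = make_pipeline(KNNImputer())
cat_pipeline = make_pipeline(SimpleImputer(strategy="most_frequent"),
                             OneHotEncoder(handle_unknown="ignore"))

# Each entry is (name, transformer, column selector); the names chosen here
# are the ones that appear in the grid-search parameter path.
preprocessor = ColumnTransformer([
    ("numerical_pipeline", numerical_pipeline, make_column_selector(dtype_include=np.number)),
    ("cat_pipeline", cat_pipeline, make_column_selector(dtype_exclude=np.number)),
])
model = make_pipeline(preprocessor, LinearRegression())

# make_pipeline names the first step 'columntransformer' after its class.
key = "columntransformer__numerical_pipeline__knnimputer__n_neighbors"
grid = GridSearchCV(model, param_grid={key: [1, 2]}, scoring="r2", cv=2)
grid.fit(X, y)
print(grid.best_params_)
```

Because the string column never reaches `KNNImputer` in this layout, the order of the two entries no longer matters.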

    Regarding scikit-learn - SKLEARN // Combining GridSearchCV with a column transformer and pipelines, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62331674/
