python - Grid search for ridge regression with a pipeline

Reposted. Author: 行者123. Updated: 2023-11-30 08:31:34

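I am trying to optimize the hyperparameters of a ridge regression, and I also want to add polynomial features. The pipeline itself seems to work, but I get an error when I try GridSearchCV. Here is the code: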

# Importing the Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import mean_squared_error
from collections import Counter
from IPython.core.display import display, HTML
sns.set_style('darkgrid')

# Data Preprocessing
from sklearn.datasets import load_boston
boston_dataset = load_boston()
dataset = pd.DataFrame(boston_dataset.data, columns = boston_dataset.feature_names)
dataset['MEDV'] = boston_dataset.target

# X and y Variables
X = dataset.iloc[:, 0:13].values
y = dataset.iloc[:, 13].values.reshape(-1,1)

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 25)

# Building the Model ------------------------------------------------------------------------

# Fitting regressor to the Training set
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

steps = [
('scalar', StandardScaler()),
('poly', PolynomialFeatures(degree=2)),
('model', Ridge())
]

ridge_pipe = Pipeline(steps)
ridge_pipe.fit(X_train, y_train)
# Predicting the Test set results
y_pred = ridge_pipe.predict(X_test)

# Applying k-Fold Cross Validation
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = ridge_pipe, X = X_train, y = y_train, cv = 10)
accuracies.mean()
#accuracies.std()

# Applying Grid Search to find the best model and the best parameters
from sklearn.model_selection import GridSearchCV

parameters = [ {'alpha': np.arange(0, 0.2, 0.01) } ]

grid_search = GridSearchCV(estimator = ridge_pipe,
param_grid = parameters,
scoring = 'accuracy',
cv = 10,
n_jobs = -1)
grid_search = grid_search.fit(X_train, y_train) # <-- GETTING ERROR IN HERE

Error:

ValueError: Invalid parameter ridge for estimator

What should I do, or is there a better way to use ridge regression with a pipeline? I would be glad for some resources on grid search, since I am new to this.

Best answer

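There are two problems in your code. First, since you are using a pipeline, you need to specify in the parameter list which step of the pipeline each parameter belongs to. See the official documentation for more information: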

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below

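In this case, since alpha belongs to the ridge regression and you used the string model for that step in the Pipeline definition, you need to rename the key alpha to model__alpha (step name, two underscores, parameter name):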

steps = [
('scalar', StandardScaler()),
('poly', PolynomialFeatures(degree=2)),
('model', Ridge()) # <------ Whatever string you assign here will be used later
]

# Since you have named the step 'model', the key must be 'model__alpha' (two underscores)
parameters = [ {'model__alpha': np.arange(0, 0.2, 0.01) } ]
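If you are unsure which keys a pipeline accepts, you can ask it. The snippet below is a minimal sketch that rebuilds the same three steps and lists the parameter names exposed by `get_params()`; any of those names should be usable as a key in `param_grid`. The variable names `param_names` and `tunable` are just illustrative.

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same step names as in the question.
ridge_pipe = Pipeline([
    ("scalar", StandardScaler()),
    ("poly", PolynomialFeatures(degree=2)),
    ("model", Ridge()),
])

# get_params() returns every parameter the pipeline knows about,
# including nested ones written as <step name>__<parameter name>.
param_names = sorted(ridge_pipe.get_params().keys())

# Keep only the nested names, which are the ones a grid search usually tunes.
tunable = [name for name in param_names if "__" in name]
print(tunable)
```

The list contains `model__alpha` and `poly__degree` but no bare `alpha`, which is why the original parameter grid was rejected.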

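Next, you need to understand that this dataset is for regression. You should not use accuracy here; use a regression scoring function such as mean_squared_error instead. There are some other metrics for regression that you can use as well. Something like this: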

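from sklearn.metrics import mean_squared_error, make_scorer
# MSE is an error (lower is better), so tell the scorer not to maximize it;
# GridSearchCV will then report the negated MSE as the score.
scoring_func = make_scorer(mean_squared_error, greater_is_better=False)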

grid_search = GridSearchCV(estimator = ridge_pipe,
param_grid = parameters,
scoring = scoring_func, #<--- Use the scoring func defined above
cv = 10,
n_jobs = -1)
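Putting both fixes together, a self-contained sketch might look like the following. It makes a few assumptions that are not in the original code: it uses synthetic data from `make_regression` as a stand-in, because `load_boston` was removed in scikit-learn 1.2; it uses the built-in `"neg_mean_squared_error"` scoring string instead of a custom scorer; it starts the alpha grid at 0.01 rather than 0; and it also searches over `poly__degree`, purely to show that parameters of several steps can be tuned in one grid.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the Boston data (an assumption, not the original dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=25)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=25
)

ridge_pipe = Pipeline([
    ("scalar", StandardScaler()),
    ("poly", PolynomialFeatures(degree=2)),
    ("model", Ridge()),
])

# Keys follow the <step name>__<parameter name> convention.
parameters = {
    "model__alpha": np.arange(0.01, 0.2, 0.01),
    "poly__degree": [1, 2],
}

# "neg_mean_squared_error" is the negated MSE, so higher is still better.
grid_search = GridSearchCV(
    estimator=ridge_pipe,
    param_grid=parameters,
    scoring="neg_mean_squared_error",
    cv=5,
)
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)
print(-grid_search.best_score_)  # mean cross-validated MSE of the best setting

# The fitted search object predicts with the best pipeline found.
test_mse = mean_squared_error(y_test, grid_search.predict(X_test))
print(test_mse)
```

Because the score is a negated error, `best_score_` is zero or negative; flip the sign to read it as an ordinary MSE.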

Here is a link to a Google Colab notebook with working code.

Regarding python - Grid search for ridge regression with a pipeline, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57377309/
