python - Sklearn - Pipeline with StandardScaler, PolynomialFeatures and regression

Reposted by: 行者123. Last updated: 2023-12-05 03:37:05

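I have the following model, which scales the data, then expands it into polynomial features, and finally feeds the result into a regularized regression model: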

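from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# X, y (features and target) and alpha (regularization strength) are assumed to be defined earlier
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

# fit the scaler on the training data only, then apply it to both splits
scaler = StandardScaler()
scaler.fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# expand the scaled features into polynomial terms up to degree 3
polynomial = PolynomialFeatures(degree=3, include_bias=False)
polynomial.fit(X_train_scaled)

X_train_model = polynomial.transform(X_train_scaled)
X_test_model = polynomial.transform(X_test_scaled)

# fit the ridge regression on the expanded training features
reg_model = Ridge(alpha=alpha)
reg_model.fit(X_train_model, y_train)

# r-squared on the training and test sets
y_pred_train_model = reg_model.predict(X_train_model)
r2_train = r2_score(y_train, y_pred_train_model)

y_pred_test_model = reg_model.predict(X_test_model)
r2_test = r2_score(y_test, y_pred_test_model)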

It works fine, but the repeated fit and transform calls feel cumbersome. I have heard of Pipeline() in sklearn. How can I use it to simplify the code above?

Best answer

You can rewrite your code with Pipeline() as follows:

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

# generate the data
X, y = make_regression(n_samples=1000, n_features=100, noise=10, bias=1, random_state=42)

# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# define the pipeline
pipe = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('preprocessor', PolynomialFeatures(degree=3, include_bias=False)),
    ('estimator', Ridge(alpha=1))
])

# fit the pipeline
pipe.fit(X_train, y_train)

# generate the model predictions
y_pred_train_pipe = pipe.predict(X_train)
print(y_pred_train_pipe[:5])
# [11.37182811   89.22027129 -106.51012773   79.5912864  -241.0138516]

y_pred_test_pipe = pipe.predict(X_test)
print(y_pred_test_pipe[:5])
# [16.88238278  57.50116009  50.35705205 -20.92005052 -76.04156972]

# calculate the r-squared
print(pipe.score(X_train, y_train))
# 0.9999999999787197

print(pipe.score(X_test, y_test))
# 0.463044896596684
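
Because the pipeline behaves like a single estimator, the parameters of its steps can be tuned together. The sketch below is not part of the original answer: it assumes the same step names ('preprocessor', 'estimator') and uses a small synthetic data set and an arbitrary parameter grid chosen only so that it runs quickly. Step parameters are addressed as `<step name>__<parameter>`.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

# small synthetic data set (sizes are illustrative assumptions)
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# same three steps as in the answer
pipe = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('preprocessor', PolynomialFeatures(include_bias=False)),
    ('estimator', Ridge()),
])

# hypothetical grid: keys follow the "<step name>__<parameter>" convention
param_grid = {
    'preprocessor__degree': [1, 2, 3],
    'estimator__alpha': [0.1, 1.0, 10.0],
}

# the scaler and the polynomial expansion are refit inside every CV fold
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))
```

Since each cross-validation fold refits the whole pipeline, the scaler never sees the held-out fold, which is harder to guarantee when the steps are fitted by hand.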

Equivalent code without Pipeline():

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# generate the data
X, y = make_regression(n_samples=1000, n_features=100, noise=10, bias=1, random_state=42)

# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# scale the data
scaler = StandardScaler()
scaler.fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# extract the polynomial features
polynomial = PolynomialFeatures(degree=3, include_bias=False)
polynomial.fit(X_train_scaled)

X_train_model = polynomial.transform(X_train_scaled)
X_test_model = polynomial.transform(X_test_scaled)

# fit the model
reg_model = Ridge(alpha=1)
reg_model.fit(X_train_model, y_train)

# generate the model predictions
y_pred_train_model = reg_model.predict(X_train_model)
print(y_pred_train_model[:5])
# [11.37182811   89.22027129 -106.51012773   79.5912864  -241.0138516]

y_pred_test_model = reg_model.predict(X_test_model)
print(y_pred_test_model[:5])
# [16.88238278  57.50116009  50.35705205 -20.92005052 -76.04156972]

# calculate the r-squared
print(r2_score(y_train, y_pred_train_model))
# 0.9999999999787197

print(r2_score(y_test, y_pred_test_model))
# 0.463044896596684
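
The two listings print the same numbers, and the equivalence can also be checked programmatically. The following is a minimal sketch on a smaller synthetic data set (the sizes and random seeds are assumptions chosen for speed, not values from the answer) that fits both versions and compares their test predictions with `np.allclose`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

# small synthetic data set (sizes are illustrative assumptions)
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# pipeline version
pipe = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('preprocessor', PolynomialFeatures(degree=3, include_bias=False)),
    ('estimator', Ridge(alpha=1)),
])
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

# manual version: each transformer is fitted on the training data only
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
polynomial = PolynomialFeatures(degree=3, include_bias=False).fit(X_train_scaled)
reg_model = Ridge(alpha=1).fit(polynomial.transform(X_train_scaled), y_train)
manual_pred = reg_model.predict(polynomial.transform(scaler.transform(X_test)))

# True when both approaches give numerically identical predictions
same_predictions = np.allclose(pipe_pred, manual_pred)
print(same_predictions)
```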

This post on "python - Sklearn - Pipeline with StandardScaler, PolynomialFeatures and regression" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/69443936/
