
python - Sklearn - Pipeline with StandardScaler, PolynomialFeatures and Regression

Reposted. Author: 行者123. Updated: 2023-12-05 03:37:05

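I have the following model, which scales the data, then generates polynomial features, and finally feeds the result into a regularized regression model: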

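# Assumes X, y and alpha are already defined.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)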

scaler = StandardScaler()
scaler.fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

polynomial = PolynomialFeatures(degree=3, include_bias=False)
polynomial.fit(X_train_scaled)

X_train_model = polynomial.transform(X_train_scaled)
X_test_model = polynomial.transform(X_test_scaled)

reg_model = Ridge(alpha=alpha)
reg_model.fit(X_train_model, y_train)

y_pred_train_model = reg_model.predict(X_train_model)
r2_train = r2_score(y_train, y_pred_train_model)

y_pred_test_model = reg_model.predict(X_test_model)
r2_test = r2_score(y_test, y_pred_test_model)

It works fine, but all the separate fit and transform calls feel cumbersome. I have heard about Pipeline() in sklearn. How can I use it to simplify the process above?

Best answer

You can rewrite your code with Pipeline() as follows:

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

# generate the data
X, y = make_regression(n_samples=1000, n_features=100, noise=10, bias=1, random_state=42)

# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# define the pipeline
pipe = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('preprocessor', PolynomialFeatures(degree=3, include_bias=False)),
    ('estimator', Ridge(alpha=1))
])

# fit the pipeline
pipe.fit(X_train, y_train)

# generate the model predictions
y_pred_train_pipe = pipe.predict(X_train)
print(y_pred_train_pipe[:5])
# [11.37182811 89.22027129 -106.51012773 79.5912864 -241.0138516]

y_pred_test_pipe = pipe.predict(X_test)
print(y_pred_test_pipe[:5])
# [16.88238278 57.50116009 50.35705205 -20.92005052 -76.04156972]

# calculate the r-squared
print(pipe.score(X_train, y_train))
# 0.9999999999787197

print(pipe.score(X_test, y_test))
# 0.463044896596684
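
The gap between the training score (about 1.0) and the test score (about 0.46) suggests the regularization strength may be worth tuning. One possible way to do that, sketched below, is to pass the whole pipeline to GridSearchCV and address the Ridge parameter as 'estimator__alpha' (step name, two underscores, parameter name). The smaller dataset (5 features) and the alpha grid are assumptions chosen only to keep the example fast; they are not part of the original answer.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Assumption: a small synthetic dataset so the search runs quickly.
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Same three steps and step names as the pipeline above.
pipe = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('preprocessor', PolynomialFeatures(degree=3, include_bias=False)),
    ('estimator', Ridge()),
])

# '<step name>__<parameter>' targets a parameter of a single step.
# The candidate values are illustrative, not recommendations.
param_grid = {'estimator__alpha': [0.1, 1.0, 10.0, 100.0]}

# Each cross-validation fold refits the scaler and the polynomial expansion
# on that fold's training part only.
search = GridSearchCV(pipe, param_grid, cv=5, scoring='r2')
search.fit(X_train, y_train)

best_alpha = search.best_params_['estimator__alpha']
print(best_alpha)
print(search.score(X_test, y_test))

# Fitted steps stay accessible by name on the refitted best pipeline.
# With 5 inputs, degree 3 and no bias column, Ridge sees 55 features.
n_coefs = search.best_estimator_.named_steps['estimator'].coef_.shape[0]
print(n_coefs)
```

Because the scaler and the polynomial expansion are inside the pipeline, the search never fits them on validation data, which is harder to guarantee with the manual fit/transform version.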

Equivalent code without Pipeline():

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# generate the data
X, y = make_regression(n_samples=1000, n_features=100, noise=10, bias=1, random_state=42)

# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# scale the data
scaler = StandardScaler()
scaler.fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# extract the polynomial features
polynomial = PolynomialFeatures(degree=3, include_bias=False)
polynomial.fit(X_train_scaled)

X_train_model = polynomial.transform(X_train_scaled)
X_test_model = polynomial.transform(X_test_scaled)

# fit the model
reg_model = Ridge(alpha=1)
reg_model.fit(X_train_model, y_train)

# generate the model predictions
y_pred_train_model = reg_model.predict(X_train_model)
print(y_pred_train_model[:5])
# [11.37182811 89.22027129 -106.51012773 79.5912864 -241.0138516]

y_pred_test_model = reg_model.predict(X_test_model)
print(y_pred_test_model[:5])
# [16.88238278 57.50116009 50.35705205 -20.92005052 -76.04156972]

# calculate the r-squared
print(r2_score(y_train, y_pred_train_model))
# 0.9999999999787197

print(r2_score(y_test, y_pred_test_model))
# 0.463044896596684
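
The two versions print the same numbers, and that equivalence can also be checked programmatically. The sketch below fits both variants on the same split and compares their test predictions with np.allclose. The reduced dataset size (5 features instead of 100) is an assumption made to keep the check quick; the steps and parameters otherwise mirror the code above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Assumption: fewer features than in the answer, so this runs in well under a second.
X, y = make_regression(n_samples=200, n_features=5, noise=10, bias=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Pipeline version: one fit call, one predict call.
pipe = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('preprocessor', PolynomialFeatures(degree=3, include_bias=False)),
    ('estimator', Ridge(alpha=1)),
]).fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

# Manual version: every transformer is fitted on the training data only,
# then applied to both training and test data.
scaler = StandardScaler().fit(X_train)
poly = PolynomialFeatures(degree=3, include_bias=False).fit(scaler.transform(X_train))
model = Ridge(alpha=1).fit(poly.transform(scaler.transform(X_train)), y_train)
manual_pred = model.predict(poly.transform(scaler.transform(X_test)))

# True when both approaches agree up to floating-point tolerance.
same = np.allclose(pipe_pred, manual_pred)
print(same)
```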

Regarding python - Sklearn - Pipeline with StandardScaler, PolynomialFeatures and Regression, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/69443936/
