
python - Target transformation and feature selection in scikit-learn

Reposted. Author: 行者123. Updated: 2023-12-02 07:10:11

I am using RFECV for feature selection in scikit-learn. I want to compare the results of a simple linear model (X, y) with those of a log-transformed model (X, log(y)).

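Simple model: RFECV and cross_val_score give the same result. (The mean cross-validated score over all folds has to be compared with the RFECV score for the model that uses all features: 0.66 = 0.66, so there is no problem and the results are reliable.)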

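Log model: the problem is that RFECV does not seem to offer a way to transform y. The scores in this case are 0.55 vs 0.53. This is entirely expected, though, because I have to apply np.log by hand to fit the data: log_selector = log_selector.fit(X, np.log(y)). That r2 score is computed for y = log(y), with no inverse_func, whereas what we need is a way to fit the model on log(y_train) and compute the score on the original scale, with exp applied to the predictions before they are compared with y_test. Alternatively, if I try to use TransformedTargetRegressor, I get the error shown in the code: The classifier does not expose "coef_" or "feature_importances_" attributes.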

How can I solve this and make sure the feature selection process is reliable?

from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn import linear_model
from sklearn.model_selection import cross_val_score
from sklearn.compose import TransformedTargetRegressor
import numpy as np

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
estimator = linear_model.LinearRegression()
log_estimator = TransformedTargetRegressor(regressor=linear_model.LinearRegression(),
                                           func=np.log,
                                           inverse_func=np.exp)
selector = RFECV(estimator, step=1, cv=5, scoring='r2')
selector = selector.fit(X, y)
###
# Passing the TransformedTargetRegressor directly fails:
# log_selector = RFECV(log_estimator, step=1, cv=5, scoring='r2')
# log_selector = log_selector.fit(X, y)
# RuntimeError: The classifier does not expose "coef_" or "feature_importances_" attributes
###
# Workaround: fit the plain estimator on log(y); r2 is then measured on the log scale
log_selector = RFECV(estimator, step=1, cv=5, scoring='r2')
log_selector = log_selector.fit(X, np.log(y))

# Note: grid_scores_ was removed in scikit-learn 1.2; on newer versions
# use cv_results_["mean_test_score"] instead.
print("**Simple Model**")
print("RFECV, r2 scores: ", np.round(selector.grid_scores_, 2))
scores = cross_val_score(estimator, X, y, cv=5)
print("cross_val, mean r2 score: ", round(np.mean(scores), 2), ", same as RFECV score with all features")
print("no of feat: ", selector.n_features_)

print("**Log Model**")
# cross_val_score on the TransformedTargetRegressor scores on the original scale of y
log_scores = cross_val_score(log_estimator, X, y, cv=5)
print("RFECV, r2 scores: ", np.round(log_selector.grid_scores_, 2))
print("cross_val, mean r2 score: ", round(np.mean(log_scores), 2))
print("no of feat: ", log_selector.n_features_)

Output:

**Simple Model**
RFECV, r2 scores: [0.45 0.6 0.63 0.68 0.68 0.69 0.68 0.67 0.66 0.66]
cross_val, mean r2 score: 0.66 , same as RFECV score with all features
no of feat: 6

**Log Model**
RFECV, r2 scores: [0.39 0.5 0.59 0.56 0.55 0.54 0.53 0.53 0.53 0.53]
cross_val, mean r2 score: 0.55
no of feat: 3
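One way to read the two log-model numbers is that they measure r2 on different scales: RFECV fitted on log(y) scores against log(y_test), while cross_val_score on the TransformedTargetRegressor scores against y_test after mapping predictions back with exp. The following is only a minimal sketch of that distinction, assuming the unshuffled 5-fold KFold split that cv=5 resolves to for a regressor; the names log_scale_r2 and original_scale_r2 are introduced here for illustration and are not part of the original question.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

# Same synthetic data as in the question
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

log_scale_r2 = []       # r2 of log-predictions against log(y_test)
original_scale_r2 = []  # r2 of exp(log-predictions) against y_test

for train_idx, test_idx in KFold(n_splits=5).split(X):
    # Fit on the log-transformed training target only
    model = LinearRegression().fit(X[train_idx], np.log(y[train_idx]))
    log_pred = model.predict(X[test_idx])

    log_scale_r2.append(r2_score(np.log(y[test_idx]), log_pred))
    original_scale_r2.append(r2_score(y[test_idx], np.exp(log_pred)))

print("mean r2 on log scale:     ", round(float(np.mean(log_scale_r2)), 2))
print("mean r2 on original scale:", round(float(np.mean(original_scale_r2)), 2))
```

The two means generally differ, because the residuals and the variance of the target are not the same before and after the transformation, so the scores are not directly comparable across scales.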

Best answer

All you need to do is add these properties in a subclass of TransformedTargetRegressor, forwarding them from the fitted inner regressor:

class MyTransformedTargetRegressor(TransformedTargetRegressor):
    @property
    def feature_importances_(self):
        return self.regressor_.feature_importances_

    @property
    def coef_(self):
        return self.regressor_.coef_

Then use it in your code:

log_estimator = MyTransformedTargetRegressor(regressor=linear_model.LinearRegression(),
                                             func=np.log,
                                             inverse_func=np.exp)
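To show how the pieces fit together, here is a minimal end-to-end sketch that passes the subclass to RFECV on the same synthetic data as the question. As an alternative that is not part of the original answer, it also uses the importance_getter parameter of RFECV, which assumes scikit-learn 0.24 or later and lets RFECV read the nested attribute regressor_.coef_ without any subclass. The variable names subclass_selector and getter_selector are illustrative only.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression


class MyTransformedTargetRegressor(TransformedTargetRegressor):
    """Forward coef_ from the fitted inner regressor so RFECV can rank features."""

    @property
    def coef_(self):
        return self.regressor_.coef_


# y is non-negative by construction for this dataset, so np.log can be applied
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

# Option 1: the subclass approach from the answer
subclass_selector = RFECV(
    MyTransformedTargetRegressor(
        regressor=LinearRegression(), func=np.log, inverse_func=np.exp
    ),
    step=1,
    cv=5,
    scoring="r2",
).fit(X, y)

# Option 2 (assumption: scikit-learn >= 0.24): tell RFECV where the
# coefficients live instead of subclassing
getter_selector = RFECV(
    TransformedTargetRegressor(
        regressor=LinearRegression(), func=np.log, inverse_func=np.exp
    ),
    step=1,
    cv=5,
    scoring="r2",
    importance_getter="regressor_.coef_",
).fit(X, y)

print("subclass, no of feat:         ", subclass_selector.n_features_)
print("importance_getter, no of feat:", getter_selector.n_features_)
```

In both variants the model is fitted on log(y) inside each fold, and the r2 scorer calls predict, which applies inverse_func, so the scores are on the original scale of y and can be compared with the simple model.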

Regarding python - Target transformation and feature selection in scikit-learn, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58155778/
