scikit-learn - Python feature selection in a pipeline: how to determine feature names?

Reposted. Author: 行者123. Updated: 2023-12-04 04:16:12

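I use a pipeline and grid_search to select the best parameters, and then use those parameters to fit the best pipeline ('best_pipe'). However, because the feature_selection step (SelectKBest) lives inside the pipeline, I never call fit on SelectKBest directly.

I need to know the feature names of the 'k' selected features. Any ideas on how to retrieve them? Thank you in advance.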

from sklearn import (cross_validation, feature_selection, pipeline,
                     preprocessing, linear_model, grid_search)
folds = 5
split = cross_validation.StratifiedKFold(target, n_folds=folds, shuffle = False, random_state = 0)

scores = []
for k, (train, test) in enumerate(split):

    X_train, X_test, y_train, y_test = X.ix[train], X.ix[test], y.ix[train], y.ix[test]

    top_feat = feature_selection.SelectKBest()

    pipe = pipeline.Pipeline([('scaler', preprocessing.StandardScaler()),
                              ('feat', top_feat),
                              ('clf', linear_model.LogisticRegression())])

    K = [40, 60, 80, 100]
    C = [1.0, 0.1, 0.01, 0.001, 0.0001, 0.00001]
    penalty = ['l1', 'l2']

    param_grid = [{'feat__k': K,
                   'clf__C': C,
                   'clf__penalty': penalty}]

    scoring = 'precision'

    gs = grid_search.GridSearchCV(estimator=pipe, param_grid = param_grid, scoring = scoring)
    gs.fit(X_train, y_train)

    best_score = gs.best_score_
    scores.append(best_score)

    print "Fold: {} {} {:.4f}".format(k+1, scoring, best_score)
    print gs.best_params_

best_pipe = pipeline.Pipeline([('scale', preprocessing.StandardScaler()),
                               ('feat', feature_selection.SelectKBest(k=80)),
                               ('clf', linear_model.LogisticRegression(C=.0001, penalty='l2'))])

best_pipe.fit(X_train, y_train)
best_pipe.predict(X_test)
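
The question's code targets an old scikit-learn release: the `cross_validation` and `grid_search` modules have since been folded into `sklearn.model_selection`. For readers on a current release, the same search might be sketched roughly as below. This is only an illustration under assumptions: the data is synthetic (`make_classification` stands in for the asker's `X` and `y`), the grid is shrunk to keep it fast, and the `penalty` dimension is left out because the valid values depend on the solver and the library version.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's X and y (assumption, not the asker's data).
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

pipe = Pipeline([("scaler", StandardScaler()),
                 ("feat", SelectKBest()),
                 ("clf", LogisticRegression())])

# Reduced grid: k for the selector, C for the classifier.
param_grid = {"feat__k": [5, 10],
              "clf__C": [1.0, 0.1]}

gs = GridSearchCV(pipe, param_grid, scoring="precision",
                  cv=StratifiedKFold(n_splits=5))
gs.fit(X, y)

# With the default refit=True, best_estimator_ is the winning pipeline,
# already refit on everything that was passed to fit().
best_pipe = gs.best_estimator_
print(gs.best_params_)
print(best_pipe.named_steps["feat"].k)
```

Because `best_estimator_` is already a fitted pipeline, rebuilding `best_pipe` by hand with hard-coded parameters is optional in this setup.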

Best answer

You can access the feature selector by name in best_pipe:

features = best_pipe.named_steps['feat']

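You can then call transform() on a one-row array of column indices to get the names of the selected columns (recent scikit-learn versions reject 1-D input, hence the reshape):

import numpy as np
X.columns[features.transform(np.arange(len(X.columns)).reshape(1, -1))[0]]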

The output here will be the names of the 80 columns selected in the pipeline.
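
To try the lookup end to end, here is a minimal sketch on synthetic data. The column names `f0`..`f19`, the dataset, and `k=5` are assumptions made for illustration. It also shows `get_support()`, which the selector provides and which returns a boolean mask of the kept columns; that may be a more direct route than transforming an index array.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 20 columns named f0..f19.
X_arr, y = make_classification(n_samples=200, n_features=20,
                               n_informative=5, random_state=0)
X = pd.DataFrame(X_arr, columns=["f{}".format(i) for i in range(20)])

best_pipe = Pipeline([("scale", StandardScaler()),
                      ("feat", SelectKBest(k=5)),
                      ("clf", LogisticRegression())])
best_pipe.fit(X, y)

selector = best_pipe.named_steps["feat"]

# Route 1: boolean mask over the input columns (True = kept).
mask = selector.get_support()
names_from_mask = X.columns[mask]

# Route 2: the index trick, passing a 2-D array of shape (1, n_features)
# and taking the single output row.
idx = selector.transform(np.arange(X.shape[1]).reshape(1, -1))[0]
names_from_transform = X.columns[idx]

print(list(names_from_mask))
print(list(names_from_transform))
```

Both routes should list the same columns in the same order, since the selector keeps the original column order.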

Regarding "scikit-learn - Python feature selection in a pipeline: how to determine feature names?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33376078/
