
python - Printing SelectKBest feature names when the k value is inside GridSearchCV's param_grid

Reposted. Author: 行者123. Updated: 2023-11-30 09:09:04

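I tried combinations of k for SelectKBest and n_components for PCA in param_grid. I can print the k and n_components values with the code below. I am posting the complete code so you can see which list the features are taken from.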

#THE FIRST FEATURE HAS TO BE THE LABEL

featurelist = ['poi', 'exercised_stock_options', 'expenses', 'from_messages',
               'from_poi_to_this_person', 'from_this_person_to_poi', 'other',
               'restricted_stock', 'salary', 'shared_receipt_with_poi',
               'to_messages', 'total_payments', 'total_stock_value',
               'ratio_from_poi', 'ratio_to_poi']

# Select the same columns, in the same order, as featurelist
enronml = enron[featurelist].copy()


enronml = enronml.to_dict(orient="index")
dataset = enronml

# featureFormat takes the dictionary as the dataset and converts the first
# feature in featurelist into the label

data = featureFormat(dataset, featurelist, sort_keys = True)
labels, features = targetFeatureSplit(data)

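# sklearn.cross_validation was removed in newer scikit-learn releases;
# train_test_split now lives in sklearn.model_selection
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X_train, X_test, y_train, y_test = train_test_split(features, labels,
                                                    test_size=0.20, random_state=0)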


pca = PCA()
gnba = GaussianNB()
steps = [('scaler', MinMaxScaler()),
         ('best', SelectKBest()),
         ('pca', pca),
         ('gnba', gnba)]

pipeline = Pipeline(steps)

parameters = [
    {
        'best__k': [3],
        'pca__n_components': [1, 2]
    },
    {
        'best__k': [4],
        'pca__n_components': [1, 2, 3]
    },
    {
        'best__k': [5],
        'pca__n_components': [1, 2, 3, 4]
    },
]

cv = StratifiedShuffleSplit(test_size=0.2, random_state=42)
gnbawithpca = GridSearchCV(pipeline, param_grid=parameters, cv=cv,
                           scoring="f1")
gnbawithpca.fit(X_train,y_train)

means = gnbawithpca.cv_results_['mean_test_score']
stds = gnbawithpca.cv_results_['std_test_score']


for mean, std, params in zip(means, stds,
                             gnbawithpca.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r"
          % (mean, std * 2, params))

I was able to get results like this:

0.480 (+/-0.510) for {'best__k': 3, 'pca__n_components': 1}
0.534 (+/-0.409) for {'best__k': 3, 'pca__n_components': 2}
0.480 (+/-0.510) for {'best__k': 4, 'pca__n_components': 1}
0.534 (+/-0.409) for {'best__k': 4, 'pca__n_components': 2}
0.565 (+/-0.342) for {'best__k': 4, 'pca__n_components': 3}
0.480 (+/-0.510) for {'best__k': 5, 'pca__n_components': 1}
0.513 (+/-0.404) for {'best__k': 5, 'pca__n_components': 2}
0.473 (+/-0.382) for {'best__k': 5, 'pca__n_components': 3}
0.448 (+/-0.353) for {'best__k': 5, 'pca__n_components': 4}

I want to know which features were selected. For example, when best__k = 5, I want to know the names of those 5 features.

Best answer

Solved

When you define the pipeline to use in GridSearchCV, you can name each step:

steps = [('scaler', MinMaxScaler()),
         ('best', SelectKBest()),
         ('pca', pca),
         ('gnba', gnba)]

pipeline = Pipeline(steps)

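You do this for two reasons:

So that you can define parameters in the parameter grid (the name identifies the step whose parameter you are setting, as in 'best__k').

So that you can access a step's attributes from the fitted GridSearchCV object through best_estimator_.named_steps (this answers your question).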

skb_step = gnbawithpca.best_estimator_.named_steps['best']

# Format the SelectKBest scores to 2 decimal places
feature_scores = ['%.2f' % elem for elem in skb_step.scores_]

# Format the SelectKBest p-values to 3 decimal places
feature_scores_pvalues = ['%.3f' % elem for elem in skb_step.pvalues_]

# get_support(indices=True) returns the indices of the selected features.
# featurelist[0] is the label ('poi'), hence the i + 1 offset.
# Build a list of (feature name, score, p-value) tuples.
features_selected_tuple = [(featurelist[i + 1], feature_scores[i], feature_scores_pvalues[i])
                           for i in skb_step.get_support(indices=True)]

# Sort the list by score, highest first
features_selected_tuple = sorted(features_selected_tuple,
                                 key=lambda feature: float(feature[1]), reverse=True)

# Print (Python 3 syntax, matching the print() calls in the question)
print(' ')
print('Selected Features, Scores, P-Values')
print(features_selected_tuple)
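
Note that best_estimator_ is the pipeline refit with the winning parameter combination, so the approach above reports the features for the best k only. The question also asks about one specific value (k = 5), which may not be the winner. One way to cover that case, sketched here under assumptions rather than taken from the original answer, is to clone the pipeline, set best__k explicitly, and refit it on the training data. The synthetic data from make_classification, the feature names f0 to f7, and the helper selected_names are all hypothetical stand-ins for the Enron data and featurelist[1:].

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: synthetic data and made-up feature names stand in for the Enron set
X, y = make_classification(n_samples=200, n_features=8, n_informative=4,
                           random_state=0)
feature_names = ["f%d" % i for i in range(X.shape[1])]

pipeline = Pipeline([
    ("scaler", MinMaxScaler()),
    ("best", SelectKBest()),
    ("pca", PCA()),
    ("gnba", GaussianNB()),
])
param_grid = [
    {"best__k": [3], "pca__n_components": [1, 2]},
    {"best__k": [5], "pca__n_components": [1, 2, 3, 4]},
]
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
search = GridSearchCV(pipeline, param_grid=param_grid, cv=cv, scoring="f1")
search.fit(X, y)


def selected_names(fitted_pipeline, names):
    """Return the names kept by the 'best' (SelectKBest) step (hypothetical helper)."""
    mask = fitted_pipeline.named_steps["best"].get_support()
    return [name for name, keep in zip(names, mask) if keep]


# Features chosen by the winning parameter combination
best_names = selected_names(search.best_estimator_, feature_names)
print(search.best_params_, best_names)

# Features chosen for a specific k, whether or not it won the search:
# clone the unfitted pipeline, fix the parameters, and refit on the same data
k5_pipeline = clone(pipeline).set_params(best__k=5, pca__n_components=2)
k5_pipeline.fit(X, y)
k5_names = selected_names(k5_pipeline, feature_names)
print(k5_names)
```

The selection depends only on the data the pipeline is fitted on and on k, not on pca__n_components, so any valid n_components value gives the same feature names for a given k. During cross-validation each split fits its own SelectKBest, so individual folds may pick different features than the refit shown here.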

Regarding "python - Printing SelectKBest feature names when the k value is inside GridSearchCV's param_grid", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44999289/
