
python - GridSearchCV scoring and grid_scores_

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:02:50

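I'm trying to understand how to get the scorer values out of GridSearchCV. The example code below sets up a small pipeline on text data.

It then sets up a grid search over different n-gram ranges.

Scoring is done with the f1 measure: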

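```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.grid_search import GridSearchCV  # old module: grid_scores_ only exists before scikit-learn 0.20
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# set up the pipeline
tfidf_vec = TfidfVectorizer(analyzer='word', min_df=0.05, max_df=0.95)
linearsvc = LinearSVC()
clf = Pipeline([('tfidf_vec', tfidf_vec), ('linearsvc', linearsvc)])

# set up the grid search over n-gram ranges, scored with f1
parameters = {'tfidf_vec__ngram_range': [(1, 1), (1, 2)]}
gs_clf = GridSearchCV(clf, parameters, n_jobs=-1, scoring='f1')
# docs_train (list of documents) and y_train (binary labels) are the training data
gs_clf = gs_clf.fit(docs_train, y_train)
```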

Now I can print the scores:

```python
print(gs_clf.grid_scores_)
```

```
[mean: 0.81548, std: 0.01324, params: {'tfidf_vec__ngram_range': (1, 1)},
 mean: 0.82143, std: 0.00538, params: {'tfidf_vec__ngram_range': (1, 2)}]
```

```python
print(gs_clf.grid_scores_[0].cv_validation_scores)
```

```
array([ 0.83234714,  0.8       ,  0.81409002])
```
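The mean and std printed for the first candidate can be reproduced from its three per-fold scores alone, which suggests that cv_validation_scores are the raw per-fold values being summarized. A quick arithmetic check using only the numbers shown above: the plain average gives the printed mean, and the population standard deviation (ddof=0, NumPy's default) gives the printed std. This is only a consistency check; older scikit-learn versions weighted the mean by test-fold size when iid=True (the default), so the two can differ slightly when folds are unequal in size.

```python
from statistics import mean, pstdev

# Per-fold f1 scores printed above for grid_scores_[0], i.e. ngram_range=(1, 1)
cv_validation_scores = [0.83234714, 0.8, 0.81409002]

fold_mean = mean(cv_validation_scores)
# pstdev is the population standard deviation (ddof=0), like numpy.std's default
fold_std = pstdev(cv_validation_scores)

print(f"mean: {fold_mean:.5f}, std: {fold_std:.5f}")  # prints "mean: 0.81548, std: 0.01324"
```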

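Two points in the documentation are unclear to me:

  1. Is gs_clf.grid_scores_[0].cv_validation_scores an array of scores, as defined by the scoring parameter, one per fold (in this case, the f1 measure for each fold)? If not, what is it?

  2. If I instead choose another metric, e.g. scoring='f1_micro', will each array in gs_clf.grid_scores_[i].cv_validation_scores contain the f1_micro metric for the folds of that particular grid-search parameter setting?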
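Both questions can be checked empirically. The sketch below assumes a current scikit-learn (0.20 or later), where grid_scores_ has been removed and the per-fold numbers live in `cv_results_` under the keys `split0_test_score`, `split1_test_score`, and so on. Synthetic data from `make_classification` and a `LogisticRegression` grid over `C` stand in for the text pipeline (they are assumptions, not the original setup), and the hypothetical helper `per_fold_scores` compares the grid search's per-fold scores with `cross_val_score` run on the same folds with the same `scoring` string. If the stored values are the chosen metric evaluated on each fold, the two arrays agree for 'f1' and 'f1_micro' alike.

```python
# Assumes scikit-learn >= 0.20 (cv_results_ replaced grid_scores_).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for docs_train / y_train (hypothetical data)
X, y = make_classification(n_samples=100, n_features=10, random_state=0)

# Fixed, unshuffled folds so the grid search and cross_val_score see identical splits
cv = StratifiedKFold(n_splits=3)
param_grid = {"C": [0.1, 1.0]}


def per_fold_scores(scoring):
    """Return {C: (per-fold scores from the grid search, per-fold scores from cross_val_score)}."""
    gs = GridSearchCV(LogisticRegression(), param_grid, scoring=scoring, cv=cv)
    gs.fit(X, y)
    results = {}
    for i, params in enumerate(gs.cv_results_["params"]):
        # One stored column per fold: split0_test_score, split1_test_score, ...
        from_grid = np.array(
            [gs.cv_results_[f"split{k}_test_score"][i] for k in range(gs.n_splits_)]
        )
        # Score the same candidate directly, with the same metric and the same folds
        direct = cross_val_score(LogisticRegression(**params), X, y, scoring=scoring, cv=cv)
        results[params["C"]] = (from_grid, direct)
    return results


for scoring in ("f1", "f1_micro"):
    for C, (from_grid, direct) in per_fold_scores(scoring).items():
        # The last value on each line is expected to be True
        print(scoring, C, from_grid.round(3), np.allclose(from_grid, direct))
```

Passing an explicit `StratifiedKFold` object rather than an integer keeps the comparison honest: both calls are guaranteed to iterate over exactly the same train/test splits.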

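Best answer

I wrote the following function to convert the grid_scores_ object into a pandas.DataFrame. Hopefully the DataFrame view clears up your confusion, since it is a more intuitive format: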

```python
import pandas as pd


def grid_scores_to_df(grid_scores):
    """
    Convert a sklearn.grid_search.GridSearchCV.grid_scores_ attribute to a tidy
    pandas DataFrame where each row is a hyperparameter-fold combination.
    """
    rows = list()
    for grid_score in grid_scores:
        # cv_validation_scores holds one score per CV fold for this parameter setting
        for fold, score in enumerate(grid_score.cv_validation_scores):
            row = grid_score.parameters.copy()
            row['fold'] = fold
            row['score'] = score
            rows.append(row)
    df = pd.DataFrame(rows)
    return df
```

You need the following import for this to work: import pandas as pd
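grid_scores_ was deprecated in scikit-learn 0.18 and removed in 0.20, so the function above only works on older versions. A minimal sketch of an equivalent for newer versions follows, reading the per-fold columns of `cv_results_` instead; `cv_results_to_df` is a hypothetical name, the data is synthetic, and the sketch assumes a single scoring metric (with multi-metric scoring the column names change).

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


def cv_results_to_df(search):
    """
    Convert a fitted GridSearchCV's cv_results_ (scikit-learn >= 0.20) to a tidy
    pandas DataFrame with one row per hyperparameter-fold combination.
    Assumes a single scoring metric, so fold columns are named splitK_test_score.
    """
    rows = []
    for i, params in enumerate(search.cv_results_["params"]):
        for fold in range(search.n_splits_):
            row = dict(params)  # copy, so cv_results_["params"] is not mutated
            row["fold"] = fold
            row["score"] = search.cv_results_[f"split{fold}_test_score"][i]
            rows.append(row)
    return pd.DataFrame(rows)


# Usage on synthetic stand-in data (not the original text pipeline)
X, y = make_classification(n_samples=90, random_state=0)
gs = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0]}, scoring="f1", cv=3).fit(X, y)

df = cv_results_to_df(gs)
print(df)

# Averaging the per-fold rows recovers cv_results_["mean_test_score"]
# (the three folds here are equal in size, so fold weighting does not matter)
fold_means = df.groupby("C")["score"].mean().to_numpy()
print(np.allclose(fold_means, gs.cv_results_["mean_test_score"]))
```

Keeping the output in the same long format as grid_scores_to_df means existing plotting or groupby code can be reused unchanged when moving to a newer scikit-learn.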

For the corresponding Stack Overflow question on python - GridSearchCV scoring and grid_scores_, see: https://stackoverflow.com/questions/37014564/
