machine-learning - Scikit Learn - Training a new model with GridSearchCV

Reposted. Author: 行者123. Updated: 2023-11-30 08:50:58

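If I find the best parameters using GridSearchCV with a pipeline, can I save the trained model so that I can later apply the whole pipeline to new data and generate predictions for it? For example, I have the following pipeline, followed by a GridSearchCV over its parameters: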

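from pprint import pprint
from time import time

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(SVC(probability=True))),
])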

parameters = {
    'vect__ngram_range': ((1, 1), (1, 2), (1, 3)),  # unigrams only, up to bigrams, or up to trigrams
    'clf__estimator__kernel': ('rbf', 'linear'),
    'clf__estimator__C': tuple([10**i for i in range(-10, 11)]),
}

grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1)

print("Performing grid search...")
print("pipeline:", [name for name, _ in pipeline.steps])
print("parameters:")
pprint(parameters)
t0 = time()
# Conduct the grid search
grid_search.fit(X, y)
print("done in %0.3fs" % (time() - t0))
print()

print("Best score: %0.3f" % grid_search.best_score_)
print("Best parameters set:")
# Obtain the top-performing parameters
best_parameters = grid_search.best_estimator_.get_params()
# Print the results
for param_name in sorted(parameters.keys()):
    print("\t%s: %r" % (param_name, best_parameters[param_name]))

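Now I would like to save all of these steps as a single process that I can apply to a new, unseen dataset. Can it reuse the same parameters, vectorizer, and transformer to transform the data, run the model, and report the results?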

Best answer

You can simply pickle the GridSearchCV object to save it, then unpickle it when you want to use it to predict on new data.

import pickle

# Fit model and pickle fitted model
# (pickle writes bytes, so the file must be opened in binary mode)
grid_search.fit(X, y)
with open('/model/path/model_pickle_file', "wb") as fp:
    pickle.dump(grid_search, fp)

# Load model from file (binary mode again)
with open('/model/path/model_pickle_file', "rb") as fp:
    grid_search_load = pickle.load(fp)

# Predict new data with model loaded from disk
y_new = grid_search_load.best_estimator_.predict(X_new)
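
If only the tuned pipeline is needed later, and not the full search history, one option is to pickle just `best_estimator_`, which is the refitted pipeline itself. The following is a minimal sketch of that round trip, not the answer's own code: the toy corpus, the reduced parameter grid, the plain `SVC` classifier, and the file name `best_pipeline.pkl` are all assumptions made so the example runs on its own.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus, used only to make the sketch runnable.
docs = [
    "cheap pills buy now", "win money fast now",
    "buy cheap watches", "free money offer now",
    "meeting at noon today", "project report attached",
    "lunch meeting tomorrow", "see the attached report",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", SVC(kernel="linear")),
])
# Deliberately small grid (an assumption) so the search finishes quickly.
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1, 10],
}

search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(docs, labels)

# Save only the refitted best pipeline; binary mode is required by pickle.
model_path = os.path.join(tempfile.mkdtemp(), "best_pipeline.pkl")
with open(model_path, "wb") as fp:
    pickle.dump(search.best_estimator_, fp)

# Later, possibly in another process: load the pipeline and predict on raw text.
with open(model_path, "rb") as fp:
    loaded = pickle.load(fp)

new_docs = ["buy cheap money now", "report for the meeting"]
preds_before = search.predict(new_docs)
preds_after = loaded.predict(new_docs)
```

Because the loaded object is the whole pipeline, the new documents go in as raw text and pass through the same fitted vectorizer and TF-IDF transformer before reaching the classifier. A pickled model should be loaded with the same scikit-learn version that produced it, and pickle files should only be loaded from trusted sources.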

Regarding machine-learning - Scikit Learn - Training a new model with GridSearchCV, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25405844/
