
python - Saving a model with pickle

Reposted. Author: 行者123. Updated: 2023-12-05 02:47:09

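I have built a classifier and want to save it for future use. The classifier covers several algorithms (logistic regression, naive Bayes, and a support vector machine):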

X, y = tfidf(df, ngrams = 1)
X, y = under_sample.fit_resample(X, y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=40)
df_result = df_result.append(training_naive(X_train, X_test, y_train, y_test), ignore_index = True)
df_result = df_result.append(training_logreg(X_train, X_test, y_train, y_test), ignore_index = True)
df_result = df_result.append(training_svm(X_train, X_test, y_train, y_test), ignore_index = True)

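This is the last step of my code, where I compare the different algorithms. training_svm, training_logreg, and training_naive are functions. For example, training_svm is defined as follows: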

def training_svm(X_train_log, X_test_log, y_train_log, y_test_log):

    folds = StratifiedKFold(n_splits = 3, shuffle = True, random_state = 40)

    clf = svm.SVC(kernel='linear') # Linear Kernel

    clf.fit(X_train_log, y_train_log)

    res = pd.DataFrame(columns = ['Preprocessing', 'Model', 'Precision', 'Recall', 'F1-score', 'Accuracy'])

    y_pred = clf.predict(X_test_log)

    # scikit-learn metrics take (y_true, y_pred) in that order
    f1 = f1_score(y_test_log, y_pred, average = 'weighted')
    pres = precision_score(y_test_log, y_pred, average = 'weighted')
    rec = recall_score(y_test_log, y_pred, average = 'weighted')
    acc = accuracy_score(y_test_log, y_pred)

    res = res.append({'Model': 'SVM', 'Precision': pres,
                      'Recall': rec, 'F1-score': f1, 'Accuracy': acc}, ignore_index = True)

    return res

Since I want to use and test it on new data, I would like to know how to save it and reuse it. I assume I should do something like this:

import pickle

# save
with open('model.pkl', 'wb') as f:
    pickle.dump(clf, f)

# load
with open('model.pkl', 'rb') as f:
    clf2 = pickle.load(f)

clf2.predict(X[0:1])

Could you explain how to extend this to my project?

Best answer

As the scikit-learn documentation states:

It is possible to save a model in scikit-learn by using Python’s built-in persistence model, namely pickle

Example:

from sklearn import svm
from sklearn import datasets
clf = svm.SVC()
X, y= datasets.load_iris(return_X_y=True)
clf.fit(X, y)

import pickle
s = pickle.dumps(clf)
clf2 = pickle.loads(s)
clf2.predict(X[0:1])

You can then include this in your code for each model by creating a function to call:

def predict_svm(to_predict):
    with open('your_svm_model', 'rb') as f_input:
        clf = pickle.load(f_input) # maybe handled with a singleton to reduce loading for multiple predictions
    return clf.predict(to_predict)
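
One way to act on the singleton comment above is to cache the unpickled model so that repeated predictions reuse a single in-memory object. The following is only a sketch under assumptions: `load_model`, the `svm_model.pkl` file name, and the temporary directory are hypothetical, and the iris model is trained here purely so the example runs on its own.

```python
import pickle
import tempfile
from functools import lru_cache
from pathlib import Path

from sklearn import datasets, svm

# Train and save a small model so the example is self-contained.
X, y = datasets.load_iris(return_X_y=True)
model_path = Path(tempfile.mkdtemp()) / "svm_model.pkl"  # hypothetical file name
with open(model_path, "wb") as f_output:
    pickle.dump(svm.SVC(kernel="linear").fit(X, y), f_output)


@lru_cache(maxsize=None)
def load_model(path):
    """Unpickle the model at `path`; repeated calls return the cached object."""
    with open(path, "rb") as f_input:
        return pickle.load(f_input)


def predict_svm(to_predict, path=model_path):
    """Predict with the cached model instead of reloading it on every call."""
    return load_model(path).predict(to_predict)


first = predict_svm(X[0:1])   # reads the file once
second = predict_svm(X[0:1])  # served from the cache, no disk access
```

`functools.lru_cache` keys the cache on the path argument, so each model file is read at most once per process. If the file on disk is replaced, `load_model.cache_clear()` forces a reload.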

In any case, scikit-learn recommends using joblib:

In the specific case of scikit-learn, it may be better to use joblib’s replacement of pickle (dump & load), which is more efficient on objects that carry large numpy arrays internally as is often the case for fitted scikit-learn estimators, but can only pickle to the disk and not to a string:

from joblib import dump, load
dump(clf, 'filename.joblib')

clf = load('filename.joblib')

Details here
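
To extend this to a project with several algorithms, one possible approach is to save one file per fitted model, and to store the text vectorizer together with the classifier so that new raw text can be transformed in the same way as the training data. The sketch below assumes that the `tfidf` helper in the question wraps a `TfidfVectorizer`, which the question does not show. The toy corpus, the model names, and the output directory are hypothetical, and the under-sampling step is omitted for brevity.

```python
import tempfile
from pathlib import Path

from joblib import dump, load
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy corpus standing in for the asker's DataFrame (hypothetical data).
texts = ["good movie", "great film", "bad movie", "awful film"] * 5
labels = [1, 1, 0, 0] * 5

# One pipeline per algorithm; each bundles its own TF-IDF vectorizer.
models = {
    "naive_bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "logreg": make_pipeline(TfidfVectorizer(), LogisticRegression()),
    "svm": make_pipeline(TfidfVectorizer(), SVC(kernel="linear")),
}

model_dir = Path(tempfile.mkdtemp())  # hypothetical output directory
for name, pipeline in models.items():
    pipeline.fit(texts, labels)
    dump(pipeline, model_dir / f"{name}.joblib")

# Later, possibly in another process: reload and predict on raw text.
reloaded = {name: load(model_dir / f"{name}.joblib") for name in models}
new_texts = ["great movie"]
predictions = {name: clf.predict(new_texts)[0] for name, clf in reloaded.items()}
```

Because each saved pipeline contains its fitted vectorizer, the loading code only needs the raw text. If the vectorizer is kept outside the pipeline instead, it has to be saved and reloaded separately, and new data must go through `transform` rather than `fit_transform`.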

This post on python - saving a model with pickle is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/65152886/
