r - Saving and loading a catboost model with Caret in R - 6ren


Reposted by: 行者123, updated: 2023-12-02 20:13:39


I am able to train a CatBoost model with caret (in RStudio), and it works well:

my_catboost <- caret::train(x, y,
                            method = catboost.caret,
                            trControl = fitControl,
                            tuneGrid = param,
                            metric = "ROC")

If I use the model to predict on new data in the same session, there is no problem; it works:

output <- caret::predict.train(my_catboost, newdata=x_testing, type="prob")

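However, if I save the model and load it later (or save it, remove my_catboost, and load it again), calling predict crashes R and RStudio without any error message, and nothing appears in the RStudio log. After loading, I can see the model in the global environment, and it looks fine.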

I have tried the R functions save and load, as well as saveRDS and readRDS, but all of them crash.

Thanks!

Best answer

You misunderstood me. Here is an answer using the Sonar dataset from the mlbench package:

library(caret)
library(catboost)
library(mlbench)
data(Sonar)

Create the training and test sets:

set.seed(1)

tr <- createDataPartition(Sonar$Class, p = 0.7, list = FALSE)

trainer <- Sonar[tr,]
tester <- Sonar[-tr,]
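createDataPartition() samples within each level of the outcome, so both classes keep roughly their original proportions in the 70% training set. The base-R sketch below only illustrates that idea; stratified_split is a hypothetical helper, caret's own implementation may round group sizes differently, and the built-in iris data is used so the snippet runs without mlbench.

```r
# Hypothetical helper: stratified split that draws round(p * n) row indices
# separately within each class of y. Note: calls set.seed(), which resets
# the global random number generator.
stratified_split <- function(y, p = 0.7, seed = 1) {
  set.seed(seed)
  rows_by_class <- split(seq_along(y), y)
  picked <- lapply(rows_by_class, function(rows) {
    # sample.int() avoids sample()'s special case for a length-1 vector
    rows[sample.int(length(rows), size = round(p * length(rows)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

# Usage with iris: 50 rows per species, so 35 are kept from each
tr_iris <- stratified_split(iris$Species, p = 0.7)
table(iris$Species[tr_iris])
```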

Train the model:

fitControl <- trainControl(method = "cv",
                           number = 3,
                           savePredictions = TRUE,
                           summaryFunction = twoClassSummary,
                           classProbs = TRUE)

model <- train(x = trainer[,1:60],
               y = trainer$Class,
               method = catboost.caret, 
               trControl = fitControl, 
               tuneLength = 5,
               metric = "ROC")
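With metric = "ROC" and twoClassSummary, caret picks the tuning parameters by the area under the ROC curve on the held-out folds. To illustrate what that number means (this is not caret's internal code), the area can be computed from ranks with the Mann-Whitney formulation; auc_rank is a hypothetical helper, and which class counts as positive is a choice you pass in.

```r
# Hypothetical helper: rank-based (Mann-Whitney) AUC, i.e. the probability
# that a randomly chosen positive case scores higher than a randomly chosen
# negative case (ties count as one half through average ranks).
auc_rank <- function(scores, labels, positive) {
  is_pos <- labels == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  stopifnot(n_pos > 0, n_neg > 0)
  r <- rank(scores)  # ties.method = "average" by default
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example: 3 of the 4 positive/negative pairs are ordered correctly
auc_rank(c(0.1, 0.4, 0.35, 0.8), c("M", "M", "R", "R"), positive = "R")
# → 0.75
```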

Predict with caret:

preds1 <- predict(model, tester, type = "prob")

Save the final model with CatBoost's own function:

catboost::catboost.save_model(model$finalModel, "model")

Load the saved model:

model2 <- catboost::catboost.load_model("model")

Predict with the loaded model:

preds2 <- catboost.predict(model2,
                           catboost.load_pool(tester[, 1:60]),  # same 60 predictor columns as in training
                           prediction_type = "Probability")
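For a binary model, catboost.predict() with prediction_type = "Probability" returns one probability per row, whereas caret's predict(..., type = "prob") returns one column per class; the equality check that follows compares it with the second class column. To get the caret-style layout back, a small hypothetical helper such as as_class_probs can rebuild both columns, assuming (as that comparison implies) that CatBoost's probability refers to the second factor level.

```r
# Hypothetical helper: rebuild a caret-style probability data frame from
# CatBoost's binary "Probability" vector. Assumes p is P(second level).
as_class_probs <- function(p, class_levels) {
  stopifnot(length(class_levels) == 2, all(p >= 0 & p <= 1))
  out <- data.frame(1 - p, p)
  names(out) <- class_levels
  out
}

# With the objects above: as_class_probs(preds2, levels(trainer$Class))
as_class_probs(c(0.2, 0.7), c("M", "R"))
```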

Check that the predictions are equal:

all.equal(preds1[,2], preds2)
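all.equal() is not an exact-equality test: for numeric vectors it returns TRUE when the mean relative difference is below its default tolerance, sqrt(.Machine$double.eps) (about 1.5e-8), and otherwise returns a character string describing the difference. When checking that the caret and native CatBoost predictions agree, it can help to also report the largest absolute difference; compare_probs is a hypothetical helper for that.

```r
# Hypothetical helper: summarise how closely two probability vectors agree.
compare_probs <- function(p1, p2, tolerance = sqrt(.Machine$double.eps)) {
  p1 <- as.numeric(p1)  # drops names, so only the values are compared
  p2 <- as.numeric(p2)
  stopifnot(length(p1) == length(p2))
  list(
    equal = isTRUE(all.equal(p1, p2, tolerance = tolerance)),
    max_abs_diff = max(abs(p1 - p2))
  )
}

# With the objects above: compare_probs(preds1[, 2], preds2)
compare_probs(c(0.2, 0.9), c(0.2, 0.9 + 1e-12))$equal  # TRUE
compare_probs(c(0.2, 0.9), c(0.25, 0.9))$max_abs_diff   # about 0.05
```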

EDIT: In addition, saving and reloading the whole caret object:

saveRDS(model, "caret.model.rds")
model3 <- readRDS("caret.model.rds")
preds3 <- predict(model3, tester, type = "prob")

crashes the R session. Session info:

R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mlbench_2.1-1        catboost_0.10.3      caret_6.0-80         ggplot2_2.2.1        lattice_0.20-35      RevoUtils_11.0.0    
[7] RevoUtilsMath_11.0.0

loaded via a namespace (and not attached):
 [1] httr_1.3.1         magic_1.5-8        ddalpha_1.3.3      tidyr_0.8.1        sfsmisc_1.1-2      jsonlite_1.5      
 [7] viridisLite_0.3.0  splines_3.5.0      foreach_1.5.0      prodlim_2018.04.18 assertthat_0.2.0   stats4_3.5.0      
[13] DRR_0.0.3          yaml_2.1.19        robustbase_0.93-0  ipred_0.9-6        pillar_1.2.3       glue_1.2.0        
[19] digest_0.6.15      colorspace_1.3-2   recipes_0.1.2      htmltools_0.3.6    Matrix_1.2-14      plyr_1.8.4        
[25] psych_1.8.4        timeDate_3043.102  pkgconfig_2.0.1    CVST_0.2-2         broom_0.4.4        purrr_0.2.4       
[31] scales_0.5.0       gower_0.1.2        lava_1.6.1         tibble_1.4.2       withr_2.1.2        nnet_7.3-12       
[37] lazyeval_0.2.1     mnormt_1.5-5       survival_2.41-3    magrittr_1.5       nlme_3.1-137       MASS_7.3-49       
[43] dimRed_0.1.0       foreign_0.8-70     class_7.3-14       tools_3.5.0        data.table_1.11.4  stringr_1.3.1     
[49] plotly_4.7.1       kernlab_0.9-26     munsell_0.4.3      bindrcpp_0.2.2     compiler_3.5.0     RcppRoll_0.2.2    
[55] rlang_0.2.0        grid_3.5.0         iterators_1.0.10   htmlwidgets_1.2    geometry_0.3-6     gtable_0.2.0      
[61] ModelMetrics_1.1.0 codetools_0.2-15   abind_1.4-5        reshape2_1.4.3     R6_2.2.2           lubridate_1.7.4   
[67] dplyr_0.7.5        bindr_0.1.1        stringi_1.1.7      parallel_3.5.0     Rcpp_0.12.17       rpart_4.1-13      
[73] DEoptimR_1.0-8     tidyselect_0.2.4
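Taken together, the results above suggest keeping the CatBoost model itself out of saveRDS(): the native catboost.save_model() / catboost.load_model() round trip predicts correctly, while the deserialized caret object crashes on predict(). A minimal sketch of that workaround, assuming a binary caret model trained with catboost.caret as above and the default returnData = TRUE; save_catboost_bundle and predict_catboost_bundle are hypothetical helpers that use only base R and the catboost functions already shown in the answer.

```r
# Hypothetical helpers, not part of caret or catboost.
# The native model goes to CatBoost's own file format; the predictor names
# and class levels (plain R data) go to a small .rds file.
save_catboost_bundle <- function(model, dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  catboost::catboost.save_model(model$finalModel, file.path(dir, "model.cbm"))
  meta <- list(
    # caret stores the training data with the outcome in column ".outcome"
    predictors = setdiff(colnames(model$trainingData), ".outcome"),
    levels = levels(model$trainingData$.outcome)
  )
  saveRDS(meta, file.path(dir, "meta.rds"))
  invisible(dir)
}

predict_catboost_bundle <- function(dir, newdata) {
  meta <- readRDS(file.path(dir, "meta.rds"))
  booster <- catboost::catboost.load_model(file.path(dir, "model.cbm"))
  # Pass only the training predictors, in the training order
  pool <- catboost::catboost.load_pool(newdata[, meta$predictors, drop = FALSE])
  # Assumed to be P(meta$levels[2]), as the preds1[,2] comparison implies
  catboost::catboost.predict(booster, pool, prediction_type = "Probability")
}

# Usage, continuing the Sonar example:
# save_catboost_bundle(model, "sonar_catboost")
# preds4 <- predict_catboost_bundle("sonar_catboost", tester)
```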

A similar question about saving and loading a catboost model with Caret in R is on Stack Overflow: https://stackoverflow.com/questions/52884421/
