
r - caret doesn't run in parallel

Reposted. Author: 行者123. Updated: 2023-12-02 12:06:59

Whether caret actually runs in parallel depends on R, caret, and the doMC package, as described in Parallelizing Caret code.

Is anyone working in an environment similar to mine? What is the latest R version on which caret parallelization works properly?

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] caret_6.0-52 ggplot2_1.0.1 lattice_0.20-31 doMC_1.3.3 iterators_1.0.7 foreach_1.4.2 RStudioAMI_0.2

loaded via a namespace (and not attached):
[1] Rcpp_0.12.1 magrittr_1.5 splines_3.2.1 MASS_7.3-41 munsell_0.4.2 colorspace_1.2-6
[7] minqa_1.2.4 car_2.1-0 stringr_1.0.0 plyr_1.8.3 tools_3.2.1 pbkrtest_0.4-2
[13] nnet_7.3-9 grid_3.2.1 gtable_0.1.2 nlme_3.1-120 mgcv_1.8-6 quantreg_5.19
[19] MatrixModels_0.4-1 gtools_3.5.0 lme4_1.1-9 digest_0.6.8 Matrix_1.2-0 nloptr_1.0.4
[25] reshape2_1.4.1 codetools_0.2-11 stringi_0.5-5 BradleyTerry2_1.0-6 scales_0.3.0 stats4_3.2.1
[31] SparseM_1.7 brglm_0.5-9 proto_0.3-10

Update 1: My code is as follows:

library(doMC); registerDoMC(cores = 4)
library(caret)
classification_formula <- as.formula(paste("target", "~",
    paste(names(m_input_data)[!names(m_input_data) == 'target'], collapse = "+")))

CVfolds <- 2
CVreps <- 5
ma_control <- trainControl(method = "repeatedcv",
                           number = CVfolds,
                           repeats = CVreps,
                           returnResamp = "final",
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary,
                           allowParallel = TRUE,
                           verboseIter = TRUE)
rf_tuneGrid <- expand.grid(mtry = seq(2, 32, length.out = 6))
rf <- train(classification_formula,
            data = m_input_data,
            method = "rf",
            metric = "ROC",
            trControl = ma_control,
            tuneGrid = rf_tuneGrid,
            ntree = 101)

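Update 2: When I run the script from the command line, only one core is working. When I run it from RStudio, parallelism does work, because I see 4 processes in top. But after a second the following error occurs: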

  Error in names(resamples) <- gsub("^\\.", "", names(resamples)) : 
attempt to set an attribute on NULL
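
Since the behaviour differs between the command line and RStudio, one quick check is whether a foreach backend is actually registered in the session that runs the script. The lines below are a minimal sketch, assuming the foreach and doMC packages from the sessionInfo() above are installed; the variable names are only for illustration.

```r
library(foreach)
library(doMC)

# Register the backend in the same session that will call train()
registerDoMC(cores = 4)

registered <- getDoParRegistered()  # is any parallel backend registered?
backend    <- getDoParName()        # name of the registered backend
workers    <- getDoParWorkers()     # number of workers foreach will use

print(registered)
print(backend)
print(workers)
```

If no backend is registered, foreach falls back to sequential execution, which would show up as a single busy core.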

Update 4:

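Hi, the problem seems to be caused by dead R sessions. Each time I start the AWS instance on which I run the R code, I now refresh the R engine: whenever I reload RStudio in the browser, I do Session -> Restart R. With that it seems to run. I am now checking whether the same holds when the script is run from the Ubuntu command line.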

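In general, it runs but never finishes. caret parallelizes at the data level, which means it can process each resample in a separate process. But if each resample is still large (100,000/2 rows, since the number of folds is 2, by 2,000 features), it may be too much for a single processing unit to complete. Am I right?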

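I think the parallelism has to be at the algorithm level, meaning that each algorithm itself runs on several cores. Are such algorithm implementations available in caret?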
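
The data-level parallelism described above can be sketched as follows. This is an illustration under stated assumptions, not caret's internal code: the toy data, the lm() model, and the names dat, resamples, and rmse are all hypothetical. It only shows the pattern of handing each resample to a foreach worker via %dopar%.

```r
library(foreach)
library(doMC)
registerDoMC(cores = 2)

# Toy regression data (hypothetical, for illustration only)
set.seed(1)
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n)

# 2 folds repeated 5 times gives 10 held-out index sets
resamples <- unlist(
  lapply(1:5, function(r) split(sample(n), rep(1:2, length.out = n))),
  recursive = FALSE
)

# Each resample is an independent task, so the backend can run
# them in different worker processes
rmse <- foreach(idx = resamples, .combine = c) %dopar% {
  fit <- lm(y ~ x, data = dat[-idx, ])
  sqrt(mean((dat$y[idx] - predict(fit, dat[idx, ]))^2))
}

print(length(rmse))
```

Note that every worker still fits a full model on its own training split, so with this pattern a single very large resample remains a single-core job.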

Best answer

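I have the latest version on a Linux platform, R version 3.2.2 (2015-08-14, "Fire Safety"), and parallelization works fine. Could you provide the code that does not run in parallel?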

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8 LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] kernlab_0.9-22 doMC_1.3.3 iterators_1.0.7 foreach_1.4.2 caret_6.0-52 ggplot2_1.0.1 lattice_0.20-33

loaded via a namespace (and not attached):
[1] Rcpp_0.12.0 compiler_3.2.2 nloptr_1.0.4 plyr_1.8.3 tools_3.2.2 digest_0.6.8
[7] lme4_1.1-9 nlme_3.1-122 gtable_0.1.2 mgcv_1.8-7 Matrix_1.2-2 brglm_0.5-9
[13] SparseM_1.7 proto_0.3-10 BradleyTerry2_1.0-6 stringr_1.0.0 gtools_3.5.0 MatrixModels_0.4-1
[19] stats4_3.2.2 grid_3.2.2 nnet_7.3-10 minqa_1.2.4 reshape2_1.4.1 car_2.0-26
[25] magrittr_1.5 scales_0.3.0 codetools_0.2-11 MASS_7.3-43 splines_3.2.2 pbkrtest_0.4-2
[31] colorspace_1.2-6 quantreg_5.18 stringi_0.5-5 munsell_0.4.2

I ran your code on my local machine with the BreastCancer dataset, and it runs in parallel without any problems. I am using RStudio version 0.98.1103.

library(caret)
library(mlbench)
data(BreastCancer)

library(doMC)
registerDoMC(cores=2)

classification_formula <- as.formula(paste("Class", "~",
    paste(names(BreastCancer)[!names(BreastCancer) == 'Class'], collapse = "+")))

CVfolds <- 2
CVreps <- 5
ma_control <- trainControl(method = "repeatedcv",
                           number = CVfolds,
                           repeats = CVreps,
                           returnResamp = "final",
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary,
                           allowParallel = TRUE,
                           verboseIter = TRUE)

rf_tuneGrid <- expand.grid(mtry = seq(2, 32, length.out = 6))

# Notice, it might be easier just to use Class ~ .
# instead of classification_formula
rf <- train(classification_formula,
            data = BreastCancer,
            method = "rf",
            metric = "ROC",
            trControl = ma_control,
            tuneGrid = rf_tuneGrid,
            ntree = 101)

> rf
Random Forest

699 samples
10 predictors
2 classes: 'benign', 'malignant'

No pre-processing
Resampling: Cross-Validated (2 fold, repeated 5 times)
Summary of sample sizes: 341, 342, 342, 341, 342, 341, ...
Resampling results across tuning parameters:

  mtry  ROC        Sens       Spec       ROC SD       Sens SD      Spec SD
   2    0.9867820  1.0000000  0.0000000  0.005007691  0.000000000  0.000000000
   8    0.9899107  0.9549550  0.9640196  0.002243649  0.006714919  0.017247716
  14    0.9907072  0.9558559  0.9631933  0.003028258  0.012345228  0.008019979
  20    0.9909514  0.9635135  0.9556513  0.003268291  0.006864342  0.010471005
  26    0.9911480  0.9630631  0.9539706  0.003384987  0.005113930  0.010628533
  32    0.9911485  0.9657658  0.9522969  0.002973508  0.004842197  0.004090206

ROC was used to select the optimal model using the largest value.
The final value used for the model was mtry = 32.
>

Regarding "r - caret doesn't run in parallel", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32514370/
