
r - How to set .libPaths (checkpoint) on workers when running parallel computations in R

Reposted. Author: 行者123. Last updated: 2023-12-05 01:42:11

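I use the checkpoint package for reproducible data analysis. Some computations take a long time to run, so I would like to run them in parallel. When running in parallel, however, the checkpoint is not set on the workers, so I get the error "there is no package called xy" (because the package is not installed in my default library directory).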

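How can I make sure that every worker uses the package versions in the checkpoint folder? I tried setting .libPaths inside the foreach code, but that does not seem to work. I would also prefer to set the checkpoint/libPaths once globally rather than in every foreach call.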

Another option would be to change the .Rprofile file, but I do not want to do that.

checkpoint::checkpoint("2018-06-01")

library(foreach)
library(doFuture)
library(future)

doFuture::registerDoFuture()
future::plan("multisession")

l <- .libPaths()

# Code to run in parallel does not make much sense of course but I wanted to keep it simple.
res <- foreach::foreach(
  x = unique(iris$Species),
  lib.path = l
) %dopar% {
  .libPaths(lib.path)
  stringr::str_c(x, "_")
}

Error in { : task 2 failed - "there is no package called 'stringr'"
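
A plausible reading of this error, not stated in the original post: named arguments to foreach() are iteration variables that are stepped through together, so passing lib.path = l hands each task a single element of the library-path vector rather than the whole vector. The sketch below illustrates that behavior with %do% and hypothetical paths ("/lib/a" and so on are made up for illustration).

```r
library(foreach)

# Hypothetical library paths, used only to illustrate the iteration behavior.
paths <- c("/lib/a", "/lib/b", "/lib/c")

# Named arguments are iterated in lockstep: task i sees only paths[i].
seen <- foreach(x = 1:3, lib.path = paths) %do% {
  lib.path
}

# Referring to the variable from the calling environment instead
# gives every task the full vector.
seen_all <- foreach(x = 1:3) %do% {
  paths
}

print(lengths(seen))
print(lengths(seen_all))
```

Under this reading, the second task would have received only the second library path, which would be consistent with "task 2 failed" if stringr is installed only in the first path.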

Best answer

Author of the future package here.

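Update 2022-05-25: Since future 1.20.0 (2021-11-03), multisession parallel workers automatically inherit the R library path (= .libPaths()) from the main R session. The workaround below is therefore no longer needed for them. It may still be needed with other future backends.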
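
One way to check whether this inheritance applies on a given setup is to compare the library paths seen by a worker with those of the main session. This is a minimal sketch, assuming future (>= 1.20.0) is installed; the variable names main_paths, worker_paths, and inherited are illustrative only.

```r
library(future)

# The update note refers to future >= 1.20.0.
stopifnot(packageVersion("future") >= "1.20.0")

plan(multisession, workers = 2)

main_paths <- .libPaths()

# Evaluate .libPaths() on a worker and bring the result back.
worker_paths <- value(future(.libPaths()))

# TRUE if every library path of the main session is also known to the worker.
inherited <- all(main_paths %in% worker_paths)
print(inherited)

# Shut down the background workers again.
plan(sequential)
```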


It is enough to pass the library path of the master R process down as the global variable libs and to set it on each worker with .libPaths(libs):

## Use CRAN checkpoint from 2018-07-24 to get future (>= 1.9.0) [1],
## otherwise the below stdout won't be relayed back to the master
## R process, but setting .libPaths() does also work in older
## versions of the future package.
## [1] https://cran.microsoft.com/snapshot/2018-07-24/web/packages/future
checkpoint::checkpoint("2018-07-24")
stopifnot(packageVersion("future") >= "1.9.0")

libs <- .libPaths()
print(libs)
### [1] "/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1"
### [2] "/home/hb/.checkpoint/R-3.5.1"
### [3] "/usr/lib/R/library"

library(foreach)

doFuture::registerDoFuture()
future::plan("multisession")

res <- foreach::foreach(x = unique(iris$Species)) %dopar% {
  ## Use the same library paths as the master R session
  .libPaths(libs)

  cat(sprintf("Library paths used by worker (PID %d):\n", Sys.getpid()))
  cat(sprintf(" - %s\n", sQuote(.libPaths())))

  stringr::str_c(x, "_")
}

### - ‘/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1’
### - ‘/home/hb/.checkpoint/R-3.5.1’
### - ‘/usr/lib/R/library’
### Library paths used by worker (PID 9394):
### - ‘/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1’
### - ‘/home/hb/.checkpoint/R-3.5.1’
### - ‘/usr/lib/R/library’
### Library paths used by worker (PID 9412):
### - ‘/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1’
### - ‘/home/hb/.checkpoint/R-3.5.1’
### - ‘/usr/lib/R/library’

str(res)
### List of 3
### $ : chr "setosa_"
### $ : chr "versicolor_"
### $ : chr "virginica_"
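
The question also asked for a way to set the library paths once rather than in every foreach call. The sketch below is not part of the original answer. It assumes you create the workers yourself with the base parallel package and set the paths on each of them once; the setting then lasts for the lifetime of each worker. The names cl, worker_paths, and all_set are illustrative.

```r
library(parallel)

# Library paths of the main R session (for example after checkpoint()).
libs <- .libPaths()

cl <- makeCluster(2L)

# Set the library paths once on every worker. A wrapper function is used
# so that .libPaths() is looked up and evaluated on the worker itself.
invisible(clusterCall(cl, function(paths) .libPaths(paths), libs))

# Verify: each worker should now know all library paths of the main session.
worker_paths <- clusterCall(cl, function() .libPaths())
all_set <- all(vapply(worker_paths, function(p) all(libs %in% p), logical(1)))
print(all_set)

# Assumption, not run here: such a cluster could be handed to future with
#   future::plan(future::cluster, workers = cl)
# in which case stopCluster() should only be called once the work is done.
stopCluster(cl)
```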

FYI, it is on the roadmap for future to make it easier to pass down the library path(s) to workers.

My details:

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] foreach_1.4.4

loaded via a namespace (and not attached):
[1] drat_0.1.4 compiler_3.5.1 BiocManager_1.30.2 parallel_3.5.1 tools_3.5.1 listenv_0.7.0 doFuture_0.6.0
[8] codetools_0.2-15 iterators_1.0.10 digest_0.6.15 globals_0.12.1 checkpoint_0.4.5 future_1.9.0

Regarding "r - How to set .libPaths (checkpoint) on workers when running parallel computations in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52276088/
