parallel-processing - Elastic parallelism and fault tolerance in distributed Julia

Reposted by: 行者123. Updated: 2023-12-01 01:53:08

How does Julia expose fault tolerance, both when nodes fail (intentionally or not) and when communication between nodes fails?

I have seen a few mentions that such a feature exists, but I cannot work out exactly how it is done.

Best answer

The pmap docstring shows that this is already implemented through the retry_* keyword arguments.

pmap([::AbstractWorkerPool], f, c...; distributed=true, batch_size=1,
on_error=nothing, retry_n=0, retry_max_delay=DEFAULT_RETRY_MAX_DELAY,
retry_on=DEFAULT_RETRY_ON) -> collection

... Any error stops pmap from processing the remainder of the collection. To override this behavior you can specify an error handling function via argument on_error which takes in a single argument, i.e., the exception. The function can stop the processing by rethrowing the error, or, to continue, return any value which is then returned inline with the results to the caller.

Failed computation can also be retried via retry_on, retry_n, retry_max_delay, which are passed through to retry as arguments retry_on, n and max_delay respectively. If batching is specified, and an entire batch fails, all items in the batch are retried.
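As a rough illustration of the two mechanisms in the docstring, here is a minimal sketch. It assumes a recent Julia release (1.x), where pmap lives in the Distributed standard library and, as far as I know, the retry_n / retry_on / retry_max_delay keywords quoted above were replaced by retry_delays and retry_check. The functions fails_on_even and flaky_square and the counter attempts are hypothetical names made up for this example. No workers are added, so pmap runs on the calling process, which keeps the counter in one place.

```julia
using Distributed

# Hypothetical task that fails deterministically on even inputs.
fails_on_even(x) = iseven(x) ? error("cannot process $x") : x^2

# on_error receives the exception; returning a value (here -1) lets pmap
# continue, and that value appears inline with the other results.
with_placeholder = pmap(fails_on_even, 1:4; on_error = e -> -1)

# Hypothetical transient failure: the first two calls fail, later calls succeed.
attempts = Ref(0)
function flaky_square(x)
    attempts[] += 1
    attempts[] <= 2 && error("transient failure")
    return x^2
end

# Allow up to three retries per item, with zero delay between attempts
# (assumes the retry_delays keyword of Julia 1.x).
with_retries = pmap(flaky_square, 1:4; retry_delays = zeros(3))

println(with_placeholder)
println(with_retries)
```

On a real cluster you would call addprocs first and define the task on every worker with @everywhere. A counter like attempts would then exist once per worker, so this particular way of simulating a transient failure only makes sense in the single-process setting used here.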

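I don't think the @parallel macro has anything similar. However, you can use the Base.wrap_on_error and Base.wrap_retry functions to extend your own function so that it handles errors. The definition of pmap shows many of the implementation details: https://github.com/JuliaLang/julia/blob/v0.5.0/base/pmap.jl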

The basic strategy is simply to catch the error (and possibly the data) and retry on the same worker, or on a different worker if that one has failed. I think.
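The catch-and-retry idea can be sketched in plain Julia without any workers. This is not pmap's actual implementation: with_retry, flaky_double, and calls are hypothetical names invented for the illustration, and the wrapper always retries in the calling process rather than picking another worker. Base also exports a retry function that does something similar with configurable delays.

```julia
# Hypothetical helper: wrap f so that a failing call is retried, rethrowing
# the last error once max_attempts calls have failed.
function with_retry(f; max_attempts::Int = 3)
    return function (args...)
        for attempt in 1:max_attempts
            try
                return f(args...)
            catch
                # Out of attempts: propagate the most recent error.
                attempt == max_attempts && rethrow()
            end
        end
    end
end

# Hypothetical flaky task: the first two calls fail, the third succeeds.
calls = Ref(0)
function flaky_double(x)
    calls[] += 1
    calls[] < 3 && error("transient failure")
    return 2x
end

safe_double = with_retry(flaky_double; max_attempts = 5)
answer = safe_double(21)
println(answer, " after ", calls[], " calls")
```

A wrapper like this could be applied to the function body used inside an @parallel loop, which is roughly what the answer suggests when it points to Base.wrap_retry.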

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/42656455/
