python - Embarrassingly parallel for loop with complex outputs in each iteration


Reposted by: 太空宇宙. Updated: 2023-11-04 04:58:43

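I have an embarrassingly parallel for loop in Python (to be repeated n times). Each iteration performs a complex task and returns a mix of numpy arrays and dicts (so not a single number to be filled into an array; think of the outputs as a complex bundle for now). The repetitions don't need to run in any particular order. I just need to be able to uniquely identify each i of the n iterations (e.g. to save the results of each repetition independently). In fact, they don't even need to be identified by an index/counter, just by something unique, since they don't need to be ordered (I can easily fill them back into a bigger array).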

To give a more concrete example, I would like to parallelize the following task:

import os

import numpy as np


def do_complex_task(complex_input1, input2, input3, input_n):
  "all important computation done here - independent of i or n"

  inner_result1, inner_result2 = np.zeros(100), np.zeros(100)
  for smaller_input in complex_input1:
    inner_result1 = do_another_complex_task(smaller_input, input2, input3, input_n)
    inner_result2 = do_second_complex_task(smaller_input, input2, input3, input_n)

  # do some more to produce a few more essential results (e.g. inner_result_n)
  dict_result = blah()

  unique_identifier = get_unique_identifier_for_this_thread() # I don't know how

  # save results for each repetition independently before returning, 
  # instead of waiting for full computation to be done which can take a while
  out_path = os.path.join(out_dir, 'repetition_{}.pkl'.format(unique_identifier))

  return inner_result1, inner_result2, inner_result_n, dict_result


def main_compute():
  "main method to run the loop"

  n = 256 # ideally any number, but multiples of 4 possible, for even parallelization.

  result1  = np.zeros([n, 100])
  result2  = np.zeros([n, 100])
  result_n = np.zeros([n, 100])
  dict_result = [None] * n  # pre-sized so that dict_result[i] can be assigned below

  # this for loop does not need to be computed in any order (range(n) is an illustration)
  # although this order would be ideal, as it makes it easy to populate results into a bigger array
  for i in range(n):
    # this computation has nothing to do with i or n!
    result1[i, :], result2[i, :], result_n[i, :], dict_result[i] = do_complex_task(complex_input1, input2, input3, input_n)

  # I need to parallelize the above loop to speed up stupidly parallel processing.


if __name__ == '__main__':
    pass

I have read fairly widely, but it is not clear to me which strategy would be the smartest and simplest without any reliability issues.

Also, complex_input1 can be large, so I would prefer not to incur a lot of I/O overhead from pickling.

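I can certainly return a single list (with all the complex parts) that gets appended to a master list, which can later be assembled into the format I like (rectangular arrays etc.). This can be done easily with joblib, for example. However, I am trying to learn from you all to identify good solutions.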

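EDIT: I think I am settling on the following solution. Let me know what could go wrong with it, or how I can improve it further in terms of speed, absence of side effects, etc. After a few unstructured trials on my laptop, it is not clear whether this gives a noticeable speedup.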

from functools import partial
from multiprocessing import Pool, Manager

import numpy as np

# number of repetitions handed to a worker at a time
chunk_size = int(np.ceil(num_repetitions / num_procs))
with Manager() as proxy_manager:
    shared_inputs = proxy_manager.list([complex_input1, input2, another, blah])
    partial_func_holdout = partial(key_func_doing_work, *shared_inputs)

    with Pool(processes=num_procs) as pool:
        results = pool.map(partial_func_holdout, range(num_repetitions), chunk_size)

Best answer

There is a built-in solution in the form of multiprocessing.Pool.map:

import multiprocessing
from functools import partial

def do_task(a, b):
    return (42, {'x': a * 2, 'y': b[::-1]})

if __name__ == '__main__':
    a_values = ['Hello', 'World']
    with multiprocessing.Pool(processes=3) as pool:
        results = pool.map(partial(do_task, b='fixed b value'), a_values)
    print(results)

After this, results will contain the results in the same order as a_values.
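Because the order is preserved, one way to get the unique identifier asked for in the question is to map over the repetition index itself. The following is a minimal sketch under that assumption: `do_repetition`, `run_all`, and the fixed inputs `scale` and `label` are hypothetical names, not part of the question, and plain lists stand in for the numpy arrays.

```python
import multiprocessing
from functools import partial


def do_repetition(i, scale, label):
    """Hypothetical stand-in for the real task; i is the repetition index."""
    values = [scale * i, scale * i + 1]        # stands in for an array result
    info = {'label': label, 'repetition': i}   # stands in for a dict result
    return i, values, info


def run_all(n, processes=2):
    """Run n repetitions in a pool and regroup the outputs by kind."""
    worker = partial(do_repetition, scale=10, label='demo')
    with multiprocessing.Pool(processes=processes) as pool:
        results = pool.map(worker, range(n))   # same order as range(n)
    # Transpose [(i, values, info), ...] into three parallel sequences.
    indices, all_values, all_info = zip(*results)
    return list(indices), list(all_values), list(all_info)


if __name__ == '__main__':
    indices, all_values, all_info = run_all(4)
    print(indices)
    print(all_values)
```

Since row i of the output belongs to repetition i, the regrouped sequences can be copied straight into larger rectangular arrays afterwards.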

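The requirement is that the arguments and return values are picklable. Apart from that they can be complex, although with large amounts of data there may be some performance penalty.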
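If the fixed input is large, one option that may reduce the pickling cost is to hand it to each worker once through the pool's `initializer` instead of binding it with `partial`, so that only the small per-task argument travels with each call. This is a sketch under that assumption, not a measured recommendation: `_init_worker`, `_shared`, `do_repetition`, and `run_all` are hypothetical names, and whether it helps depends on the data and the platform's process start method.

```python
import multiprocessing

_shared = None  # filled in once per worker process by _init_worker


def _init_worker(big_input):
    """Runs once in each worker; keeps the large input in a module global."""
    global _shared
    _shared = big_input


def do_repetition(i):
    """Hypothetical task: only the small index i is passed per call."""
    return i, sum(_shared) + i


def run_all(big_input, n, processes=2):
    """Map over the indices while the large input is set up once per worker."""
    with multiprocessing.Pool(processes=processes,
                              initializer=_init_worker,
                              initargs=(big_input,)) as pool:
        return pool.map(do_repetition, range(n))


if __name__ == '__main__':
    print(run_all(list(range(1000)), 4))
```

The trade-off is that the worker function now depends on a module-level global, which makes it harder to call outside a pool.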

I don't know whether this is what you would consider a good solution; I have used it many times and it works well for me.

You can put the return values in a class, but personally I feel that doesn't really offer a benefit, since Python has no static type checking.
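For readers who still prefer named fields over positional tuples, a lightweight variant is a `typing.NamedTuple`. The sketch below reuses the `do_task` computation from the example above; `TaskResult` and its field names are hypothetical. A class defined at module level like this can be pickled, which is what the pool needs to send it back.

```python
import pickle
from typing import NamedTuple


class TaskResult(NamedTuple):
    """Hypothetical container for one repetition's outputs."""
    answer: int
    details: dict


def do_task(a, b):
    # Same computation as the do_task example above, with named fields.
    return TaskResult(answer=42, details={'x': a * 2, 'y': b[::-1]})


if __name__ == '__main__':
    result = do_task('Hello', 'fixed b value')
    # Round-trip through pickle, as would happen between worker and parent.
    restored = pickle.loads(pickle.dumps(result))
    print(restored.answer, restored.details['x'])
```

A NamedTuple still unpacks like a plain tuple, so existing code that does `answer, details = do_task(...)` keeps working.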

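It simply runs #processes jobs in parallel. They should be independent, and the order should not matter (I think they are started in the order provided, but they may finish in a different order).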
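When the completion order does not matter and each repetition should be saved as soon as it is done, `imap_unordered` yields results as they finish. In this sketch the worker returns its own index so that each result can be identified; `do_repetition`, `run_and_save`, and the `repetition_{}.pkl` naming (borrowed from the question) are illustrative assumptions.

```python
import multiprocessing
import os
import pickle
import tempfile


def do_repetition(i):
    """Hypothetical task; returns its own index so the caller can identify it."""
    return i, {'square': i * i}


def run_and_save(n, out_dir, processes=2):
    """Save each repetition as it finishes; return indices in arrival order."""
    arrival_order = []
    with multiprocessing.Pool(processes=processes) as pool:
        # imap_unordered yields results as they complete, not in input order.
        for i, result in pool.imap_unordered(do_repetition, range(n)):
            out_path = os.path.join(out_dir, 'repetition_{}.pkl'.format(i))
            with open(out_path, 'wb') as fh:
                pickle.dump(result, fh)
            arrival_order.append(i)
    return arrival_order


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as out_dir:
        print(sorted(run_and_save(5, out_dir)))
        print(sorted(os.listdir(out_dir)))
```

Writing the files in the parent process keeps all disk access in one place; saving inside the worker would also work, at the cost of a side effect in the task function.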

The example is based on this answer.

Regarding python - Embarrassingly parallel for loop with complex outputs in each iteration, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46391191/
