Python concurrent.futures: handling exceptions in child processes

Reposted by: 行者123. Last updated: 2023-11-28 21:39:09

I have a pretty plain-vanilla concurrent.futures.ProcessPoolExecutor implementation, something like this (using Python 3.6):

files = get_files()
processor = get_processor_instance()
with concurrent.futures.ProcessPoolExecutor() as executor:
    list(executor.map(processor.process, files))

While processor is an instance of any one of a number of available processor classes, they all share the process method, which looks roughly like this:

def process(self, file):
    log.debug(f"Processing source file {file.name}.")
    with DBConnection(self.db_url) as session:
        file = session.merge(file)
        session.refresh(file)
        self._set_file(file)
        timer = perf_counter()
        try:
            self.records = self._get_records()
            self._save_output()
        except Exception as ex:
            log.warning(f"Failed to process source file {file.ORIGINAL_NAME}: {ex}")
            self.error_time = time.time()
            self.records = None
        else:
            process_duration = perf_counter() - timer
            log.info(f'File {file.name} processed in {process_duration:.6f} seconds.')
            file.process_duration = process_duration
        session.commit()

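The implementations of the _get_records and _save_output methods vary per class, but my problem is with the error handling. I am deliberately testing it so that one of those two methods runs out of memory, and I would expect the except block above to catch the error and move on to the next file. That is exactly what happens when I run the code in a single process.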

If I use ProcessPoolExecutor as described above, it raises a BrokenProcessPool exception and kills all execution:

Traceback (most recent call last):
  File "/vagrant/myapp/myapp.py", line 94, in _process
    list(executor.map(processor.process, files))
  File "/home/ubuntu/.pyenv/versions/3.6.3/lib/python3.6/concurrent/futures/process.py", line 366, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/home/ubuntu/.pyenv/versions/3.6.3/lib/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/home/ubuntu/.pyenv/versions/3.6.3/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/home/ubuntu/.pyenv/versions/3.6.3/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

I can of course catch BrokenProcessPool in the calling code, but I would prefer to handle the error internally and proceed to the next file.

I also tried using the standard multiprocessing.Pool object, like this:

with multiprocessing.Pool() as pool:
    pool.map(processor.process, files)

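In this case the behaviour is even weirder: after starting to process the first two files, which raise the out-of-memory error, it moves on to the later files, which are smaller and so get processed completely. However, the except block apparently never gets triggered (no log messages, no error_time), and the application just hangs, neither finishing nor doing anything, until it is killed manually.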
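A hang like this is consistent with multiprocessing.Pool not noticing that a task was lost when its worker was killed, so map keeps waiting for a result that never arrives. One way to keep the caller from blocking forever is to put a timeout on the result. The sketch below is only an illustration: map_with_timeout, its timeout value and the fixed worker count are hypothetical and not part of the original code.

```python
import multiprocessing


def map_with_timeout(func, items, timeout, context=None):
    """Run a pool map, but give up after `timeout` seconds instead of hanging.

    Hypothetical helper: returns the list of results, or None on timeout.
    """
    ctx = context or multiprocessing.get_context()
    with ctx.Pool(processes=2) as pool:
        pending = pool.map_async(func, items)
        try:
            # get() raises multiprocessing.TimeoutError if results are late.
            return pending.get(timeout=timeout)
        except multiprocessing.TimeoutError:
            # Leaving the with block terminates the pool's workers.
            return None
```

A timeout does not say which file caused the problem; it only turns a silent hang into a condition the calling code can report.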

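I was hoping that the try..except block would make each process self-contained, handling its own errors without affecting the main application. Any ideas how to achieve that?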

Best answer

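So, after a lot of debugging (and with due credit to @RomanPerekhrest for suggesting to check the executor object), I have figured out the cause. As described in the question, the test data consisted of a number of files, two of which were quite large (over 1 million lines of CSV each). Both caused my test machine (a 2GB VM) to choke, but in different ways. The first, which was the larger of the two, caused a regular out-of-memory error that was handled by the except block. The second simply caused a SIGKILL. Without exploring too much, I suspect that the larger file simply could not fit into memory when read (done in the _get_records method), while the smaller one could, but the subsequent manipulation of it (done in _save_output) caused the overflow and killed the process.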
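The difference between the two failure modes can be reproduced in a minimal sketch. Everything named here is hypothetical (handled_failure, killed_worker, run): the MemoryError is raised by hand, and the worker sends SIGKILL to itself to stand in for the system killing the process. It assumes a POSIX system and Python 3.7 or later (for the mp_context argument), so it is an illustration rather than the code from the question.

```python
import concurrent.futures
import multiprocessing
import os
import signal
from concurrent.futures.process import BrokenProcessPool


def handled_failure(_):
    """An ordinary MemoryError is a Python exception, so except can catch it."""
    try:
        raise MemoryError("simulated allocation failure")
    except Exception as ex:
        return f"caught {type(ex).__name__}"


def killed_worker(_):
    """SIGKILL ends the process immediately; no except block ever runs."""
    os.kill(os.getpid(), signal.SIGKILL)


def run(task, context=None):
    """Run one task in a single-worker pool and report what the parent sees."""
    with concurrent.futures.ProcessPoolExecutor(
        max_workers=1, mp_context=context
    ) as executor:
        try:
            return list(executor.map(task, [0]))
        except BrokenProcessPool:
            # Only the parent can observe that the worker died.
            return "pool broken"
```

Calling run(handled_failure) from under an `if __name__ == "__main__":` guard returns the worker's own message, while run(killed_worker) can only be detected in the parent, which is why the try..except inside process never fired for the second file.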

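My solution was to simply catch the BrokenProcessPool exception and inform the user of the issue; I also added an option to run the processing tasks in a single process, in which case any files that are too large are simply marked as having errors: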

files = get_files()
processor = get_processor_instance()
results = []
if args.nonconcurrent:
    results = list(map(processor.process, files))
else:
    with concurrent.futures.ProcessPoolExecutor() as executor:
        try:
            results = list(executor.map(processor.process, files))
        except concurrent.futures.process.BrokenProcessPool as ex:
            raise MyCustomProcessingError(
                f"{ex} This might be caused by limited system resources. "
                "Try increasing system memory or disable concurrent processing "
                "using the --nonconcurrent option."
            )
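As a possible refinement, each file can be submitted as its own future so that failures are recorded per file instead of aborting the whole map call. This is a sketch under assumptions, not the approach taken in the answer: process_all is a hypothetical helper, and it accepts any concurrent.futures executor. Ordinary exceptions raised inside the worker function are attached to that one item. If a worker process is killed, every unfinished future fails with BrokenProcessPool, which also lands in the failures list because it is a subclass of Exception.

```python
import concurrent.futures


def process_all(executor, func, items):
    """Submit each item separately and collect per-item outcomes.

    Hypothetical helper. Returns (results, failures), both lists of
    (item, value_or_exception) pairs in completion order.
    """
    futures = {executor.submit(func, item): item for item in items}
    results, failures = [], []
    for future in concurrent.futures.as_completed(futures):
        item = futures[future]
        try:
            results.append((item, future.result()))
        except Exception as ex:
            # Covers errors raised by func and BrokenProcessPool alike.
            failures.append((item, ex))
    return results, failures
```

With a ProcessPoolExecutor, the failures list then shows which files were still running or pending when the pool broke, and those could be retried in a single process.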

Regarding Python concurrent.futures: handling exceptions in child processes, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47237112/
