Distributed package doesn't have NCCL built in

Repost. Author: 知者. Updated: 2024-03-13 00:47:30

Problem description:
On Windows, Python fails at dist.init_process_group(backend, rank, world_size) with "RuntimeError: Distributed package doesn't have NCCL built in". The traceback is:

  File "D:\Software\Anaconda\Anaconda3\envs\segmenter\lib\site-packages\torch\distributed\distributed_c10d.py", line 531, in init_process_group
    timeout=timeout)
  File "D:\Software\Anaconda\Anaconda3\envs\segmenter\lib\site-packages\torch\distributed\distributed_c10d.py", line 625, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL "
RuntimeError: Distributed package doesn't have NCCL built in

Cause:
The NCCL backend is not supported on Windows, so PyTorch builds for Windows do not include it.

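Solution:
Set backend = 'gloo' before the dist.init_process_group call, so that Gloo is used instead of NCCL on Windows.
----------------
Copyright notice: This is an original article by CSDN blogger "StarCap", licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/StarCap/article/details/120070425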
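
As a rough sketch of the fix described above, the backend can be chosen at runtime instead of being hard-coded, so the same script runs on both Windows and Linux. The helper names choose_backend and init_distributed are hypothetical (they are not part of PyTorch or insightface), and the TCP address is simply the one that appears in the insightface snippet in this post. Treat this as an illustration under those assumptions, not as the author's method.

```python
import sys


def choose_backend(platform: str, nccl_built: bool, cuda_ok: bool) -> str:
    """Return "nccl" only where it can work; otherwise fall back to "gloo".

    Hypothetical helper for illustration; not a PyTorch API.
    """
    if platform.startswith("win"):
        return "gloo"  # PyTorch builds for Windows ship without NCCL
    return "nccl" if (nccl_built and cuda_ok) else "gloo"


def init_distributed(rank: int = 0, world_size: int = 1) -> str:
    """Initialise the default process group with the chosen backend."""
    # Imported lazily so choose_backend can be used without PyTorch installed.
    import torch
    import torch.distributed as dist

    nccl_built = dist.is_available() and dist.is_nccl_available()
    backend = choose_backend(sys.platform, nccl_built, torch.cuda.is_available())
    dist.init_process_group(
        backend=backend,
        init_method="tcp://127.0.0.1:12584",  # assumed: same address as the snippet in this post
        rank=rank,
        world_size=world_size,
    )
    return backend
```

Calling init_distributed() once at the start of a single-process run would then select Gloo on Windows and NCCL on a CUDA-enabled Linux machine.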

insightface training code:

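import os

from torch import distributed

try:
    # Launched by a distributed launcher: rank and world size come from the environment.
    world_size = int(os.environ["WORLD_SIZE"])
    rank = int(os.environ["RANK"])
    # distributed.init_process_group("nccl")  # original call; fails on Windows
    distributed.init_process_group("gloo")
except KeyError:
    # Variables not set: fall back to a single local process.
    world_size = 1
    rank = 0
    distributed.init_process_group(
        backend="gloo",  # was "nccl", which would raise the same error on Windows
        init_method="tcp://127.0.0.1:12584",
        rank=rank,
        world_size=world_size,
    )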
