
cuda - RDMA between a GPU and a remote host


Is it possible to perform an RDMA operation between a GPU and a remote host?

The online documentation on Nvidia's website only discusses performing RDMA between GPUs; it says nothing about whether this can be done between a GPU and a remote host.

Note: I have access to a cluster equipped with K80 GPUs and Mellanox NICs.

Best Answer

Is it possible to perform an RDMA operation between a GPU and a remote host?

Yes, moving data between a GPU and an InfiniBand card has been possible since 2012 through the "GPUDirect RDMA" feature of Nvidia compute GPUs (Tesla and Quadro), introduced with Kepler-class GPUs and CUDA 5.0. The CUDA Toolkit documentation has a page about GPUDirect RDMA: http://docs.nvidia.com/cuda/gpudirect-rdma/

GPUDirect RDMA is a technology introduced in Kepler-class GPUs and CUDA 5.0 that enables a direct path for data exchange between the GPU and a third-party peer device using standard features of PCI Express. Examples of third-party devices are: network interfaces, video acquisition devices, storage adapters.

GPUDirect RDMA is available on both Tesla and Quadro GPUs.

A number of limitations can apply, the most important being that the two devices must share the same upstream PCI Express root complex. Some of the limitations depend on the platform used and could be lifted in current/future products.

A few straightforward changes must be made to device drivers to enable this functionality with a wide range of hardware devices. This document introduces the technology and describes the steps necessary to enable a GPUDirect RDMA connection to NVIDIA GPUs on Linux.
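In practice, this means a buffer allocated with cudaMalloc can be registered with the InfiniBand verbs API and then used as the source or target of RDMA operations with a remote host. Below is a minimal sketch of just the registration step, assuming a Linux machine with CUDA, Mellanox OFED, and the peer-memory kernel module (nv_peer_mem, later renamed nvidia-peermem) loaded; queue-pair setup and the actual RDMA reads/writes follow the usual ibverbs flow and are omitted.

/* Sketch: register GPU memory with an InfiniBand HCA so the NIC can
 * DMA directly to/from device memory (GPUDirect RDMA). Error handling
 * is minimal. The build command is an assumption about your setup:
 *   gcc gpu_reg.c -I/usr/local/cuda/include -L/usr/local/cuda/lib64 \
 *       -lcudart -libverbs -o gpu_reg
 */
#include <stdio.h>
#include <cuda_runtime.h>
#include <infiniband/verbs.h>

int main(void)
{
    const size_t len = 1 << 20;            /* 1 MiB buffer on the GPU */
    void *gpu_buf = NULL;
    if (cudaMalloc(&gpu_buf, len) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    /* Open the first InfiniBand device found. */
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no InfiniBand devices found\n");
        return 1;
    }
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    /* With the peer-memory module loaded, ibv_reg_mr accepts a device
     * pointer; the resulting memory region can be used in RDMA verbs
     * exactly like a registered host-memory region. */
    struct ibv_mr *mr = ibv_reg_mr(pd, gpu_buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        perror("ibv_reg_mr on GPU memory");
        return 1;
    }
    printf("registered GPU buffer: lkey=0x%x rkey=0x%x\n", mr->lkey, mr->rkey);

    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    cudaFree(gpu_buf);
    return 0;
}

The key line is the ibv_reg_mr call: with the peer-memory module in place it accepts the cudaMalloc pointer, so the rkey it returns can be handed to the remote host exactly as for host memory.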

There are some limitations: http://docs.nvidia.com/cuda/gpudirect-rdma/index.html#supported-systems

2.4. Supported Systems

General remarks. Even though the only theoretical requirement for GPUDirect RDMA to work between a third-party device and an NVIDIA GPU is that they share the same root complex, there exist bugs (mostly in chipsets) causing it to perform badly, or not work at all in certain setups.

We can distinguish between three situations, depending on what is on the path between the GPU and the third-party device:

PCIe switches only
single CPU/IOH
CPU/IOH <-> QPI/HT <-> CPU/IOH

The first situation, where there are only PCIe switches on the path, is optimal and yields the best performance. The second one, where a single CPU/IOH is involved, works, but yields worse performance (especially peer-to-peer read bandwidth has been shown to be severely limited on some processor architectures). Finally, the third situation, where the path traverses a QPI/HT link, may be extremely performance-limited or even not work reliably.

Tip: lspci can be used to check the PCI topology:

$ lspci -t 
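To locate the GPU in that topology tree (and compare its position with the NIC's), the CUDA runtime can report each GPU's PCIe bus ID. A small sketch using the standard cudaDeviceGetPCIBusId call:

/* Sketch: print each GPU's PCIe bus ID in domain:bus:device.function
 * form, e.g. "0000:04:00.0", so it can be matched against lspci -t. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        char busid[32];
        cudaDeviceGetPCIBusId(busid, (int)sizeof busid, i);
        printf("GPU %d: %s\n", i, busid);
    }
    return 0;
}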

Platform support

For the IBM Power 8 platform, GPUDirect RDMA and P2P are not supported, but are not explicitly disabled. They may not work at run-time.

On ARM64, the necessary peer-to-peer functionality depends on both the hardware and the software of the particular platform. So while GPUDirect RDMA is not explicitly disabled in this case, there are no guarantees that it will be fully functional.

IOMMUs

GPUDirect RDMA currently relies upon all physical addresses being the same from the different PCI devices' point of view. This makes it incompatible with IOMMUs performing any form of translation other than 1:1, hence they must be disabled or configured for pass-through translation for GPUDirect RDMA to work.
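On Linux, one rough way to check how the IOMMU was configured is to inspect the kernel command line. A small sketch; the flags tested here (iommu=pt for pass-through, intel_iommu=on / amd_iommu=on) are standard kernel parameters, but what a given platform actually requires may differ:

/* Sketch (Linux-only): scan /proc/cmdline for IOMMU-related options. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[4096] = {0};
    FILE *f = fopen("/proc/cmdline", "r");
    if (!f) { perror("/proc/cmdline"); return 1; }
    if (!fgets(line, sizeof line, f)) { fclose(f); return 1; }
    fclose(f);

    if (strstr(line, "iommu=pt"))
        puts("IOMMU in pass-through mode: compatible with GPUDirect RDMA");
    else if (strstr(line, "intel_iommu=on") || strstr(line, "amd_iommu=on"))
        puts("IOMMU enabled without pass-through: may break GPUDirect RDMA");
    else
        puts("no explicit IOMMU options on the command line; check firmware/defaults");
    return 0;
}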

Regarding "cuda - RDMA between a GPU and a remote host", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44190665/
