
cuda - Difference between the virtual and real architectures in CUDA

Reposted. Author: 行者123. Updated: 2023-12-03 17:54:43

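I am trying to understand the difference between CUDA's virtual and real architectures, and how different configurations affect a program's performance, for example: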

-gencode arch=compute_20,code=sm_20
-gencode arch=compute_20,code=sm_21
-gencode arch=compute_21,code=sm_21
...

The NVCC manual gives the following explanation:

GPU compilation is performed via an intermediate representation, PTX ([...]), which can be considered as assembly for a virtual GPU architecture. Contrary to an actual graphics processor, such a virtual GPU is defined entirely by the set of capabilities, or features, that it provides to the application. In particular, a virtual GPU architecture provides a (largely) generic instruction set, and binary instruction encoding is a non-issue because PTX programs are always represented in text format. Hence, a nvcc compilation command always uses two architectures: a compute architecture to specify the virtual intermediate architecture, plus a real GPU architecture to specify the intended processor to execute on. For such an nvcc command to be valid, the real architecture must be an implementation (someway or another) of the virtual architecture. This is further explained below. The chosen virtual architecture is more of a statement on the GPU capabilities that the application requires: using a smallest virtual architecture still allows a widest range of actual architectures for the second nvcc stage. Conversely, specifying a virtual architecture that provides features unused by the application unnecessarily restricts the set of possible GPUs that can be specified in the second nvcc stage.



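But I still don't quite understand how the different configurations affect performance (or do they perhaps only affect which physical GPU devices can be chosen?). In particular, this sentence confuses me the most: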

In particular, a virtual GPU architecture provides a (largely) generic instruction set, and binary instruction encoding is a non-issue because PTX programs are always represented in text format.

Best answer

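The GPU Compilation section of the NVIDIA CUDA Compiler Driver NVCC user guide gives a very thorough description of the virtual and physical architectures and of how these concepts are used in the build process.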

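A virtual architecture specifies the feature set that the code targets. The table below lists part of the evolution of the virtual architectures. When compiling, specify the lowest virtual architecture whose feature set is sufficient, so that the program can run on the widest range of physical architectures.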

Virtual architecture feature list (from the user guide)

compute_10   Basic features
compute_11   + atomic memory operations on global memory
compute_12   + atomic memory operations on shared memory
             + vote instructions
compute_13   + double precision floating point support
compute_20   + Fermi support
compute_30   + Kepler support
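
As a rough illustration of picking the lowest sufficient virtual architecture: suppose a kernel needs double precision but nothing newer. Per the table, compute_13 would be the lowest virtual architecture that covers it, and any real architecture that implements compute_13 can then be named as a code target. The file name kernel.cu and output name app are hypothetical, and the architecture values follow the table above; newer toolkits have dropped these older targets, so substitute values your toolkit supports.

```shell
# Sketch only: kernel.cu and app are placeholder names.
# Virtual architecture: compute_13 (lowest one with double precision).
# Real architectures: binaries for both sm_13 and sm_20, since both
# implement the compute_13 feature set.
nvcc --gpu-architecture=compute_13 --gpu-code=sm_13,sm_20 -o app kernel.cu
```

Choosing compute_20 here instead would not add anything the kernel uses, and it would rule out sm_13 as a code target.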

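A physical architecture specifies the implementation of the GPU. It gives the compiler the instruction set, instruction latencies, instruction throughput, resource sizes, and so on, so that the compiler can translate code for the virtual architecture into binary code as efficiently as possible.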
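
The two stages can be looked at separately, which may help with the sentence quoted in the question about PTX being a text format. The sketch below assumes a hypothetical source file kernel.cu and uses the compute_20/sm_20 values from the question; adjust them to targets your toolkit still supports.

```shell
# Sketch only: kernel.cu and the output names are placeholders.

# Stage 1 alone: compile for the virtual architecture and stop at PTX.
# kernel.ptx is a plain text file that can be opened in an editor.
nvcc --gpu-architecture=compute_20 --ptx kernel.cu -o kernel.ptx

# Stage 1 plus stage 2: go through PTX for compute_20, then generate
# binary code (a cubin) tuned for the real architecture sm_20.
nvcc --gpu-architecture=compute_20 --gpu-code=sm_20 --cubin kernel.cu -o kernel.cubin
```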

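You can give the compiler several virtual and physical architecture pairs and have it package the resulting PTX and binary code into a single binary (a fat binary). At run time, the CUDA driver chooses the best representation for the physical device that is installed. If the fat binary does not contain binary code for that device, the driver can JIT-compile the best available PTX.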
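
A fat binary build along these lines might look like the following sketch. The names kernel.cu and app are hypothetical, and the architecture values are taken from the era of the table above. A code= value that names a virtual architecture (compute_30 here) asks nvcc to embed the PTX itself, which is what the driver would JIT on a device with no matching binary.

```shell
# Sketch only: kernel.cu and app are placeholder names.
# Embed binaries for sm_20 and sm_30, plus compute_30 PTX as a JIT fallback.
nvcc -gencode arch=compute_20,code=sm_20 \
     -gencode arch=compute_30,code=sm_30 \
     -gencode arch=compute_30,code=compute_30 \
     -o app kernel.cu

# Inspect what ended up in the fat binary.
cuobjdump --list-elf app   # embedded binaries, one per real architecture
cuobjdump --list-ptx app   # embedded PTX images
```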

This question, cuda - Difference between the virtual and real architectures in CUDA, comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/14779523/
