
caching - Does CUDA cache data from global memory into the unified cache when storing it to shared memory?


As far as I know, on previous NVIDIA GPU architectures the GPU stores data into shared memory along the path global memory -> L2 -> L1 -> register -> shared memory.

However, Maxwell GPUs (such as the GTX 980) physically separate the unified cache from shared memory, so I would like to know whether this architecture still follows the same steps when storing data into shared memory, or whether it supports direct transfers between global memory and shared memory.

  • The unified cache is enabled with the "-dlcm=ca" option
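For illustration, here is a minimal sketch of the kind of copy in question (the kernel name, tile size, and arguments are assumptions for illustration, not from the original post): a global load staged through a register and then stored into shared memory.

#define TILE 256

// Minimal illustration: in the pre-Maxwell model described above, the global
// load below travels global memory -> L2 -> L1 -> register, and the store
// then moves the value from the register into shared memory; there is no
// direct global-to-shared transfer in this model.
// Launch with blockDim.x <= TILE.
__global__ void stageToShared(const float *in, float *out, int n)
{
    __shared__ float tile[TILE];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = in[i];            // global load lands in a register
        tile[threadIdx.x] = v;      // register -> shared memory store
    }
    __syncthreads();

    if (i < n)
        out[i] = tile[threadIdx.x]; // write back so the copy is observable
}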

Best answer

This should answer most of your questions about the memory types and load path on the Maxwell architecture:

As with Kepler, global loads in Maxwell are cached in L2 only, unless using the LDG read-only data cache mechanism introduced in Kepler.

In a manner similar to Kepler GK110B, GM204 retains this behavior by default but also allows applications to opt-in to caching of global loads in its unified L1/Texture cache. The opt-in mechanism is the same as with GK110B: pass the -Xptxas -dlcm=ca flag to nvcc at compile time.

Local loads also are cached in L2 only, which could increase the cost of register spilling if L1 local load hit rates were high with Kepler. The balance of occupancy versus spilling should therefore be reevaluated to ensure best performance. Especially given the improvements to arithmetic latencies, code built for Maxwell may benefit from somewhat lower occupancy (due to increased registers per thread) in exchange for lower spilling.

The unified L1/texture cache acts as a coalescing buffer for memory accesses, gathering up the data requested by the threads of a warp prior to delivery of that data to the warp. This function previously was served by the separate L1 cache in Fermi and Kepler.

From subsection "1.4.2.1. Unified L1/Texture Cache" under section "1.4.2. Memory Throughput" of NVIDIA's Maxwell tuning guide.
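For reference, a minimal sketch of the two opt-in mechanisms the quoted passage mentions (the kernel name, file name, and architecture flag are assumptions for illustration): the -Xptxas -dlcm=ca compile flag and the LDG read-only data cache path via __ldg().

// Assumed compile line for a GM204 part such as the GTX 980:
//   nvcc -arch=sm_52 -Xptxas -dlcm=ca ldg_example.cu -o ldg_example
//
// With -dlcm=ca, ordinary global loads may also be cached in the unified
// L1/texture cache; independently of that flag, __ldg() routes a load
// through the read-only data cache (LDG) introduced with Kepler.
__global__ void readOnlyCopy(const float * __restrict__ in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __ldg(&in[i]);   // read-only data cache load
}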

The sections and subsections that follow also explain or clarify other useful details about shared memory size/bandwidth, caching, and so on. Give it a read!
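As a hedged sketch of the occupancy-versus-spilling re-tuning the guide asks for (the numbers below are purely illustrative, not recommendations): __launch_bounds__ is one way to tell the compiler the intended launch shape so it can budget registers per thread; a smaller second argument allows more registers (less spilling, lower occupancy), a larger one does the opposite.

// Illustrative only: at most 256 threads per block, and ask the compiler to
// keep register use low enough for at least 2 resident blocks per SM.
__global__ void __launch_bounds__(256, 2)
registerHeavyKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * in[i] + 1.0f;   // stand-in for register-heavy work
}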

Regarding "caching - Does CUDA cache data from global memory into the unified cache when storing it to shared memory?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36735233/
