
CUDA: too many resources requested for launch

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 07:52:32

I'm having trouble running my code on a GTX 480 with Compute Capability 2.0.

If I launch a kernel with 1024 threads per block, I always get the following error:

========= CUDA-MEMCHECK
========= Program hit cudaErrorLaunchOutOfResources (error 7) due to "too many resources requested for launch" on CUDA API call to cudaLaunch.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2ef613]
========= Host Frame:/usr/local/cuda-6.5/lib64/libcudart.so.6.5 (cudaLaunch + 0x17e) [0x3686e]
========= Host Frame:./bin/myProgram [0x3a50]
========= Host Frame:./bin/myProgram [0x388a]
========= Host Frame:./bin/myProgram [0x38e3]
========= Host Frame:./bin/myProgram [0x2a99]
========= Host Frame:./bin/myProgram [0x1410]
========= Host Frame:./bin/myProgram [0x1da0]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xed) [0x2176d]
========= Host Frame:./bin/myProgram [0x1139]
=========

I ran the program several times with different numbers of blocks and threads:

5 Blocks, 512 Threads per Block => Works
5 Blocks, 1024 Threads per Block => Error
10 Blocks, 512 Threads per Block => Works
10 Blocks, 1024 Threads per Block => Error
15 Blocks, 512 Threads per Block => Works
15 Blocks, 1024 Threads per Block => Error

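I checked the register usage and it seems fine. "Function4", which uses 28 registers, is the kernel launched with this many threads. All the other kernels are only ever launched with <<<1, 32>>>.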

ptxas info    : 0 bytes gmem
ptxas info : Function properties for _Z7function1Py
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Compiling entry function '_Z13function2PyS_i' for 'sm_20'
ptxas info : Function properties for _Z13function2PyS_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 22 registers, 52 bytes cmem[0]
ptxas info : Compiling entry function '_Z6function3PyiS_' for 'sm_20'
ptxas info : Function properties for _Z6function3PyiS_
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 22 registers, 56 bytes cmem[0]
ptxas info : Compiling entry function '_Z17function4PyiiS_Phji' for 'sm_20'
ptxas info : Function properties for _Z17function4PyiiS_Phji
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 72 bytes cmem[0]
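
As a rough cross-check of the register figures above, the sketch below multiplies registers per thread by threads per block and compares the product with the register file of one multiprocessor. The register-file sizes (32 K for CC 2.0, 64 K for CC 3.0) are taken from the compute-capability tables in the CUDA C Programming Guide; the function and constant names are made up for illustration, and the model deliberately ignores allocation granularity, so it is an estimate and not what the driver actually computes.

```cpp
#include <cstdint>

// Register-file size per multiprocessor, in 32-bit registers.
// Values from the CUDA C Programming Guide compute-capability tables.
constexpr std::uint32_t kRegsPerSmCc20 = 32 * 1024;  // CC 2.0 (e.g. GTX 480)
constexpr std::uint32_t kRegsPerSmCc30 = 64 * 1024;  // CC 3.0 (e.g. GTX 660)

// Hypothetical helper: registers one block needs under a simplified model
// (no per-warp rounding, no allocation granularity).
constexpr std::uint32_t registersPerBlock(std::uint32_t regsPerThread,
                                          std::uint32_t threadsPerBlock) {
    return regsPerThread * threadsPerBlock;
}

// Hypothetical helper: true if a single block fits in one multiprocessor's
// register file under the same simplified model.
constexpr bool fitsInRegisterFile(std::uint32_t regsPerThread,
                                  std::uint32_t threadsPerBlock,
                                  std::uint32_t regsPerSm) {
    return registersPerBlock(regsPerThread, threadsPerBlock) <= regsPerSm;
}

// Hypothetical helper: largest per-thread register count that still allows
// a block of the given size (threadsPerBlock must be non-zero).
constexpr std::uint32_t maxRegsPerThread(std::uint32_t threadsPerBlock,
                                         std::uint32_t regsPerSm) {
    return regsPerSm / threadsPerBlock;
}

// Compile-time checks using the numbers from the question.
static_assert(registersPerBlock(28, 1024) == 28672, "28 regs x 1024 threads");
static_assert(fitsInRegisterFile(28, 1024, kRegsPerSmCc20), "fits on CC 2.0");
static_assert(maxRegsPerThread(1024, kRegsPerSmCc20) == 32, "CC 2.0 limit");
static_assert(maxRegsPerThread(1024, kRegsPerSmCc30) == 64, "CC 3.0 limit");
```

By this count, 28 registers x 1024 threads = 28672, which fits in the 32768 registers of a CC 2.0 multiprocessor, so the ptxas figure on its own does not explain the failure. A 1024-thread block on CC 2.0 tolerates at most 32 registers per thread; on CC 3.0 the same calculation gives 64, which is above the 63-register per-thread cap of that architecture and is consistent with the GTX 660 run succeeding. Querying the loaded kernel with `cudaFuncGetAttributes` (fields `numRegs` and `maxThreadsPerBlock`) would show what the launched binary actually uses.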

I also ran this program on my GTX 660 with CC 3.0, and there it works with 1024 threads per block. I don't know where the problem is. Does anyone have an idea?

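Best answer

I had the same error.

Thanks to http://cuda-programming.blogspot.fr/2013/01/handling-cuda-error-messages.html , I understood the error. They say:

"Too many resources requested for launch - This error means that the number of registers available on the multiprocessor is being exceeded. Reduce the number of threads per block to solve the problem."

Basically, I used to be able to run a given number of threads per block (8x8x16=1024 for a 3D kernel). However, if you nest kernel calls, the number of available registers is reduced further.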
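
The advice to reduce the number of threads per block can be turned into a quick estimate. The sketch below computes the largest block size, rounded down to a whole number of warps, whose registers fit in one multiprocessor's register file. It assumes a warp size of 32, a 1024-thread block limit, and a 32768-register file as on CC 2.0; the helper names and the 40-register figure in the checks are hypothetical, and allocation granularity is ignored, so the result is an upper bound and not a guarantee.

```cpp
#include <cstdint>

constexpr std::uint32_t kWarpSize = 32;             // threads per warp
constexpr std::uint32_t kMaxThreadsPerBlock = 1024; // block limit on CC 2.0 and 3.0
constexpr std::uint32_t kRegsPerSmCc20 = 32 * 1024; // 32-bit registers per SM, CC 2.0

constexpr std::uint32_t minU32(std::uint32_t a, std::uint32_t b) {
    return a < b ? a : b;
}

// Hypothetical helper: largest warp-multiple block size whose registers fit
// in the register file, capped at the block limit.
// regsPerThread must be non-zero.
constexpr std::uint32_t maxThreadsForRegisters(std::uint32_t regsPerThread,
                                               std::uint32_t regsPerSm) {
    return minU32(kMaxThreadsPerBlock,
                  (regsPerSm / regsPerThread) / kWarpSize * kWarpSize);
}

// Total threads in a 3D block such as the 8x8x16 one mentioned above.
constexpr std::uint32_t blockVolume(std::uint32_t x, std::uint32_t y,
                                    std::uint32_t z) {
    return x * y * z;
}

// 28 registers per thread still allows the full 1024-thread block.
static_assert(maxThreadsForRegisters(28, kRegsPerSmCc20) == 1024, "28 regs");
// A hypothetical kernel using 40 registers is limited to 25 warps.
static_assert(maxThreadsForRegisters(40, kRegsPerSmCc20) == 800, "40 regs");
// Halving the z extent of the 3D block halves the thread count.
static_assert(blockVolume(8, 8, 16) == 1024, "original 3D block");
static_assert(blockVolume(8, 8, 8) == 512, "reduced 3D block");
```

With a kernel that exceeds the budget, shrinking one dimension of the block (for example 8x8x16 to 8x8x8) and launching proportionally more blocks keeps the total amount of work the same while bringing the per-block register demand back under the limit.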

Regarding "CUDA: too many resources requested for launch", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26011394/
