
cuda - Where is the pinned memory allocated by cudaHostAlloc?

Repost. Author: 行者123. Updated: 2023-12-05 01:15:21

I was reading about Page-Locked Host Memory in the CUDA Programming Guide and wondered: where is the pinned memory allocated when it is created with cudaHostAlloc? Is it in the kernel address space, or is it allocated in the process address space?

Best Answer

Page-locked host memory for CUDA (and for other DMA-capable external hardware, such as PCI-Express cards) is allocated in the host's physical memory. The allocation is marked as non-swappable (non-pageable) and non-movable (locked, pinned). This is similar to the effect of the mlock syscall: "lock part or all of the calling process's virtual address space into RAM, preventing that memory from being paged to the swap area."

This allocation can be accessed from the kernel's virtual address space (since the kernel has a complete view of physical memory), and it is also added to the user process's virtual address space so that the process can access it.

When you do an ordinary malloc, the actual physical memory allocation may be (and usually is) deferred until the first (write) access to each page. With mlocked/pinned memory, all physical pages are allocated during the locking or pinning call (like MAP_POPULATE for mmap: "Populate (prefault) page tables for a mapping"), and the physical addresses of the pages will not change afterwards (no swapping, no migration, no compaction...).

CUDA documentation:
http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1gb65da58f444e7230d3322b6126bb4902

__host__ cudaError_t cudaHostAlloc ( void** pHost, size_t size, unsigned int flags )

Allocates page-locked memory on the host. ...

Allocates size bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cudaMemcpy(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of pinned memory may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.

...

Memory allocated by this function must be freed with cudaFreeHost().



A comparison of pinned and non-pinned memory: https://www.cs.virginia.edu/~mwb7w/cuda_support/pinned_tradeoff.html "Choosing Between Pinned and Non-Pinned Memory"

Pinned memory is memory allocated using the cudaMallocHost function, which prevents the memory from being swapped out and provides improved transfer speeds. Non-pinned memory is memory allocated using the malloc function. As described in Memory Management Overhead and Memory Transfer Overhead, pinned memory is much more expensive to allocate and deallocate but provides higher transfer throughput for large memory transfers.



Advice from moderator txbob in a CUDA forum post: https://devtalk.nvidia.com/default/topic/899020/does-cudamemcpyasync-require-pinned-memory-/ "Does cudaMemcpyAsync require pinned memory?"

If you want truly asynchronous behavior (e.g. overlap of copy and compute) then the memory must be pinned. If it is not pinned, there won't be any runtime errors, but the copy will not be asynchronous - it will be performed like an ordinary cudaMemcpy.

The usable size may vary by system and OS. Pinning 4GB of memory on a 64GB system on Linux should not have a significant effect on CPU performance, after the pinning operation is complete. Attempting to pin 60GB on the other hand might cause significant system responsiveness issues.
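Putting the quoted advice together, here is a minimal sketch of pinned allocation feeding an asynchronous copy. It assumes a CUDA-capable device and toolkit; the buffer size and the single stream are illustrative choices, not anything prescribed by the documentation above:

```cuda
#include <cuda_runtime.h>

int main(void) {
    const size_t size = 1 << 20;  /* 1 MiB staging buffer, illustrative */
    float *h_buf = NULL, *d_buf = NULL;
    cudaStream_t stream;

    /* Page-locked host allocation: the driver pins these pages so the
     * device can DMA to/from them directly. */
    cudaHostAlloc((void **)&h_buf, size, cudaHostAllocDefault);
    cudaMalloc((void **)&d_buf, size);
    cudaStreamCreate(&stream);

    /* Because h_buf is pinned, this copy can be truly asynchronous and
     * overlap with host work or with kernels in other streams. */
    cudaMemcpyAsync(d_buf, h_buf, size, cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);  /* must be freed with cudaFreeHost, not free() */
    return 0;
}
```

If h_buf were instead obtained from malloc, the cudaMemcpyAsync call above would still succeed, but per the txbob quote it would behave like an ordinary blocking cudaMemcpy.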

On "cuda - Where is the pinned memory allocated by cudaHostAlloc?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49480334/
