gpt4 book ai didi

machine-learning - 如何解释 tensorflow 中的池分配器消息?

转载 作者:行者123 更新时间:2023-11-30 08:19:55 28 4
gpt4 key购买 nike

在训练tensorflow seq2seq模型时,我看到以下消息:

W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 27282 get requests, put_count=9311 evicted_count=1000 eviction_rate=0.1074 and unsatisfied allocation rate=0.699032I tensorflow/core/common_runtime/gpu/pool_allocator.cc:239] Raising pool_size_limit_ from 100 to 110W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 13715 get requests, put_count=14458 evicted_count=10000 eviction_rate=0.691659 and unsatisfied allocation rate=0.675684I tensorflow/core/common_runtime/gpu/pool_allocator.cc:239] Raising pool_size_limit_ from 110 to 121W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 6965 get requests, put_count=6813 evicted_count=5000 eviction_rate=0.733891 and unsatisfied allocation rate=0.741421I tensorflow/core/common_runtime/gpu/pool_allocator.cc:239] Raising pool_size_limit_ from 133 to 146W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 44 get requests, put_count=9058 evicted_count=9000 eviction_rate=0.993597 and unsatisfied allocation rate=0W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 46 get requests, put_count=9062 evicted_count=9000 eviction_rate=0.993158 and unsatisfied allocation rate=0W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 4 get requests, put_count=1029 evicted_count=1000 eviction_rate=0.971817 and unsatisfied allocation rate=0W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 2 get requests, put_count=1030 evicted_count=1000 eviction_rate=0.970874 and unsatisfied allocation rate=0W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 44 get requests, put_count=6074 evicted_count=6000 eviction_rate=0.987817 and unsatisfied allocation rate=0W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 12 get requests, put_count=6045 evicted_count=6000 eviction_rate=0.992556 and unsatisfied allocation rate=0W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 2 get requests, put_count=1042 evicted_count=1000 eviction_rate=0.959693 and unsatisfied allocation rate=0W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 44 get requests, put_count=6093 evicted_count=6000 eviction_rate=0.984737 and unsatisfied allocation rate=0W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 4 get requests, put_count=1069 evicted_count=1000 eviction_rate=0.935454 and unsatisfied allocation rate=0W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 17722 get requests, put_count=9036 evicted_count=1000 eviction_rate=0.110668 and unsatisfied allocation rate=0.550615I tensorflow/core/common_runtime/gpu/pool_allocator.cc:239] Raising pool_size_limit_ from 792 to 871W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 6 get requests, put_count=1093 evicted_count=1000 eviction_rate=0.914913 and unsatisfied allocation rate=0W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 6 get requests, put_count=1101 evicted_count=1000 eviction_rate=0.908265 and unsatisfied allocation rate=0W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 3224 get requests, put_count=4684 evicted_count=2000 eviction_rate=0.426985 and unsatisfied allocation rate=0.200062I tensorflow/core/common_runtime/gpu/pool_allocator.cc:239] Raising pool_size_limit_ from 1158 to 1273W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 17794 get requests, put_count=17842 evicted_count=9000 eviction_rate=0.504428 and unsatisfied allocation rate=0.510228I tensorflow/core/common_runtime/gpu/pool_allocator.cc:239] Raising pool_size_limit_ from 1400 to 1540W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 31 get requests, put_count=1185 evicted_count=1000 eviction_rate=0.843882 and unsatisfied allocation rate=0W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 40 get requests, put_count=8209 evicted_count=8000 eviction_rate=0.97454 and unsatisfied allocation rate=0W tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 0 get requests, put_count=2272 evicted_count=2000 eviction_rate=0.880282 and unsatisfied allocation rate=-nanW tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 0 get requests, put_count=2362 evicted_count=2000 eviction_rate=0.84674 and unsatisfied allocation rate=-nanW tensorflow/core/common_runtime/gpu/pool_allocator.cc:227] PoolAllocator: After 38 get requests, put_count=5436 evicted_count=5000 eviction_rate=0.919794 and unsatisfied allocation rate=0

这是什么意思?这是否意味着我遇到了一些资源分配问题?我在 Titan X 3500+ CUDA、12 GB GPU 上运行

最佳答案

TensorFlow 有多个内存分配器,用于以不同方式使用内存。他们的行为具有一些适应性。

在您的特定情况下,由于您使用的是 GPU,因此有一个用于 CPU 内存的 PoolAllocator,它已预先注册到 GPU 以实现快速 DMA。例如,预计从 CPU 传输到 GPU 的张量将从该池中分配。

PoolAllocators 尝试通过保留一个已分配然后释放的可立即重用的 block 池来分摊调用更昂贵的底层分配器的成本。他们的默认行为是缓慢增长,直到驱逐率降至某个常数以下。 (逐出率是免费调用的比例,其中我们将未使用的 block 从池返回到底层池,以便不超过大小限制。)在上面的日志消息中,您会看到显示池的“Raising pool_size_limit_”行规模不断增长。假设您的程序实际上具有稳定状态行为,并且具有所需的最大大小的 block 集合,则池将增长以容纳它,然后不再增长。它以这种方式运行,而不是简单地保留所有分配过的 block ,以便很少需要或仅在程序启动期间需要的大小不太可能保留在池中。

只有当您内存不足时,这些消息才应该引起您的关注。在这种情况下,日志消息可能有助于诊断问题。另请注意,只有在内存池增长到适当的大小后才能达到峰值执行速度。

关于machine-learning - 如何解释 tensorflow 中的池分配器消息?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/35151207/

28 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com