
tensorflow - failed to alloc X bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory


I am trying to run a TensorFlow project and I am running into memory problems on the university HPC cluster. I have to run prediction jobs for hundreds of inputs of different lengths. We have GPU nodes with different amounts of vmem, so I am trying to set up the script in a way that does not crash for any combination of GPU node and input length.
After searching the net for solutions, I experimented with TF_FORCE_UNIFIED_MEMORY, XLA_PYTHON_CLIENT_MEM_FRACTION, XLA_PYTHON_CLIENT_PREALLOCATE and TF_FORCE_GPU_ALLOW_GROWTH, as well as TensorFlow's memory-growth setting (set_memory_growth, used in the code below). As I understand it, with unified memory I should be able to use more memory than the GPU itself has.
This is my final solution (only the relevant parts):

os.environ['TF_FORCE_UNIFIED_MEMORY']='1'
os.environ['XLA_PYTHON_CLIENT_MEM_FRACTION']='2.0'
#os.environ['XLA_PYTHON_CLIENT_PREALLOCATE']='false'
os.environ['TF_FORCE_GPU_ALLOW_GROWTH']='true' # as I understood, this is redundant with the set_memory_growth part :)

import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            print(gpu)
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)
I submit it on the cluster with the Slurm job scheduler and --mem=30G.
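(As a side note, one way to sanity-check what the job actually gets is to print the node's RAM and the GPU memory TensorFlow has allocated right after the setup above. The snippet below is only an illustrative sketch, not part of the original post; it assumes a Linux node, TF 2.5+ for tf.config.experimental.get_memory_info, and 'GPU:0'-style device strings.)

import os
import tensorflow as tf

# Total physical RAM on the node (note: Slurm's --mem limit may be lower and is
# enforced separately via cgroups, so this is an upper bound, not the job limit).
host_bytes = os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES')
print(f"Node RAM: {host_bytes / 1024**3:.1f} GiB")

# GPU memory currently allocated by TensorFlow (requires TF 2.5+).
for i, gpu in enumerate(tf.config.list_logical_devices('GPU')):
    info = tf.config.experimental.get_memory_info(f'GPU:{i}')
    print(gpu.name, "allocated:", info['current'], "bytes")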
This is the error my code crashes with. As I understand it, it does try to use unified memory, but fails for some reason.
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5582 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN Black, pci bus id: 0000:02:00.0, compute capability: 3.5)
2021-08-24 09:22:02.053935: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 12758286336 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:22:03.738635: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 11482457088 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:22:05.418059: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 10334211072 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:22:07.102411: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 9300789248 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:22:08.784349: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 8370710016 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:22:10.468644: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 7533638656 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:22:12.150588: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 6780274688 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:23:10.326528: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.33GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.


Traceback (most recent call last):
File "scripts/script.py", line 654, in <module>
prediction_result, (r, t) = cf.to(model_runner.predict(processed_feature_dict, random_seed=seed), "cpu")
File "env/lib/python3.7/site-packages/alphafold/model/model.py", line 134, in predict
result, recycles = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
File "env/lib/python3.7/site-packages/jax/_src/traceback_util.py", line 183, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "env/lib/python3.7/site-packages/jax/_src/api.py", line 402, in cache_miss
donated_invars=donated_invars, inline=inline)
File "env/lib/python3.7/site-packages/jax/core.py", line 1561, in bind
return call_bind(self, fun, *args, **params)
File "env/lib/python3.7/site-packages/jax/core.py", line 1552, in call_bind
outs = primitive.process(top_trace, fun, tracers, params)
File "env/lib/python3.7/site-packages/jax/core.py", line 1564, in process
return trace.process_call(self, fun, tracers, params)
File "env/lib/python3.7/site-packages/jax/core.py", line 607, in process_call
return primitive.impl(f, *tracers, **params)
File "env/lib/python3.7/site-packages/jax/interpreters/xla.py", line 608, in _xla_call_impl
*unsafe_map(arg_spec, args))
File "env/lib/python3.7/site-packages/jax/linear_util.py", line 262, in memoized_fun
ans = call(fun, *args)
File "env/lib/python3.7/site-packages/jax/interpreters/xla.py", line 758, in _xla_callable
compiled = compile_or_get_cached(backend, built, options)
File "env/lib/python3.7/site-packages/jax/interpreters/xla.py", line 76, in compile_or_get_cached
return backend_compile(backend, computation, compile_options)
File "env/lib/python3.7/site-packages/jax/interpreters/xla.py", line 373, in backend_compile
return backend.compile(built_c, compile_options=options)
jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: Resource exhausted: Out of memory while trying to allocate 4649385984 bytes.

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

--------------------

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "scripts/script.py", line 654, in <module>
prediction_result, (r, t) = cf.to(model_runner.predict(processed_feature_dict, random_seed=seed), "cpu")
File "env/lib/python3.7/site-packages/alphafold/model/model.py", line 134, in predict
result, recycles = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
File "env/lib/python3.7/site-packages/jax/interpreters/xla.py", line 373, in backend_compile
return backend.compile(built_c, compile_options=options)
RuntimeError: Resource exhausted: Out of memory while trying to allocate 4649385984 bytes.
I would be glad for any ideas on how to get this to work and use all the memory available.
Thank you!

Best Answer

It looks like your GPU does not fully support unified memory. Support is limited, and in practice the GPU keeps all data in its own memory.
See this article for a description: https://developer.nvidia.com/blog/unified-memory-cuda-beginners/
In particular:

On systems with pre-Pascal GPUs like the Tesla K80, calling cudaMallocManaged() allocates size bytes of managed memory on the GPU device that is active when the call is made. Internally, the driver also sets up page table entries for all pages covered by the allocation, so that the system knows that the pages are resident on that GPU.


And:

Since these older GPUs can’t page fault, all data must be resident on the GPU just in case the kernel accesses it (even if it won’t).


According to TechPowerUp, your GPU is Kepler-based: https://www.techpowerup.com/gpu-specs/geforce-gtx-titan-black.c2549
As far as I know, TensorFlow should also print a warning about this. Something like:
Unified memory on GPUs with compute capability lower than 6.0 (pre-Pascal class GPUs) does not support oversubscription.
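For reference, the compute capability that TensorFlow detects can be checked directly. The snippet below is an illustrative sketch, not part of the original answer; it relies on tf.config.experimental.get_device_details, available from TF 2.4 on:

import tensorflow as tf

# Illustrative check: unified-memory oversubscription needs compute capability
# >= (6, 0) (Pascal or newer); a Kepler GTX TITAN Black reports (3, 5).
for gpu in tf.config.list_physical_devices('GPU'):
    details = tf.config.experimental.get_device_details(gpu)
    cc = details.get('compute_capability')      # e.g. (3, 5)
    name = details.get('device_name', gpu.name)
    print(name, "compute capability:", cc)
    if cc is not None and cc < (6, 0):
        print("-> pre-Pascal GPU: unified memory cannot oversubscribe device memory")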

A similar question about this error can be found on Stack Overflow: https://stackoverflow.com/questions/68902851/
