
tensorflow - failed to alloc X bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory


I am trying to run a TensorFlow project and I am running into memory problems on the university HPC cluster. I have to run prediction jobs for hundreds of inputs of different lengths. We have GPU nodes with different amounts of vmem, so I am trying to set up the script in a way that does not crash for any combination of GPU node and input length.
After searching the net for solutions, I experimented with TF_FORCE_UNIFIED_MEMORY, XLA_PYTHON_CLIENT_MEM_FRACTION, XLA_PYTHON_CLIENT_PREALLOCATE and TF_FORCE_GPU_ALLOW_GROWTH, as well as TensorFlow's memory-growth setting (set_memory_growth, used in the code below). As I understand it, with unified memory I should be able to use more memory than the GPU itself has.
This is my final solution (only the relevant parts):

os.environ['TF_FORCE_UNIFIED_MEMORY']='1'
os.environ['XLA_PYTHON_CLIENT_MEM_FRACTION']='2.0'
#os.environ['XLA_PYTHON_CLIENT_PREALLOCATE']='false'
os.environ['TF_FORCE_GPU_ALLOW_GROWTH']='true' # as I understood, this is redundant with the set_memory_growth part :)

import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            print(gpu)
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)
I submit it on the cluster with the Slurm job scheduler and --mem=30G.
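(As a side note, one way to sanity-check what the job actually gets is to print the node's RAM and the GPU memory TensorFlow has allocated right after the setup above. The snippet below is only an illustrative sketch, not part of the original post; it assumes a Linux node, TF 2.5+ for tf.config.experimental.get_memory_info, and 'GPU:0'-style device strings.)

import os
import tensorflow as tf

# Total physical RAM on the node (note: Slurm's --mem limit may be lower and is
# enforced separately via cgroups, so this is an upper bound, not the job limit).
host_bytes = os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES')
print(f"Node RAM: {host_bytes / 1024**3:.1f} GiB")

# GPU memory currently allocated by TensorFlow (requires TF 2.5+).
for i, gpu in enumerate(tf.config.list_logical_devices('GPU')):
    info = tf.config.experimental.get_memory_info(f'GPU:{i}')
    print(gpu.name, "allocated:", info['current'], "bytes")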
This is the error my code crashes with. As I understand it, it does try to use unified memory, but fails for some reason.
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5582 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN Black, pci bus id: 0000:02:00.0, compute capability: 3.5)
2021-08-24 09:22:02.053935: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 12758286336 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:22:03.738635: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 11482457088 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:22:05.418059: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 10334211072 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:22:07.102411: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 9300789248 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:22:08.784349: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 8370710016 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:22:10.468644: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 7533638656 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:22:12.150588: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:764] failed to alloc 6780274688 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-08-24 09:23:10.326528: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:272] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.33GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.


Traceback (most recent call last):
File "scripts/script.py", line 654, in <module>
prediction_result, (r, t) = cf.to(model_runner.predict(processed_feature_dict, random_seed=seed), "cpu")
File "env/lib/python3.7/site-packages/alphafold/model/model.py", line 134, in predict
result, recycles = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
File "env/lib/python3.7/site-packages/jax/_src/traceback_util.py", line 183, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "env/lib/python3.7/site-packages/jax/_src/api.py", line 402, in cache_miss
donated_invars=donated_invars, inline=inline)
File "env/lib/python3.7/site-packages/jax/core.py", line 1561, in bind
return call_bind(self, fun, *args, **params)
File "env/lib/python3.7/site-packages/jax/core.py", line 1552, in call_bind
outs = primitive.process(top_trace, fun, tracers, params)
File "env/lib/python3.7/site-packages/jax/core.py", line 1564, in process
return trace.process_call(self, fun, tracers, params)
File "env/lib/python3.7/site-packages/jax/core.py", line 607, in process_call
return primitive.impl(f, *tracers, **params)
File "env/lib/python3.7/site-packages/jax/interpreters/xla.py", line 608, in _xla_call_impl
*unsafe_map(arg_spec, args))
File "env/lib/python3.7/site-packages/jax/linear_util.py", line 262, in memoized_fun
ans = call(fun, *args)
File "env/lib/python3.7/site-packages/jax/interpreters/xla.py", line 758, in _xla_callable
compiled = compile_or_get_cached(backend, built, options)
File "env/lib/python3.7/site-packages/jax/interpreters/xla.py", line 76, in compile_or_get_cached
return backend_compile(backend, computation, compile_options)
File "env/lib/python3.7/site-packages/jax/interpreters/xla.py", line 373, in backend_compile
return backend.compile(built_c, compile_options=options)
jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: Resource exhausted: Out of memory while trying to allocate 4649385984 bytes.

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

--------------------

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "scripts/script.py", line 654, in <module>
prediction_result, (r, t) = cf.to(model_runner.predict(processed_feature_dict, random_seed=seed), "cpu")
File "env/lib/python3.7/site-packages/alphafold/model/model.py", line 134, in predict
result, recycles = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
File "env/lib/python3.7/site-packages/jax/interpreters/xla.py", line 373, in backend_compile
return backend.compile(built_c, compile_options=options)
RuntimeError: Resource exhausted: Out of memory while trying to allocate 4649385984 bytes.
I would be glad for any ideas on how to get this to work and use all the memory available.
Thank you!

Best Answer

It looks like your GPU does not fully support unified memory. Support is limited, and in practice the GPU keeps all data in its own memory.
See this article for a description: https://developer.nvidia.com/blog/unified-memory-cuda-beginners/
In particular:

On systems with pre-Pascal GPUs like the Tesla K80, calling cudaMallocManaged() allocates size bytes of managed memory on the GPU device that is active when the call is made. Internally, the driver also sets up page table entries for all pages covered by the allocation, so that the system knows that the pages are resident on that GPU.


And:

Since these older GPUs can’t page fault, all data must be resident on the GPU just in case the kernel accesses it (even if it won’t).


According to TechPowerUp, your GPU is Kepler-based: https://www.techpowerup.com/gpu-specs/geforce-gtx-titan-black.c2549
As far as I know, TensorFlow should also print a warning about this. Something like:
Unified memory on GPUs with compute capability lower than 6.0 (pre-Pascal class GPUs) does not support oversubscription.
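For reference, the compute capability that TensorFlow detects can be checked directly. The snippet below is an illustrative sketch, not part of the original answer; it relies on tf.config.experimental.get_device_details, available from TF 2.4 on:

import tensorflow as tf

# Illustrative check: unified-memory oversubscription needs compute capability
# >= (6, 0) (Pascal or newer); a Kepler GTX TITAN Black reports (3, 5).
for gpu in tf.config.list_physical_devices('GPU'):
    details = tf.config.experimental.get_device_details(gpu)
    cc = details.get('compute_capability')      # e.g. (3, 5)
    name = details.get('device_name', gpu.name)
    print(name, "compute capability:", cc)
    if cc is not None and cc < (6, 0):
        print("-> pre-Pascal GPU: unified memory cannot oversubscribe device memory")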

A similar question about this error can be found on Stack Overflow: https://stackoverflow.com/questions/68902851/
