tensorflow - Understanding ResourceExhaustedError: OOM when allocating tensor with shape

Reposted by: 行者123  Updated: 2023-12-03 13:27:26

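I am trying to implement a skip-thought model in tensorflow; the current version is available here.

At the moment I use one GPU of my machine (it has 2 GPUs in total), and the GPU info is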

2017-09-06 11:29:32.657299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.683
pciBusID 0000:02:00.0
Total memory: 10.91GiB
Free memory: 10.75GiB

However, I get an OOM when I try to feed data to the model. I tried to debug it as follows:

I use the following snippet immediately after running sess.run(tf.global_variables_initializer())

    logger.info('Total: {} params'.format(
        np.sum([
            np.prod(v.get_shape().as_list())
            for v in tf.trainable_variables()
        ])))

and get 2017-09-06 11:29:51,333 INFO main main.py:127 - Total: 62968629 params, which is roughly 240 MB if every variable is tf.float32. The output of tf.global_variables is

[<tf.Variable 'embedding/embedding_matrix:0' shape=(155229, 200) dtype=float32_ref>,
 <tf.Variable 'encoder/rnn/gru_cell/gates/kernel:0' shape=(400, 400) dtype=float32_ref>,
 <tf.Variable 'encoder/rnn/gru_cell/gates/bias:0' shape=(400,) dtype=float32_ref>,
 <tf.Variable 'encoder/rnn/gru_cell/candidate/kernel:0' shape=(400, 200) dtype=float32_ref>,
 <tf.Variable 'encoder/rnn/gru_cell/candidate/bias:0' shape=(200,) dtype=float32_ref>,
 <tf.Variable 'decoder/weights:0' shape=(200, 155229) dtype=float32_ref>,
 <tf.Variable 'decoder/biases:0' shape=(155229,) dtype=float32_ref>,
 <tf.Variable 'decoder/previous_decoder/rnn/gru_cell/gates/kernel:0' shape=(400, 400) dtype=float32_ref>,
 <tf.Variable 'decoder/previous_decoder/rnn/gru_cell/gates/bias:0' shape=(400,) dtype=float32_ref>,
 <tf.Variable 'decoder/previous_decoder/rnn/gru_cell/candidate/kernel:0' shape=(400, 200) dtype=float32_ref>,
 <tf.Variable 'decoder/previous_decoder/rnn/gru_cell/candidate/bias:0' shape=(200,) dtype=float32_ref>,
 <tf.Variable 'decoder/next_decoder/rnn/gru_cell/gates/kernel:0' shape=(400, 400) dtype=float32_ref>,
 <tf.Variable 'decoder/next_decoder/rnn/gru_cell/gates/bias:0' shape=(400,) dtype=float32_ref>,
 <tf.Variable 'decoder/next_decoder/rnn/gru_cell/candidate/kernel:0' shape=(400, 200) dtype=float32_ref>,
 <tf.Variable 'decoder/next_decoder/rnn/gru_cell/candidate/bias:0' shape=(200,) dtype=float32_ref>,
 <tf.Variable 'global_step:0' shape=() dtype=int32_ref>]
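As a quick cross-check of the 62,968,629 figure, the trainable shapes in the listing can be multiplied out by hand. This is only a sketch: the shapes are copied from the listing, global_step is left out on the assumption that it is not trainable, and 4 bytes per value assumes every variable is tf.float32.

```python
# Shapes copied from the variable listing above (global_step omitted).
gru_shapes = [(400, 400), (400,), (400, 200), (200,)]  # gates and candidate kernel + bias

shapes = [(155229, 200)]               # embedding/embedding_matrix
shapes += gru_shapes                   # encoder GRU
shapes += [(200, 155229), (155229,)]   # decoder/weights, decoder/biases
shapes += gru_shapes * 2               # previous_decoder and next_decoder GRUs


def num_elements(shape):
    """Product of the dimensions of a shape tuple."""
    n = 1
    for dim in shape:
        n *= dim
    return n


total_params = sum(num_elements(s) for s in shapes)
param_mib = total_params * 4 / 2**20   # 4 bytes per float32 value

print(total_params, round(param_mib, 1))
```

Almost all of the parameters sit in the embedding matrix and the decoder output weights, both of which scale with the 155229-word vocabulary.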

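In my training phase I have a data array of shape (164652, 3, 30), i.e. sample_size x 3 x time_step, where the 3 stands for the previous, current and next sentence. This training data is about 57 MB and is stored in a loader. I then wrote a generator function to fetch the sentences, which looks like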

def iter_batches(self, batch_size=128, time_major=True, shuffle=True):

    num_samples = len(self._sentences)
    if shuffle:
        samples = self._sentences[np.random.permutation(num_samples)]
    else:
        samples = self._sentences

    batch_start = 0
    while batch_start < num_samples:
        batch = samples[batch_start:batch_start + batch_size]

        lens = (batch != self._vocab[self._vocab.pad_token]).sum(axis=2)
        y, x, z = batch[:, 0, :], batch[:, 1, :], batch[:, 2, :]
        if time_major:
            yield (y.T, lens[:, 0]), (x.T, lens[:, 1]), (z.T, lens[:, 2])
        else:
            yield (y, lens[:, 0]), (x, lens[:, 1]), (z, lens[:, 2])
        batch_start += batch_size
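To see what a single batch actually feeds into the graph, the generator can be exercised on a small stand-in array. This is a sketch, not the question's loader: it is written as a free function without shuffling, and it assumes a pad id of 0 and int32 token ids, neither of which is stated in the question.

```python
import numpy as np

PAD_ID = 0  # hypothetical pad-token id; the original looks it up in self._vocab


def iter_batches(sentences, batch_size=128, time_major=True):
    """Yield ((y, y_lens), (x, x_lens), (z, z_lens)) per batch, without shuffling."""
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        lens = (batch != PAD_ID).sum(axis=2)        # non-pad tokens per sentence
        parts = [batch[:, i, :] for i in range(3)]  # previous, current, next
        if time_major:
            parts = [p.T for p in parts]            # (time_step, batch)
        yield tuple((p, lens[:, i]) for i, p in enumerate(parts))


# Small stand-in for the (164652, 3, 30) array: 300 samples of non-pad token ids.
fake = np.random.randint(1, 100, size=(300, 3, 30)).astype(np.int32)
(y, y_lens), (x, x_lens), (z, z_lens) = next(iter_batches(fake))

batch_bytes = 3 * x.nbytes  # three int32 arrays of 30 x 128 values
print(x.shape, x_lens.shape, batch_bytes)
```

Under these assumptions one batch is only a few tens of kilobytes, so the fed data itself is unlikely to be what exhausts GPU memory.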

The training loop looks like

for epoch in range(num_epochs):
    batches = loader.iter_batches(batch_size=args.batch_size)
    try:
        (y, y_lens), (x, x_lens), (z, z_lens) = next(batches)
        _, summaries, loss_val = sess.run(
            [train_op, train_summary_op, st.loss],
            feed_dict={
                st.inputs: x,
                st.sequence_length: x_lens,
                st.previous_targets: y,
                st.previous_target_lengths: y_lens,
                st.next_targets: z,
                st.next_target_lengths: z_lens
            })
    except StopIteration:
        ...

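Then I get an OOM. If I comment out the whole try body (so no data is fed), the script runs just fine.

I have no idea why I get an OOM at such a small data scale. Using nvidia-smi I always get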

Wed Sep  6 12:03:37 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.59                 Driver Version: 384.59                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   44C    P2    60W / 275W |  10623MiB / 11172MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
|  0%   43C    P2    62W / 275W |  10621MiB / 11171MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     32748    C   python3                                      10613MiB |
|    1     32748    C   python3                                      10611MiB |
+-----------------------------------------------------------------------------+

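I cannot observe the actual GPU usage of my script, because tensorflow always grabs all the memory at startup. The real problem here is that I don't know how to debug this.

I have read some posts about OOM on StackOverflow. Most of them occur when a large test set is fed to the model, and feeding the data in small batches avoids the problem. But I don't understand why such a small combination of data and parameters fails on my 11 GB 1080 Ti, given that the error is only about allocating a matrix of size [3840 x 155229] (the decoder's output matrix, where 3840 = 30 (time_steps) x 128 (batch_size) and 155229 is vocab_size).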

2017-09-06 12:14:45.787566: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ********************************************************************************************xxxxxxxx
2017-09-06 12:14:45.787597: W tensorflow/core/framework/op_kernel.cc:1158] Resource exhausted: OOM when allocating tensor with shape[3840,155229]
2017-09-06 12:14:45.788735: W tensorflow/core/framework/op_kernel.cc:1158] Resource exhausted: OOM when allocating tensor with shape[3840,155229]
     [[Node: decoder/previous_decoder/Add = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](decoder/previous_decoder/MatMul, decoder/biases/read)]]
2017-09-06 12:14:45.790453: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2857 get requests, put_count=2078 evicted_count=1000 eviction_rate=0.481232 and unsatisfied allocation rate=0.657683
2017-09-06 12:14:45.790482: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1139, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
    status, run_metadata)
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[3840,155229]
     [[Node: decoder/previous_decoder/Add = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](decoder/previous_decoder/MatMul, decoder/biases/read)]]
     [[Node: GradientDescent/update/_146 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_2166_GradientDescent/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

During handling of the above exception, another exception occurred:

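Any help will be appreciated. Thanks in advance.

Best answer

Let's break down the problems one by one:

About tensorflow allocating all memory in advance: you can use the following snippet to let tensorflow allocate memory only when it is needed. That way you can see how memory use develops.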

gpu_options = tf.GPUOptions(allow_growth=True)
session = tf.InteractiveSession(config=tf.ConfigProto(gpu_options=gpu_options))

The same works with tf.Session() instead of tf.InteractiveSession() if you prefer.
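The nvidia-smi output in the question shows the same python3 process holding about 10.6 GiB on both GPUs, although only one is being used. One common way to keep a process off the second card is the CUDA_VISIBLE_DEVICES environment variable, set before TensorFlow is imported. This is a sketch that goes beyond what the answer states, and it assumes device 0 is the card you want to use.

```python
import os

# Assumption: GPU 0 is the device to train on. The variable must be set
# before TensorFlow is imported, otherwise the CUDA runtime may already
# have enumerated (and TensorFlow reserved memory on) every GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# import tensorflow as tf   # import TensorFlow only after the line above
visible = os.environ["CUDA_VISIBLE_DEVICES"]
print("Visible GPUs:", visible)
```

With the second card hidden, the numbers reported by nvidia-smi for the remaining card are easier to attribute to the script.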

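The second thing, about the sizes:
Since there is no information about the size of your network, we cannot estimate what is going wrong. However, you can debug the network step by step. For example, create a network with only one layer, fetch its output, create a session, feed values once, and look at how much memory you consume. Repeat this debugging session, adding layers, until you reach the point where you run out of memory.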

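Please be aware that a 3840 x 155229 output is really a big output. It means about 600M neurons, or about 2.22 GB for that one layer alone. If you have any other layers of similar size, they will add up and fill your GPU memory very quickly.

Also, this is only the forward pass. If you train with this layer, backpropagation and the tensors added by the optimizer will multiply this size by 2. So for training you consume about 5 GB for the output layer alone.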
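The figures in the two paragraphs above can be reproduced with a few lines of arithmetic. A minimal sketch, assuming float32 (4 bytes per value) and taking the factor of 2 for training as the answer's rule of thumb rather than a measured number:

```python
time_steps, batch_size, vocab_size = 30, 128, 155229
bytes_per_float32 = 4

rows = time_steps * batch_size                  # 3840 rows in the decoder output
elements = rows * vocab_size                    # one [3840, 155229] tensor
forward_gib = elements * bytes_per_float32 / 2**30
training_gib = 2 * forward_gib                  # rule of thumb: x2 for backprop

print(rows, elements, round(forward_gib, 2), round(training_gib, 2))
```

In decimal units the doubled figure is about 4.8 GB, which is where the "about 5 GB" estimate comes from.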

I suggest you revise your network and try to reduce the batch size or the number of parameters so that your model fits on the GPU.
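To judge how far the batch size would have to drop, the estimate for the decoder output can be tabulated for a few batch sizes. This is a rough sketch: logits_gib is a hypothetical helper, copies=2 reuses the forward-plus-backward rule of thumb from the answer, and everything else the model allocates is ignored.

```python
def logits_gib(batch_size, time_steps=30, vocab_size=155229, copies=2):
    """Approximate GiB for `copies` float32 tensors of shape [T * B, vocab]."""
    return copies * batch_size * time_steps * vocab_size * 4 / 2**30


for bs in (128, 64, 32, 16):
    print(bs, round(logits_gib(bs), 2))
```

The footprint is linear in both batch_size and vocab_size, so halving either one halves this term.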

Regarding tensorflow - Understanding ResourceExhaustedError: OOM when allocating tensor with shape, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46066850/
