
gpgpu - Error when using TensorFlow with a GPU

Reposted. Author: 行者123. Updated: 2023-12-03 23:24:25

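I have tried a number of different TensorFlow examples. They all run fine on the CPU, but every one of them produces the same error when I try to run it on the GPU. A small example: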

import tensorflow as tf

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

The error is always the same, CUDA_ERROR_OUT_OF_MEMORY:
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcudnn.so.6.5 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 24
I tensorflow/core/common_runtime/gpu/gpu_init.cc:103] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:0a:00.0
Total memory: 11.25GiB
Free memory: 105.73MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:103] Found device 1 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:0b:00.0
Total memory: 11.25GiB
Free memory: 133.48MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:127] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 0: Y Y
I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 1: Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:0a:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K80, pci bus id: 0000:0b:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Allocating 105.48MiB bytes.
E tensorflow/stream_executor/cuda/cuda_driver.cc:932] failed to allocate 105.48M (110608384 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
F tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Check failed: gpu_mem != nullptr Could not allocate GPU device memory for device 0. Tried to allocate 105.48MiB
Aborted (core dumped)

My guess is that the problem lies in my configuration rather than in the memory usage of this small example. Does anyone have an idea?

Edit:

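I have found that the problem may be as simple as someone else running a job on the same GPU, which would explain the small amount of free memory. In that case: sorry for taking up your time...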
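One way to confirm that another job is holding the GPU is to ask `nvidia-smi` for per-device memory before starting TensorFlow. The sketch below is not part of the original question. The helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical, and it assumes `nvidia-smi` is on the PATH and supports `--query-gpu` with CSV output.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'index, memory.used, memory.total' CSV rows (values in MiB, no header, no units)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = [int(field.strip()) for field in line.split(",")]
        gpus.append({
            "index": index,
            "used_mib": used,
            "total_mib": total,
            "free_mib": total - used,
        })
    return gpus


def query_gpu_memory():
    """Run nvidia-smi (assumed to be installed) and return per-GPU memory figures."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        universal_newlines=True,
    )
    return parse_gpu_memory(out)
```

Calling `query_gpu_memory()` on the affected machine returns one dictionary per GPU. A `free_mib` value close to zero before your own script starts suggests that some other process already owns the device.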

Best answer

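This is probably because your TensorFlow session cannot obtain enough memory on the GPU. Perhaps other processes have left little memory available to TensorFlow, or another TensorFlow session is already running on your system. You therefore need to configure how much memory the TensorFlow session will use.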

If you are using TensorFlow 1.x:

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)

sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
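It can help to work out what a given fraction actually asks for. The arithmetic below is only a sketch: it assumes that `per_process_gpu_memory_fraction` is applied to the device's total memory, and the helper names `requested_bytes` and `fits` are hypothetical. The figures are the ones reported for device 0 in the log from the question.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def requested_bytes(total_bytes, fraction):
    """Bytes a process would try to reserve, assuming the fraction applies to total memory."""
    return int(total_bytes * fraction)


def fits(total_bytes, free_bytes, fraction):
    """True if the requested reservation is no larger than what is currently free."""
    return requested_bytes(total_bytes, fraction) <= free_bytes


# Figures from the log in the question: 11.25 GiB total, 105.73 MiB free on device 0.
total = 11.25 * GIB
free = 105.73 * MIB

print(requested_bytes(total, 0.333) / float(GIB))  # size of the reservation in GiB
print(fits(total, free, 0.333))                    # whether that much is actually free
```

Under that assumption a fraction of 0.333 asks for roughly 3.75 GiB, far more than the 105.73 MiB that was free, so capping the fraction helps only once the other job has released its memory.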

TensorFlow 2.x changed substantially compared with 1.x, but it keeps a compatibility module for anyone who wants to use TensorFlow 1.x methods and features. TensorFlow 2.x users can therefore use this code:
gpu_options = tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.333)

sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(gpu_options=gpu_options))
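As an alternative that the original answer does not mention, TensorFlow 2.x also lets you configure GPU memory without the compatibility module. The sketch below assumes TensorFlow 2.1 or later. The function name `enable_memory_growth` is hypothetical, and the import guard is there only so that the snippet still runs on a machine without TensorFlow.

```python
def enable_memory_growth():
    """Ask TensorFlow 2.x to grow GPU memory on demand; return the number of GPUs configured."""
    try:
        import tensorflow as tf
    except ImportError:
        return 0  # TensorFlow is not installed, so there is nothing to configure

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # This must run before any op has initialised the GPU,
        # otherwise TensorFlow raises a RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


configured = enable_memory_growth()
print("GPUs configured for memory growth:", configured)
```

With memory growth enabled, the process starts with a small allocation and extends it as needed instead of reserving memory up front. It does not free memory held by other jobs, so it will still fail on a GPU that is already full.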

Regarding gpgpu - Error when using TensorFlow with a GPU, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34514324/
