
python - TensorFlow in nvidia-docker: failed call to cuInit: CUDA_ERROR_UNKNOWN

Reposted. Author: 太空狗. Updated: 2023-10-30 01:36:54

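I have been working on getting an application that depends on TensorFlow to run as a docker container with nvidia-docker. I built my application on top of the tensorflow/tensorflow:latest-gpu-py3 image. I run my docker container with the following command: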

sudo nvidia-docker run -d -p 9090:9090 -v/src/weights:/weights myname/myrepo:mylabel

When I look at the logs through portainer, I see the following:

2017-05-16 03:41:47.715682: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.715896: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.715948: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.715978: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.716002: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.718076: E tensorflow/stream_executor/cuda/cuda_driver.cc:405] failed call to cuInit: CUDA_ERROR_UNKNOWN
2017-05-16 03:41:47.718177: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: 1e22bdaf82f1
2017-05-16 03:41:47.718216: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: 1e22bdaf82f1
2017-05-16 03:41:47.718298: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 367.57.0
2017-05-16 03:41:47.718398: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.57 Mon Oct 3 20:37:01 PDT 2016
GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
"""
2017-05-16 03:41:47.718455: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 367.57.0
2017-05-16 03:41:47.718484: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 367.57.0

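The container does seem to start fine, and my application appears to be running. When I send it prediction requests, the predictions come back correctly, but at the slow speed I would expect when running inference on the CPU, so I think it is pretty clear that the GPU is not being used for some reason. I also tried running nvidia-smi from inside the same container to make sure it can see my GPU, and these are the results: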

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K1             Off  | 0000:00:07.0     Off |                  N/A |
| N/A   28C    P8     7W /  31W |     25MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

I am certainly no expert in this area, but it does look like the GPU is visible from inside the container. Any ideas on how to get this working with TensorFlow?
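
One way to narrow this down, offered as a minimal sketch rather than a confirmed diagnosis: nvidia-smi queries the driver through NVML, while TensorFlow initializes the GPU through the CUDA driver API, so nvidia-smi can succeed while cuInit still fails. The snippet below calls cuInit directly through ctypes, which reproduces the failing call without TensorFlow in the picture. The helper names probe_cuinit and describe are hypothetical, libcuda.so.1 is assumed to be the driver library name inside the container, and 999 is the value of CUDA_ERROR_UNKNOWN in cuda.h.

```python
import ctypes

CUDA_SUCCESS = 0
CUDA_ERROR_UNKNOWN = 999  # numeric value of CUDA_ERROR_UNKNOWN in cuda.h


def probe_cuinit(library_name="libcuda.so.1"):
    """Call cuInit(0) directly and return its integer result code.

    Returns None when the driver library cannot be loaded at all.
    The default library name is an assumption for a Linux container.
    """
    try:
        libcuda = ctypes.CDLL(library_name)
    except OSError:
        return None
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    return libcuda.cuInit(0)


def describe(code):
    """Turn a probe_cuinit() result into a short human-readable message."""
    if code is None:
        return "driver library could not be loaded"
    if code == CUDA_SUCCESS:
        return "cuInit succeeded"
    if code == CUDA_ERROR_UNKNOWN:
        return "cuInit failed with CUDA_ERROR_UNKNOWN (999)"
    return "cuInit failed with error code %d" % code
```

Running print(describe(probe_cuinit())) inside the container should show whether the failure is specific to TensorFlow or already present at the driver level.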

Best answer

I run tensorflow on an Ubuntu 16.04 desktop.

A few days ago, my code ran fine on the GPU. But today the following code could not find any GPU device:


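import tensorflow as tf
from tensorflow.python.client import device_lib as _device_lib

# tf.Session is the TensorFlow 1.x API used in this answer
with tf.Session() as sess:
    # List every device (CPU and GPU) that TensorFlow can see
    local_device_protos = _device_lib.list_local_devices()
    print(local_device_protos)
    for x in local_device_protos:
        print(x.name)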

When I ran tf.Session(), I noticed the following error:

cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_UNKNOWN

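I checked my Nvidia driver in the system details, and used nvcc -V and nvidia-smi to check the driver, CUDA, and cuDNN. Everything seemed fine.

Then I went to Additional Drivers to look at the driver details, where I found that several versions of the NVIDIA driver were listed and that the latest one was selected. But when I first installed the driver, there was only one.

So I selected an older version and applied the changes.

Then I ran tf.Session() and the problem was still there. I figured I should restart my computer, and after I restarted it, the problem was gone.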


sess = tf.Session()
2018-07-01 12:02:41.336648: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-07-01 12:02:41.464166: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-07-01 12:02:41.464482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.8225
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.27GiB
2018-07-01 12:02:41.464494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-01 12:02:42.308689: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-01 12:02:42.308721: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-07-01 12:02:42.308729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-07-01 12:02:42.309686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7022 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability:
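
Because the fix here came down to changing the selected driver version and rebooting, it can help to see which kernel module version is actually loaded. The sketch below parses the "NVRM version" line, the same text TensorFlow prints as "driver version file contents" in the question's log. The function names are hypothetical, /proc/driver/nvidia/version is assumed to be the file holding that text, and the regular expression is only known to match the line format shown in the question; other driver builds may word the line differently.

```python
import re

# Matches e.g. "Kernel Module 367.57" in the NVRM version line
NVRM_PATTERN = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")


def parse_kernel_module_version(text):
    """Extract the kernel module version string from NVRM text, or None."""
    match = NVRM_PATTERN.search(text)
    return match.group(1) if match else None


def read_kernel_module_version(path="/proc/driver/nvidia/version"):
    """Read and parse the loaded kernel module version.

    Returns None if the file is missing or unreadable, for example
    when no NVIDIA kernel module is loaded. The default path is an
    assumption.
    """
    try:
        with open(path) as handle:
            return parse_kernel_module_version(handle.read())
    except OSError:
        return None
```

Comparing this value with the driver version that nvidia-smi reports, before and after a reboot, gives a quick hint as to whether the user-space libraries and the loaded kernel module have drifted apart.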

Regarding python - TensorFlow in nvidia-docker: failed call to cuInit: CUDA_ERROR_UNKNOWN, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43992230/
