gpt4 book ai didi

Tensorflow 无法注册 CUDNN 处理程序

转载 作者:行者123 更新时间:2023-12-04 04:21:35 26 4
gpt4 key购买 nike

我不确定这是否是这个问题的正确堆栈交换,但这里是。

我已经安装了最新的 CUDA 驱动程序和 Tensorflow 1.14,但是当我尝试训练卷积层时,Tensorflow 说它找不到实现,因为它无法创建 cudnn 处理程序。我不知道该怎么办。

tensorflow 错误。

2019-11-29 22:54:16.276690: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-29 22:54:16.321772: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3408000000 Hz
2019-11-29 22:54:16.322826: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b7b1203d70 executing computations on platform Host. Devices:
2019-11-29 22:54:16.322992: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-11-29 22:54:16.327949: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-11-29 22:54:17.028426: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning
NUMA node zero
2019-11-29 22:54:17.029075: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b7b1993230 executing computations on platform CUDA. Devices:
2019-11-29 22:54:17.029093: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
2019-11-29 22:54:17.030709: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning
NUMA node zero
2019-11-29 22:54:17.031088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.785
pciBusID: 0000:01:00.0
2019-11-29 22:54:17.031304: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-11-29 22:54:17.032299: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-11-29 22:54:17.033255: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-11-29 22:54:17.033855: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-11-29 22:54:17.035048: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-11-29 22:54:17.036049: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-11-29 22:54:17.038877: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-11-29 22:54:17.039240: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning
NUMA node zero
2019-11-29 22:54:17.039666: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning
NUMA node zero
2019-11-29 22:54:17.040184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-11-29 22:54:17.040583: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-11-29 22:54:17.042475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-29 22:54:17.042517: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-11-29 22:54:17.042681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-11-29 22:54:17.043278: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning
NUMA node zero
2019-11-29 22:54:17.043675: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning
NUMA node zero
2019-11-29 22:54:17.044663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7466 MB memory) -> physical GPU (device: 0,
name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
Epoch 1/3
2019-11-29 22:54:18.565862: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-11-29 22:54:18.776295: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x55b7b1d9a810
2019-11-29 22:54:18.776389: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-11-29 22:54:19.028300: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-11-29 22:54:19.037596: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-11-29 22:54:19.678671: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-11-29 22:54:19.685432: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
File "mnist_example.py", line 268, in <module>
res_dict_interpolated = run_model(build_interpolated, "Interpolated", verbose)
File "mnist_example.py", line 216, in run_model
validation_data=(x_test, y_test))
File "/home/kasperfred/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 780, in fit
steps_name='steps_per_epoch')
File "/home/kasperfred/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 363, in model_iteration
batch_outs = f(ins_batch)
File "/home/kasperfred/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
run_metadata=self.run_metadata)
File "/home/kasperfred/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node interpolated_conv2d/Conv2D}}]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node interpolated_conv2d/Conv2D}}]]
[[loss/mul/_73]]
0 successful operations.
0 derived errors ignored.

nvidia-smi 输出

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21 Driver Version: 435.21 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2080 Off | 00000000:01:00.0 Off | N/A |
| 38% 41C P0 N/A / N/A | 0MiB / 7979MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

cat/usr/local/cuda-10.0/include/cudnn.h 的输出 | grep CUDNN_MAJOR -A 2

#define CUDNN_MAJOR 7
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 2
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"

最佳答案

曾经有some problem在TF 1.14中,可以通过设置来解决

config.gpu_options.allow_growth = True

或者对于 TF 2.0:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

关于Tensorflow 无法注册 CUDNN 处理程序,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59111615/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com