TensorFlow takes more than 1 minute on its first run on a GPU with compute capability 5.0

Reposted. Author: 行者123. Updated: 2023-12-03 09:45:52

I am running tensorflow 0.8.0 for python3 (installed with pip), with the following file test.py:

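import tensorflow as tf

# Build a one-element int32 tensor and cast it to float32 (TensorFlow 0.8 API).
a = tf.convert_to_tensor([1], dtype=tf.int32)
b = tf.to_float(a)

# eval() needs a default session, so it must run inside the with block.
with tf.Session():
    b.eval()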

...it takes more than a minute to run:
$time python3 test.py 
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 860M
major: 5 minor: 0 memoryClockRate (GHz) 1.0195
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.61GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 860M, pci bus id: 0000:01:00.0)

real 1m6.985s
user 1m6.700s
sys 0m1.480s

I should mention that other tensorflow programs seem to work fine, for example
$time python3 -m tensorflow.models.image.mnist.convolutional

takes less than 4 minutes.

Edit:
$cat /usr/local/cuda/version.txt 
CUDA Version 7.5.18

$ls /usr/local/cuda/lib64/libcudnn*
/usr/local/cuda/lib64/libcudnn.so /usr/local/cuda/lib64/libcudnn.so.4.0.7
/usr/local/cuda/lib64/libcudnn.so.4 /usr/local/cuda/lib64/libcudnn_static.a

Best answer

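I think your GPU, the GTX 860M, is an sm_50 device. The default TensorFlow binaries are built for sm_35 and sm_52. That means that for your device the binary contains only PTX, and the CUDA runtime has to JIT-compile it to SASS the first time each kernel runs, which takes a minute or so. The compiled kernels should be cached for later runs, unless the cache has been explicitly disabled.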
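
One way to check this explanation is to look at the CUDA JIT cache settings and then time the same script twice. The sketch below is not part of the original answer. It assumes the standard CUDA environment variables CUDA_CACHE_DISABLE and CUDA_CACHE_PATH, and the usual Linux default cache directory ~/.nv/ComputeCache. The helper names describe_jit_cache and time_runs are hypothetical, chosen only for illustration.

```python
import os
import subprocess
import sys
import time
from pathlib import Path


def describe_jit_cache(env=None):
    """Summarise the CUDA JIT cache settings found in the environment.

    Assumes CUDA_CACHE_DISABLE=1 turns the cache off and that the cache
    lives in CUDA_CACHE_PATH, falling back to ~/.nv/ComputeCache on Linux.
    """
    env = os.environ if env is None else env
    disabled = env.get("CUDA_CACHE_DISABLE", "0") == "1"
    default_dir = os.path.expanduser("~/.nv/ComputeCache")
    cache_dir = Path(env.get("CUDA_CACHE_PATH", default_dir))
    size = 0
    if cache_dir.is_dir():
        # Total size of cached files; an empty cache suggests nothing was JIT-compiled yet.
        size = sum(p.stat().st_size for p in cache_dir.rglob("*") if p.is_file())
    return {"disabled": disabled, "path": str(cache_dir), "bytes": size}


def time_runs(cmd, runs=2):
    """Run cmd several times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    print(describe_jit_cache())
    # Optional: pass a script to time, e.g. the test.py from the question.
    if len(sys.argv) > 1:
        print(time_runs([sys.executable] + sys.argv[1:]))
```

If the cache is enabled and the second timing is much shorter than the first, that is consistent with the JIT explanation. If both runs are equally slow, the cache may be disabled, unwritable, or too small for the compiled kernels; its size limit is controlled by CUDA_CACHE_MAXSIZE.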

Regarding "TensorFlow takes more than 1 minute on its first run on a GPU with compute capability 5.0", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36842169/
