python - Invoking GPU asm compilation is supported on Cuda non-Windows platforms only. Relying on driver to perform ptx compilation

Reposted by: 行者123 · Updated: 2023-12-05 02:45:58

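I am trying to use my GPU with TensorFlow 2.3.0 on a simple MNIST model. I have installed CUDA 10.1 and cuDNN 7.6.5. It seems to work (the model trains faster than before, at 2 seconds per epoch), but Task Manager makes it look as though the GPU is not being used at all, which suggests training could be faster still. I have seen other questions on SO about this warning, but their answers all pointed to the use of data generators, which I am not using. I also tried the solution mentioned in the comments on "Tensorflow-gpu not using GPU while fitting model", but it did not help. My Jupyter notebook output is as follows: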

[I 17:06:09.421 NotebookApp] JupyterLab extension loaded from C:\Users\jsmith\Anaconda3\lib\site-packages\jupyterlab
[I 17:06:09.421 NotebookApp] JupyterLab application directory is C:\Users\jsmith\Anaconda3\share\jupyter\lab
[I 17:06:09.423 NotebookApp] Serving notebooks from local directory: C:\Users\jsmith
[I 17:06:09.424 NotebookApp] The Jupyter Notebook is running at:
[I 17:06:09.424 NotebookApp] http://localhost:8888/?token=4da96629c2f5d3e118e50a083d16b21990572a21f3bf04ad
[I 17:06:09.424 NotebookApp]  or http://127.0.0.1:8888/?token=4da96629c2f5d3e118e50a083d16b21990572a21f3bf04ad
[I 17:06:09.424 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 17:06:09.460 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///C:/Users/jsmith/AppData/Roaming/jupyter/runtime/nbserver-13024-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=4da96629c2f5d3e118e50a083d16b21990572a21f3bf04ad
     or http://127.0.0.1:8888/?token=4da96629c2f5d3e118e50a083d16b21990572a21f3bf04ad
[I 17:06:17.565 NotebookApp] Kernel started: 02385ecf-f682-496e-a056-9442356a7642
[I 17:06:30.004 NotebookApp] Starting buffering for 02385ecf-f682-496e-a056-9442356a7642:10507d95e443431392e5aa3a711b2952
[I 17:06:30.237 NotebookApp] Kernel restarted: 02385ecf-f682-496e-a056-9442356a7642
[I 17:06:30.824 NotebookApp] Restoring connection for 02385ecf-f682-496e-a056-9442356a7642:10507d95e443431392e5aa3a711b2952
[I 17:06:30.824 NotebookApp] Replaying 3 buffered messages
2021-01-11 17:06:31.365107: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-11 17:06:33.450527: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-01-11 17:06:33.480765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5
coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2021-01-11 17:06:33.480922: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-11 17:06:33.485959: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-11 17:06:33.489200: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-01-11 17:06:33.490830: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-01-11 17:06:33.494440: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-01-11 17:06:33.496906: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-01-11 17:06:33.510665: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-11 17:06:33.510964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-11 17:06:33.511780: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-11 17:06:33.520884: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1ac3b7d9710 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-11 17:06:33.520974: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-01-11 17:06:33.521624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5
coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2021-01-11 17:06:33.522001: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-11 17:06:33.522330: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-11 17:06:33.522841: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-01-11 17:06:33.523498: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-01-11 17:06:33.523780: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-01-11 17:06:33.525973: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-01-11 17:06:33.526177: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-11 17:06:33.527458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-11 17:06:34.029958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-11 17:06:34.030108: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2021-01-11 17:06:34.031482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2021-01-11 17:06:34.032610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2905 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-01-11 17:06:34.037750: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1ac659c5460 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-01-11 17:06:34.037828: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1650 Ti, Compute Capability 7.5
2021-01-11 17:06:34.213785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 Ti computeCapability: 7.5
coreClock: 1.485GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 178.84GiB/s
2021-01-11 17:06:34.214061: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-01-11 17:06:34.215834: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-11 17:06:34.219921: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-01-11 17:06:34.220345: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-01-11 17:06:34.220737: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-01-11 17:06:34.221081: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-01-11 17:06:34.221561: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-11 17:06:34.221816: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-11 17:06:34.222153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-11 17:06:34.222309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2021-01-11 17:06:34.222660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2021-01-11 17:06:34.222923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2905 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-01-11 17:06:35.001255: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-11 17:06:35.220686: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-11 17:06:36.204198: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.

Here is my code for getting the data:

import numpy as np
from tensorflow import keras

num_classes = 10
input_shape = (28, 28, 1)

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")


# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

Here is my model:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

batch_size = 128
epochs = 60

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Device string kept as posted; note that the log above reports a single GPU (device 0).
with tf.device('/GPU:1'):
    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)

This is Task Manager. You can see that nearly all of the GPU memory is in use, but GPU utilization is only 4%, while CPU utilization is 45%.
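As a cross-check on the Task Manager reading, the same two numbers (utilization and memory in use) can be read from `nvidia-smi`. The sketch below is only an illustration: it assumes `nvidia-smi` is on `PATH`, the helper names `parse_smi_csv` and `_to_int` are hypothetical, and the sample line is made up to show the format, not a measurement from this machine.

```python
import shutil
import subprocess

# Query flags for nvidia-smi: one CSV row per GPU, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def _to_int(field):
    """Convert a CSV field to int, or None for values such as [N/A]."""
    try:
        return int(field)
    except ValueError:
        return None


def parse_smi_csv(text):
    """Parse the CSV rows produced by QUERY into a list of dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split from the right so a comma inside the GPU name is harmless.
        name, util, used, total = [f.strip() for f in line.rsplit(",", 3)]
        rows.append(
            {
                "name": name,
                "util_percent": _to_int(util),
                "mem_used_mib": _to_int(used),
                "mem_total_mib": _to_int(total),
            }
        )
    return rows


# Made-up sample row, only to show the shape of the parsed result.
print(parse_smi_csv("GeForce GTX 1650 Ti, 37, 3120, 4096\n"))

# Query the real driver only if nvidia-smi is available on this machine.
if shutil.which("nvidia-smi"):
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode == 0:
        print(parse_smi_csv(result.stdout))
```

Running this in a second terminal while `model.fit` is training gives a utilization figure that does not depend on which graph Task Manager happens to display.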

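Best answer

On that Task Manager page, select the drop-down next to Video Encode and change it to CUDA. You will then see TensorFlow's GPU activity. This was not obvious to me either, but basically you were just looking at the wrong part of the GPU activity.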
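Independently of Task Manager, TensorFlow itself can report whether it sees the GPU and where an operation ran. The following is a minimal sketch, not part of the original answer: the function name `gpu_report` is hypothetical, and the TensorFlow import is deferred so the snippet also loads on a machine without TensorFlow installed.

```python
import importlib.util


def gpu_report():
    """Return the GPUs TensorFlow can see and the device a test matmul ran on."""
    import tensorflow as tf  # imported lazily so this file loads without TensorFlow

    gpus = tf.config.list_physical_devices("GPU")
    # Log the device chosen for each operation as it executes.
    tf.debugging.set_log_device_placement(True)
    a = tf.random.uniform((1024, 1024))
    product = tf.matmul(a, a)
    return {
        "gpus": [gpu.name for gpu in gpus],
        "matmul_device": product.device,
    }


if importlib.util.find_spec("tensorflow") is None:
    print("TensorFlow is not installed in this environment")
else:
    print(gpu_report())
```

If the reported `matmul_device` ends in `GPU:0`, the operation was placed on the GPU, which is consistent with the "Created TensorFlow device" line in the question's log.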

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/65676011/
