- html - 出于某种原因,IE8 对我的 Sass 文件中继承的 html5 CSS 不友好?
- JMeter 在响应断言中使用 span 标签的问题
- html - 在 :hover and :active? 上具有不同效果的 CSS 动画
- html - 相对于居中的 html 内容固定的 CSS 重复背景?
(这是对 SO 、 jupyterhub issue tracker 和 jupyterhub/systemdspawner issue tracker 的交叉发布)
我有一个使用 SystemdSpawner
的私有(private) JupyterHub 设置,我尝试在 gpu 支持下运行 tensorflow。
我遵循了 tensorflow instructions,或者在 AWS EC2 g4 实例上尝试了一个已经配置好的 AWS AMI(深度学习基础 AMI(Ubuntu 18.04)版本 21.0)和 NDVIDIA。
在这两种设置上,我都可以在 (i)python 3.6 shell 中使用带有 gpu 支持的 tensorflow
>>> import tensorflow as tf
>>> tf.config.list_physical_devices('GPU')
2020-02-12 10:57:13.670937: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-12 10:57:13.698230: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-12 10:57:13.699066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:1e.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.73GiB deviceMemoryBandwidth: 298.08GiB/s
2020-02-12 10:57:13.699286: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-12 10:57:13.700918: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-12 10:57:13.702512: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-12 10:57:13.702814: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-12 10:57:13.704561: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-12 10:57:13.705586: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-12 10:57:13.709171: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-12 10:57:13.709278: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-12 10:57:13.710120: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-12 10:57:13.710893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
nvidia-smi
和
deviceQuery
显示 gpu:
$ nvidia-smi
Wed Feb 12 10:39:44 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01 Driver Version: 418.87.01 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 33C P8 9W / 70W | 0MiB / 15079MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
$ /usr/local/cuda/extras/demo_suite/deviceQuery
/usr/local/cuda/extras/demo_suite/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Tesla T4"
CUDA Driver Version / Runtime Version 10.1 / 10.0
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 15080 MBytes (15812263936 bytes)
(40) Multiprocessors, ( 64) CUDA Cores/MP: 2560 CUDA Cores
GPU Max Clock rate: 1590 MHz (1.59 GHz)
Memory Clock rate: 5001 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 4194304 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 3 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Enabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 30
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.0, NumDevs = 1, Device0 = Tesla T4
Result = PASS
$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
$ /usr/local/cuda/extras/demo_suite/deviceQuery
cuda/extras/demo_suite/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL
最佳答案
设置 c.SystemdSpawner.isolate_devices = False
在您的 jupyterhub_config.py
.
以下是 the documentation 的摘录:
Setting this to true provides a separate, private /dev for each user. This prevents the user from directly accessing hardware devices, which could be a potential source of security issues. /dev/null, /dev/zero, /dev/random and the ttyp pseudo-devices will be mounted already, so most users should see no change when this is enabled.
c.SystemdSpawner.isolate_devices = True
This requires systemd version > 227. If you enable this in earlier versions, spawning will fail.
/dev
中的文件)。请引用
their documentation for more information .那里应该有名为
/dev/nvidia*
的文件.使用 SystemdSpawner 隔离设备将阻止对这些 Nvidia 设备的访问。
Is there a way to still isolate the devices and enable GPU support?
c.SystemdSpawner.isolate_devices = True
套
PrivateDevices=yes
最终
systemd-run
调用(
source)。引用
the systemd documentation有关
PrivateDevices
的更多信息选项。
isolate_devices = True
然后显式安装 nvidia 设备。虽然我不知道该怎么做...
关于python - JupyterHub 单用户无法通过 systemdspawner 使用 tensorflow gpu 支持,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/60187732/
我通过 spring ioc 编写了一些 Rest 应用程序。但我无法解决这个问题。这是我的异常(exception): org.springframework.beans.factory.BeanC
我对 TestNG、Spring 框架等完全陌生,我正在尝试使用注释 @Value通过 @Configuration 访问配置文件注释。 我在这里想要实现的目标是让控制台从配置文件中写出“hi”,通过
为此工作了几个小时。我完全被难住了。 这是 CS113 的实验室。 如果用户在程序(二进制计算器)结束时选择继续,我们需要使用 goto 语句来到达程序的顶部。 但是,我们还需要释放所有分配的内存。
我正在尝试使用 ffmpeg 库构建一个小的 C 程序。但是我什至无法使用 avformat_open_input() 打开音频文件设置检查错误代码的函数后,我得到以下输出: Error code:
使用 Spring Initializer 创建一个简单的 Spring boot。我只在可用选项下选择 DevTools。 创建项目后,无需对其进行任何更改,即可正常运行程序。 现在,当我尝试在项目
所以我只是在 Mac OS X 中通过 brew 安装了 qt。但是它无法链接它。当我尝试运行 brew link qt 或 brew link --overwrite qt 我得到以下信息: ton
我在提交和 pull 时遇到了问题:在提交的 IDE 中,我看到: warning not all local changes may be shown due to an error: unable
我跑 man gcc | grep "-L" 我明白了 Usage: grep [OPTION]... PATTERN [FILE]... Try `grep --help' for more inf
我有一段代码,旨在接收任何 URL 并将其从网络上撕下来。到目前为止,它运行良好,直到有人给了它这个 URL: http://www.aspensurgical.com/static/images/a
在过去的 5 个小时里,我一直在尝试在我的服务器上设置 WireGuard,但在完成所有设置后,我无法 ping IP 或解析域。 下面是服务器配置 [Interface] Address = 10.
我正在尝试在 GitLab 中 fork 我的一个私有(private)项目,但是当我按下 fork 按钮时,我会收到以下信息: No available namespaces to fork the
我这里遇到了一些问题。我是 node.js 和 Rest API 的新手,但我正在尝试自学。我制作了 REST API,使用 MongoDB 与我的数据库进行通信,我使用 Postman 来测试我的路
下面的代码在控制台中给出以下消息: Uncaught DOMException: Failed to execute 'appendChild' on 'Node': The new child el
我正在尝试调用一个新端点来显示数据,我意识到在上一组有效的数据中,它在数据周围用一对额外的“[]”括号进行控制台,我认为这就是问题是,而新端点不会以我使用数据的方式产生它! 这是 NgFor 失败的原
我正在尝试将我的 Symfony2 应用程序部署到我的 Azure Web 应用程序,但遇到了一些麻烦。 推送到远程时,我在终端中收到以下消息 remote: Updating branch 'mas
Minikube已启动并正在运行,没有任何错误,但是我无法 curl IP。我在这里遵循:https://docs.traefik.io/user-guide/kubernetes/,似乎没有提到关闭
每当我尝试docker组成任何项目时,都会出现以下错误。 我尝试过有和没有sudo 我在这台机器上只有这个问题。我可以在Mac和Amazon WorkSpace上运行相同的容器。 (myslabs)
我正在尝试 pip install stanza 并收到此消息: ERROR: No matching distribution found for torch>=1.3.0 (from stanza
DNS 解析看起来不错,但我无法 ping 我的服务。可能是什么原因? 来自集群中的另一个 Pod: $ ping backend PING backend.default.svc.cluster.l
我正在使用Hibernate 4 + Spring MVC 4当我开始 Apache Tomcat Server 8我收到此错误: Error creating bean with name 'wel
我是一名优秀的程序员,十分优秀!