
python - skflow allocates memory on gpu0 even when another gpu is specified


I ran into this problem on a 4-GPU Amazon instance, using a simple example script:

import skflow
import tensorflow as tf
from sklearn import cross_validation, datasets

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = cross_validation.train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

def my_model(X, y):
    with tf.device('/gpu:1'):
        # many neurons to see the impact on memory
        layers = skflow.ops.dnn(X, [1000, 500, 150], keep_prob=0.5)
    with tf.device('/cpu:0'):
        return skflow.models.logistic_regression(layers, y)

classifier = skflow.TensorFlowEstimator(model_fn=my_model, n_classes=3)
classifier.fit(X_train, y_train)

The output of nvidia-smi before launching the script:

Fri Feb 19 11:30:22 2016
+------------------------------------------------------+
| NVIDIA-SMI 346.46     Driver Version: 346.46         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   40C    P0    41W / 125W |   2247MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GRID K520           Off  | 0000:00:04.0     Off |                  N/A |
| N/A   36C    P0    40W / 125W |   2113MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GRID K520           Off  | 0000:00:05.0     Off |                  N/A |
| N/A   41C    P0    43W / 125W |     53MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GRID K520           Off  | 0000:00:06.0     Off |                  N/A |
| N/A   39C    P0    41W / 125W |   1816MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

And while the script is running:

Fri Feb 19 11:30:53 2016
+------------------------------------------------------+
| NVIDIA-SMI 346.46     Driver Version: 346.46         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   40C    P0    46W / 125W |   3926MiB /  4095MiB |     26%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GRID K520           Off  | 0000:00:04.0     Off |                  N/A |
| N/A   37C    P0    42W / 125W |   3926MiB /  4095MiB |     17%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GRID K520           Off  | 0000:00:05.0     Off |                  N/A |
| N/A   41C    P0    44W / 125W |     92MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GRID K520           Off  | 0000:00:06.0     Off |                  N/A |
| N/A   39C    P0    42W / 125W |   1856MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

So memory is allocated on GPU 0 even though the code never mentions it. Do you know where this behavior comes from? It is a problem because several users share this instance, and GPU 0 ends up saturated even when nobody intends to use it.
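For context, TensorFlow by default initializes and reserves memory on every GPU that is visible to the process as soon as the session is created, regardless of any tf.device annotations in the graph. One way to keep a process off GPU 0 entirely, not mentioned in the original question, is to restrict which devices CUDA exposes before TensorFlow is loaded; a minimal sketch (the index "1" is only an example):

import os

# Expose only the second physical GPU to this process; inside TensorFlow it
# will then show up as /gpu:0. This must be set before TensorFlow (or skflow,
# which imports it) initializes the GPU devices.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import skflow
import tensorflow as tf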

Best answer

The workaround we found was to modify skflow.TensorFlowEstimator.

The culprit is

with self._graph.as_default():
    tf.set_random_seed(self.tf_random_seed)
    # the global step variable is created without an explicit device,
    # so it is placed on the default GPU (gpu:0)
    self._global_step = tf.Variable(
        0, name="global_step", trainable=False)

in skflow.TensorFlowEstimator.setup_training(), which we modified to

with self._graph.as_default(), tf.device("/gpu:{0}".format(self.gpu_number)):
    tf.set_random_seed(self.tf_random_seed)
    self._global_step = tf.get_variable('global_step', [],
        initializer=tf.constant_initializer(0), trainable=False)

We also added a gpu_number attribute to the class and initialized the session with allow_soft_placement=True in skflow.TensorFlowEstimator._setup_training().
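As a rough sketch of that last step, using the TensorFlow 0.x session API that skflow relied on at the time (the answer does not show the exact wiring inside _setup_training(), so this is only an illustration):

import tensorflow as tf

# allow_soft_placement=True lets TensorFlow fall back to another available
# device when an op cannot run on the one requested by tf.device, instead of
# raising an error.
config = tf.ConfigProto(allow_soft_placement=True)
session = tf.Session(config=config)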

Regarding python - skflow allocates memory on gpu0 even when another gpu is specified, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35504528/
