python - Why is my training speed with multi_gpu_model in Keras worse than with a single GPU?

Reposted by: 太空宇宙 | Updated: 2023-11-03 11:42:23


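My Keras version is 2.0.9, with TensorFlow as the backend.

I tried using multi_gpu_model in Keras. In practice, however, training with 4 GPUs is even slower than with 1 GPU: I get 25 seconds with 1 GPU and 50 seconds with 4 GPUs. Can you tell me why this happens?

Blog post on multi_gpu_model: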

https://www.pyimagesearch.com/2017/10/30/how-to-multi-gpu-training-with-keras-python-and-deep-learning/

I used this command for 1 GPU:

CUDA_VISIBLE_DEVICES=0 python gpu_test.py

For 4 GPUs:

python gpu_test.py
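
The same restriction can also be applied from inside the script, provided it happens before TensorFlow or Keras is imported. A minimal sketch; the helper name `restrict_visible_gpus` is hypothetical and not part of Keras or TensorFlow:

```python
import os


def restrict_visible_gpus(gpu_ids):
    """Expose only the given GPU indices to CUDA (hypothetical helper).

    Call this before importing TensorFlow/Keras: CUDA reads
    CUDA_VISIBLE_DEVICES when it initialises in the process.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Equivalent of `CUDA_VISIBLE_DEVICES=0 python gpu_test.py`
restrict_visible_gpus([0])
# For the 4-GPU run one would pass [0, 1, 2, 3] instead.
```

Setting the variable in the script keeps the GPU choice next to the code being timed, at the cost of having to edit the file between runs.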

Here is the training source code.

from keras.datasets import mnist
from keras.layers import Input, Dense
from keras.models import Model
from keras.utils import to_categorical
from keras.utils.training_utils import multi_gpu_model
import time

# Load MNIST, flatten each 28x28 image to a 784-vector, one-hot encode the labels
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# Fully connected classifier
inputs = Input(shape=(784,))

x = Dense(4096, activation='relu')(inputs)
x = Dense(2048, activation='relu')(x)
x = Dense(512, activation='relu')(x)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
model = Model(inputs=inputs, outputs=predictions)

# 4-GPU variant, disabled here; swap it with the single-GPU block below to run it
'''
m_model = multi_gpu_model(model, 4)
m_model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
m_model.summary()
a = time.time()
m_model.fit(x_train, y_train, batch_size=128, epochs=5)
print(time.time() - a)  # training time in seconds
a = time.time()
m_model.predict(x=x_test, batch_size=128)
print(time.time() - a)  # prediction time in seconds
'''

# Single-GPU variant
model.compile(optimizer='rmsprop',
          loss='categorical_crossentropy',
          metrics=['accuracy'])
model.summary()
a = time.time()
model.fit(x_train, y_train, batch_size=128, epochs=5)
print(time.time() - a)  # training time in seconds
a = time.time()
model.predict(x=x_test, batch_size=128)
print(time.time() - a)  # prediction time in seconds

And this is the GPU state while running with 4 GPUs.
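
As background for the numbers in the script (a side note, not something the question itself states): according to its docstring, multi_gpu_model divides each input batch into sub-batches, one per GPU. The arithmetic below uses the values from the script (60000 training samples, batch_size=128, 4 GPUs) and assumes an even split; the function name `per_gpu_workload` is made up for illustration.

```python
import math


def per_gpu_workload(n_samples, batch_size, n_gpus):
    """Return (steps per epoch, samples each GPU sees per step).

    Assumes the batch is split evenly across the GPUs.
    """
    steps_per_epoch = int(math.ceil(n_samples / float(batch_size)))
    sub_batch = batch_size // n_gpus
    return steps_per_epoch, sub_batch


print(per_gpu_workload(60000, 128, 1))  # (469, 128)
print(per_gpu_workload(60000, 128, 4))  # (469, 32)
```

With 4 GPUs the number of steps per epoch stays the same while each GPU handles only 32 samples per step, so the per-step cost of splitting inputs and merging outputs is not spread over more work. Whether that explains the slowdown here is a guess; rerunning with batch_size set to 128 * 4 would be one way to check.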

Best answer

I can give you what I think is the answer, though I don't have it fully working myself. I was tipped off to this by a bug report, and the source code for multi_gpu_model says:

    # Instantiate the base model (or "template" model).
    # We recommend doing this with under a CPU device scope,
    # so that the model's weights are hosted on CPU memory.
    # Otherwise they may end up hosted on a GPU, which would
    # complicate weight sharing.
    with tf.device('/cpu:0'):
        model = Xception(weights=None,
                         input_shape=(height, width, 3),
                         classes=num_classes)

I think this is the issue. I'm still working on getting it to work myself, though.
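
Applied to the model from the question, the recommendation in that source comment might look like the sketch below. It is untested on the asker's setup and assumes Keras 2.0.9 with a TensorFlow 1.x backend and 4 visible GPUs. The function name `build_parallel_model` is hypothetical, and the import path for multi_gpu_model is the one used in the question, which may differ in other Keras versions.

```python
def build_parallel_model(n_gpus=4):
    """Build the question's MLP on the CPU, then replicate it across GPUs.

    Hypothetical helper; returns (template_model, parallel_model).
    """
    # Imported here so that defining the function does not require a GPU stack.
    import tensorflow as tf
    from keras.layers import Input, Dense
    from keras.models import Model
    from keras.utils.training_utils import multi_gpu_model

    # Instantiate the template model under a CPU device scope,
    # so that its weights are hosted in CPU memory.
    with tf.device('/cpu:0'):
        inputs = Input(shape=(784,))
        x = Dense(4096, activation='relu')(inputs)
        x = Dense(2048, activation='relu')(x)
        x = Dense(512, activation='relu')(x)
        x = Dense(64, activation='relu')(x)
        predictions = Dense(10, activation='softmax')(x)
        template = Model(inputs=inputs, outputs=predictions)

    # Replicate the template on each GPU and compile the parallel model.
    parallel = multi_gpu_model(template, gpus=n_gpus)
    parallel.compile(optimizer='rmsprop',
                     loss='categorical_crossentropy',
                     metrics=['accuracy'])
    return template, parallel
```

In the question's script, `template, parallel = build_parallel_model(4)` would take the place of the model definition, and `parallel.fit(x_train, y_train, batch_size=128, epochs=5)` would be timed as before. This only shows where the CPU scope goes; it is not confirmed to remove the slowdown.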

Regarding "python - Why is my training speed with multi_gpu_model in Keras worse than with a single GPU?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/47090096/
