
python - WARNING:tensorflow:Efficient allreduce is not supported for 1 IndexedSlices


I get this warning when running a multi-input Keras model built with the functional API. On a single GPU the model trains fine and without warnings. When I train it on two GPUs with tf.distribute.MirroredStrategy, the final results are fine, but I get the warning. I suspect it is causing a performance problem?

tf.__version__ : 2.2.0
tf.keras.__version__ : 2.3.0-tf
NVIDIA-SMI 410.72 Driver Version: 410.72 CUDA Version: 10.1
The model I build:
def build_model_():

    input_a_size = 200
    input_b_size = 4
    num_classes = 2
    len_embedding = 100

    mirrored_strategy = tf.distribute.MirroredStrategy(['/gpu:0', '/gpu:1'])

    with mirrored_strategy.scope():

        input_a = Input(shape=(input_a_size,), name='input_a', dtype=np.uint8)
        input_b = Input(shape=(input_b_size,), name='input_b', dtype=np.float32)

        x = Embedding(len_embedding, 100)(input_a)
        x = Conv1D(32, 4, activation='relu')(x)
        x = Flatten()(x)
        branch_a = Dense(64, activation='relu')(x)

        x = Dense(32, activation='relu')(input_b)
        branch_b = Dense(32, activation='relu')(x)

        concat = Concatenate()([
            branch_a,
            branch_b,
        ])

        x = Dense(256, activation='relu')(concat)
        output = Dense(num_classes, activation='softmax')(x)

        model = Model(inputs=[
            input_a,
            input_b,
        ],
            outputs=[output])

        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

    model.summary()

    return model
Model summary:
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_a (InputLayer)            [(None, 200)]        0
__________________________________________________________________________________________________
embedding (Embedding)           (None, 200, 100)     10000       input_a[0][0]
__________________________________________________________________________________________________
conv1d (Conv1D)                 (None, 197, 128)     51328       embedding[0][0]
__________________________________________________________________________________________________
max_pooling1d (MaxPooling1D)    (None, 49, 128)      0           conv1d[0][0]
__________________________________________________________________________________________________
input_b (InputLayer)            [(None, 4)]          0
__________________________________________________________________________________________________
flatten (Flatten)               (None, 6272)         0           max_pooling1d[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 32)           160         input_b[0][0]
__________________________________________________________________________________________________
dense (Dense)                   (None, 64)           401472      flatten[0][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 32)           1056        dense_1[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 96)           0           dense[0][0]
                                                                 dense_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 256)          24832       concatenate[0][0]
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 2)            514         dense_3[0][0]
==================================================================================================
Total params: 489,362
Trainable params: 489,362
Non-trainable params: 0
__________________________________________________________________________________________________
How I generate the inputs:
input_a_train.shape: (35000, 200)
input_b_train.shape: (35000, 4)
y_train.shape: (35000, 2)

train_dataset = tf.data.Dataset.from_tensor_slices(({
"input_a": input_a_train,
"input_b": input_b_train,
}, y_train))
<TensorSliceDataset shapes: ({input_a: (200,), input_b: (4,)}, (2,)), types: ({input_a: tf.uint8, input_b: tf.float64}, tf.float32)>

val_dataset = tf.data.Dataset.from_tensor_slices(({
"input_a": input_a_val,
"input_b": input_b_val,
}, y_val))
<TensorSliceDataset shapes: ({input_a: (200,), input_b: (4,)}, (2,)), types: ({input_a: tf.uint8, input_b: tf.float64}, tf.float32)>

train_batches = train_dataset.padded_batch(128)
val_batches = val_dataset.padded_batch(128)
I get the warning during the training phase:
history = my_model.fit(
x = train_batches,
epochs=3,
verbose = 1,
validation_data = val_batches,
)
This is the output:
Epoch 1/3
INFO:tensorflow:batch_all_reduce: 12 all-reduces with algorithm = nccl, num_packs = 1
WARNING:tensorflow:Efficient allreduce is not supported for 1 IndexedSlices
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1').
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:batch_all_reduce: 12 all-reduces with algorithm = nccl, num_packs = 1
WARNING:tensorflow:Efficient allreduce is not supported for 1 IndexedSlices
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1').
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
274/274 [==============================] - ETA: 0s - loss: 0.1857 - accuracy: 0.9324
...
There is a similar question here: Efficient allreduce is not supported for 2 IndexedSlices, but it has no answer.
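As far as I can tell, the "1 IndexedSlices" in the warning comes from the Embedding layer: the gradient of an embedding lookup is sparse (a tf.IndexedSlices rather than a dense tensor), and that is what the mirrored all-reduce has to aggregate. A standalone check, separate from the model above:

import numpy as np
import tensorflow as tf

# Standalone check (not part of my training code): the gradient of an
# embedding lookup comes back as a tf.IndexedSlices.
emb = tf.keras.layers.Embedding(100, 100)
with tf.GradientTape() as tape:
    out = emb(np.zeros((2, 200), dtype=np.int32))   # same shape as input_a
    loss = tf.reduce_sum(out)
grad = tape.gradient(loss, emb.embeddings)
print(type(grad))   # tf.IndexedSlices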
Edit 1 (31/7/2020)
I implemented a custom training loop as described in these guides (a minimal sketch of the loop follows the links):
https://www.tensorflow.org/tutorials/distribute/custom_training
https://www.tensorflow.org/guide/distributed_training#using_tfdistributestrategy_with_custom_training_loops
https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough
https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch
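Roughly what the loop looked like, simplified. It assumes the layers from build_model_ above are created inside this strategy's scope, i.e. a variant of build_model_ without its own MirroredStrategy and without compile(); train_dataset is the one defined above.

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy(['/gpu:0', '/gpu:1'])
BATCH_SIZE_PER_REPLICA = 128
GLOBAL_BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync

with strategy.scope():
    # same layers as build_model_ above, but created here inside this scope
    # and without the strategy/compile that build_model_ itself contains
    model = build_model_()
    loss_object = tf.keras.losses.BinaryCrossentropy(
        reduction=tf.keras.losses.Reduction.NONE)
    optimizer = tf.keras.optimizers.Adam()

def compute_loss(labels, predictions):
    per_example_loss = loss_object(labels, predictions)
    return tf.nn.compute_average_loss(
        per_example_loss, global_batch_size=GLOBAL_BATCH_SIZE)

def train_step(inputs):
    features, labels = inputs
    with tf.GradientTape() as tape:
        predictions = model(features, training=True)
        loss = compute_loss(labels, predictions)
    grads = tape.gradient(loss, model.trainable_variables)
    # the Embedding gradient here is a tf.IndexedSlices -> triggers the warning
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

@tf.function
def distributed_train_step(inputs):
    per_replica_losses = strategy.run(train_step, args=(inputs,))
    return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_losses, axis=None)

dist_train = strategy.experimental_distribute_dataset(
    train_dataset.padded_batch(GLOBAL_BATCH_SIZE))

for epoch in range(3):
    total_loss, num_batches = 0.0, 0
    for batch in dist_train:
        total_loss += distributed_train_step(batch)
        num_batches += 1
    print('epoch', epoch + 1, 'loss', float(total_loss / num_batches))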
Same warning and same behaviour on multiple GPUs. Performance gets worse as I add GPUs: training on 1 GPU is faster than on 2 GPUs, and the worst case is with 8 GPUs.
I thought the problem might be in the Keras model.fit method, but it is not.
My guess is that there is something wrong with the input data format for a multi-input Keras model built with the functional API.

Best Answer

I solved it by using tf.distribute.experimental.MultiWorkerMirroredStrategy(). This strategy has better support for handling IndexedSlices; see: https://github.com/tensorflow/tensorflow/issues/41898#issuecomment-668786507 . In addition, increase the batch size according to the number of GPUs used.
This issue covers the problem:
https://github.com/tensorflow/tensorflow/issues/41898

physical_devices = tf.config.list_physical_devices('GPU')    # 8 GPUs in my setup
tf.config.set_visible_devices(physical_devices[0:8], 'GPU')  # using all GPUs (default behaviour)
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

BATCH_SIZE_PER_REPLICA = 1024
GLOBAL_BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync

train_batches = train_dataset.batch(GLOBAL_BATCH_SIZE)

with strategy.scope():
    model = build_model_()
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

history = model.fit(
    x=train_batches,
    epochs=10,
    verbose=1,
)
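
With BATCH_SIZE_PER_REPLICA = 1024, the global batch size on 8 GPUs works out to 8192, which is the "increase the batch size according to the number of GPUs" part of the fix. A side note I have not benchmarked on this setup: MirroredStrategy also takes a cross_device_ops argument, so the NCCL batched all-reduce (the path that prints the IndexedSlices warning) can be swapped for a different implementation, for example:

# untested alternative (my assumption, not part of the fix above): keep
# MirroredStrategy but replace the NCCL all-reduce implementation
strategy = tf.distribute.MirroredStrategy(
    devices=['/gpu:0', '/gpu:1'],
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())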

Regarding python - WARNING:tensorflow:Efficient allreduce is not supported for 1 IndexedSlices, there is a similar question on Stack Overflow: https://stackoverflow.com/questions/63034145/
