python - TensorFlow: how to separate construction from execution for data parallelism

Reposted by: 太空宇宙. Updated: 2023-11-04 04:33:08

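In TensorFlow, modules are usually wrapped in functions or classes that hide the creation of the variables they need, so that you only have to call them: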

net = slim.fully_connected(inputs=input, num_outputs=neurons ..)
net = tf.layers.conv2d(net, num_filters, filter_size ..)

Here the weights and biases of each operation are created first and then reused.

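When we want to implement data parallelism, we want the variables to be created and stored on the CPU and then sent to the GPUs together with the data, as shown in the figure below:

[figure]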
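To make the scheme concrete, here is a toy, framework-free sketch of one synchronous data-parallel step. It is only an illustration under stated assumptions: the model (y = w * x), the helper names `shard`, `replica_gradient` and `train_step`, and the learning rate are all hypothetical, and no TensorFlow API is involved. A single shared parameter stands in for the variables kept on the CPU; each "replica" computes a gradient on its own shard of the batch, and the averaged gradient updates the shared parameter once.

```python
# Toy sketch of synchronous data parallelism (hypothetical names, no TensorFlow).

def shard(batch, num_replicas):
    """Split a batch into num_replicas contiguous shards of near-equal size."""
    size = (len(batch) + num_replicas - 1) // num_replicas
    return [batch[i * size:(i + 1) * size] for i in range(num_replicas)]

def replica_gradient(w, examples):
    """Gradient of the mean squared error of y = w * x over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in examples) / len(examples)

def train_step(w, batch, num_replicas=2, learning_rate=0.05):
    """Every replica reads the same w; the gradients are averaged, then applied once."""
    grads = [replica_gradient(w, s) for s in shard(batch, num_replicas)]  # one per "GPU"
    mean_grad = sum(grads) / len(grads)  # combined where the shared variable lives
    return w - learning_rate * mean_grad

batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # samples of y = 2 * x
w = 0.0
for _ in range(200):
    w = train_step(w, batch)
print(round(w, 3))
```

Averaging the per-replica gradients matches the full-batch gradient here only because the shards have equal size; with uneven shards the average would need to be weighted.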

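In the cifar10_multi_gpu_train example you can see that they do not use tf.layers. If you check cifar10.py in the same directory, you will see that they use the lower-level operations fully_connected and conv2d and manually create the kernels, weights and biases on the CPU. This can be very cumbersome if we want to use the complex structures that TensorFlow already implements for convenience.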

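My question is: can we use the high-level module abstractions (from slim/tf.layers and other modules that abstract variable creation) in such a way that the variables are created on the CPU but the operations are executed on the GPU?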

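Best answer

EDIT:

To pin variables to the CPU, you can use tf.device with a device function. In a distributed setting you have tf.train.replica_device_setter, but for the local case it is easy to do something similar: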

import tensorflow as tf

def my_device_placement(device, vars_device='/cpu:0'):
    # Ops to pin on the CPU
    VAR_TYPES = ['Variable', 'VariableV2', 'VarHandleOp']
    def device_function(op):
        return vars_device if op.type in VAR_TYPES else device
    return device_function

def conv2d_replica(input_, filters, kernel_size, name, device, is_first_replica):
    with tf.device(my_device_placement(device)):
        return tf.layers.conv2d(input_, filters, kernel_size, name=name, reuse=not is_first_replica)

inp = tf.placeholder(tf.float32, [None, 100, 100, 3])
lyr1 = conv2d_replica(inp, 5, [20, 20], 'Layer', '/gpu:0', True)
lyr2 = conv2d_replica(inp, 5, [20, 20], 'Layer', '/gpu:1', False)
print('Device of first replica:', lyr1.device)
print('Device of second replica:', lyr2.device)
print('Variable devices:')
for var in tf.trainable_variables():
    print(var.name, var.device)

Output:

Device of first replica: /gpu:0
Device of second replica: /gpu:1
Variable devices:
Layer/kernel:0 /cpu:0
Layer/bias:0 /cpu:0

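Which operations should run on the CPU is up to you. You can look at STANDARD_PS_OPS in python/training/device_setter.py to see what TensorFlow considers the standard set of operations to pin to a parameter server (in this case it is local, but the idea is similar).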
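As a sketch of making that choice configurable, the device function above can take the set of pinned op types as a parameter. The example below is a hedged illustration: `pinned_device_placement` and its `pinned_types` parameter are hypothetical names, and `FakeOp` is a stand-in that assumes the device function only reads `op.type`, so the placement logic can be checked without building a graph.

```python
import collections

# Stand-in for a graph operation; assumes only the `type` attribute is inspected.
FakeOp = collections.namedtuple('FakeOp', ['name', 'type'])

def pinned_device_placement(device, pinned_device='/cpu:0',
                            pinned_types=('Variable', 'VariableV2', 'VarHandleOp')):
    """Return a device function that sends ops in pinned_types to pinned_device."""
    pinned_types = frozenset(pinned_types)
    def device_function(op):
        # Variable-like ops stay on the pinned device, everything else goes to `device`.
        return pinned_device if op.type in pinned_types else device
    return device_function

place = pinned_device_placement('/gpu:0')
ops = [FakeOp('Layer/kernel', 'VariableV2'),
       FakeOp('Layer/Conv2D', 'Conv2D'),
       FakeOp('Layer/BiasAdd', 'BiasAdd')]
placement = {op.name: place(op) for op in ops}
for name, dev in placement.items():
    print(name, dev)
```

The returned function would be passed to tf.device in the same way as my_device_placement; adding further op types to `pinned_types` is where a list such as STANDARD_PS_OPS could be used.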

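With tf.layers, you can use the name and reuse parameters. When reuse=True, the layer uses the weights of a previously created layer with the same name. Note that this means reuse should be False the first time you create the layer: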

import tensorflow as tf

inp = tf.placeholder(tf.float32, [None, 100, 100, 3])
lyr1 = tf.layers.conv2d(inp, 5, [20, 20], name='Layer', reuse=False)
lyr2 = tf.layers.conv2d(inp, 5, [20, 20], name='Layer', reuse=True)

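Graph: [figure]

Here the BiasAdd nodes are the outputs of the layers. The weights are created within the first layer and reused in the second one.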
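The sharing rule can be pictured with a toy model: weights live in a store keyed by name, reuse=False creates an entry and reuse=True looks one up. This is a simplified sketch, not TensorFlow's implementation; `VariableStore` and `dense` are hypothetical names, and the error behaviour is only assumed to mirror what tf.layers does when a name is created twice or reused before it exists.

```python
# Toy model of name-keyed weight sharing (hypothetical, not TensorFlow's code).
class VariableStore:
    def __init__(self):
        self._vars = {}

    def get_variable(self, name, initializer, reuse):
        if reuse:
            if name not in self._vars:
                raise ValueError('Variable %s does not exist' % name)
            return self._vars[name]
        if name in self._vars:
            raise ValueError('Variable %s already exists' % name)
        self._vars[name] = initializer()
        return self._vars[name]

    def names(self):
        return sorted(self._vars)

store = VariableStore()

def dense(x, units, name, reuse):
    """Compute x . kernel + bias, with the weights looked up under `name`."""
    kernel = store.get_variable(
        name + '/kernel', lambda: [[0.1] * units for _ in x], reuse)
    bias = store.get_variable(name + '/bias', lambda: [0.0] * units, reuse)
    return [sum(xi * row[j] for xi, row in zip(x, kernel)) + bias[j]
            for j in range(units)]

out1 = dense([1.0, 2.0], 3, 'Layer', reuse=False)  # creates Layer/kernel, Layer/bias
out2 = dense([3.0, 4.0], 3, 'Layer', reuse=True)   # reuses the same two entries
print(store.names())
```

Both calls produce their own outputs, but the store holds a single kernel and a single bias, which is the property that lets several replicas share one set of weights.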

Note that this works even across name scopes (I am not sure whether this is intentional, since I did not find explicit documentation about it):

import tensorflow as tf

inp = tf.placeholder(tf.float32, [None, 100, 100, 3])
with tf.name_scope('Replica1'):
    lyr1 = tf.layers.conv2d(inp, 5, [20, 20], name='Layer', reuse=False)
with tf.name_scope('Replica2'):
    lyr2 = tf.layers.conv2d(inp, 5, [20, 20], name='Layer', reuse=True)

Graph: [figure]

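Note: although it is mostly deprecated these days, tf.slim seems to offer the same functionality. In that case there is also a reuse parameter, plus a scope parameter for the variable scope, so it would look something like this: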

import tensorflow as tf

inp = tf.placeholder(tf.float32, [None, 10])
with tf.variable_scope('Layer') as scope:
    lyr1 = tf.contrib.slim.fully_connected(inp, 5, reuse=False, scope=scope)
with tf.variable_scope('Layer') as scope:
    lyr2 = tf.contrib.slim.fully_connected(inp, 5, reuse=True, scope=scope)

Regarding "python - TensorFlow: how to separate construction from execution for data parallelism", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52310022/