I got confused about the two concepts In-graph replication and Between-graph replication when reading Replicated training in TensorFlow's official how-to.

1. The link above says:

    In-graph replication. In this approach, the client builds a single tf.Graph that contains one set of parameters (in tf.Variable nodes pinned to /job:ps); ...

Does this mean there are multiple tf.Graphs in the Between-graph replication approach? If so, where is the corresponding code in the provided examples?

2. While there is already a Between-graph replication example in the above link, could anyone provide an In-graph replication implementation (pseudocode is fine) and highlight its main differences from Between-graph replication?

3. The link also says:

    Between-graph replication. In this approach, there is a separate client for each /job:worker task, typically in the same process as the worker task. Each client builds a similar graph containing the parameters (pinned to /job:ps as before using tf.train.replica_device_setter() to map them deterministically to the same tasks); and a single copy of the compute-intensive part of the model, pinned to the local task in /job:worker.

Two sub-questions:

a. Why do we say each client builds a similar graph, but not the same graph? I would expect the graph built by each client to be the same, because the following code is shared among all workers:

    # Build model...
    loss = ...
    global_step = tf.Variable(0)

b. Shouldn't it be multiple copies of the compute-intensive part of the model, since we have multiple workers?

4. Does the example in Replicated training support training on multiple machines, each of which has multiple GPUs?

Best answer
First of all, for some historical context: "in-graph replication" is the first approach we tried in TensorFlow, and it did not achieve the performance that many users required, so the more complicated "between-graph" approach is the currently recommended way to perform distributed training. Higher-level libraries such as tf.learn use the "between-graph" approach for distributed training.

To answer your specific questions:
1. Does this mean there are multiple tf.Graphs in the between-graph replication approach? If yes, where is the corresponding code in the provided examples?

Yes. The typical between-graph replication setup uses a separate TensorFlow process for each worker replica, and each of these processes builds a separate tf.Graph for the model. Usually each process uses the global default graph (accessible through tf.get_default_graph()), which is never created explicitly.

(In principle, you could use a single TensorFlow process with a single tf.Graph and multiple tf.Session objects that share the same underlying graph, as long as you configured the tf.ConfigProto.device_filters option differently for each session, but this is an uncommon setup.)
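To make this concrete, here is a minimal sketch of the typical between-graph setup (the host names, ports, flag names, and toy model are assumptions for illustration, not code from the how-to). Every task runs the same script, and each worker process builds its own default tf.Graph:

    # Between-graph sketch (TF 1.x): one process per task, each building
    # its own process-local default tf.Graph. Addresses and model are made up.
    import tensorflow as tf

    flags = tf.app.flags
    flags.DEFINE_string("job_name", "worker", "'ps' or 'worker'")
    flags.DEFINE_integer("task_index", 0, "Index of the task within its job")
    FLAGS = flags.FLAGS

    cluster = tf.train.ClusterSpec({
        "ps": ["ps0.example.com:2222"],
        "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
    })
    server = tf.train.Server(cluster, job_name=FLAGS.job_name,
                             task_index=FLAGS.task_index)

    if FLAGS.job_name == "ps":
        server.join()  # A parameter server only hosts variables.
    else:
        # replica_device_setter() pins variables to /job:ps and other ops to
        # this worker, so the separate graphs still share one set of parameters.
        with tf.device(tf.train.replica_device_setter(
                worker_device="/job:worker/task:%d" % FLAGS.task_index,
                cluster=cluster)):
            # Build model... (toy model)
            x = tf.placeholder(tf.float32, [None, 1])
            w = tf.Variable(tf.zeros([1, 1]))
            loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - 1.0))
            global_step = tf.train.get_or_create_global_step()
            train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
                loss, global_step=global_step)

        hooks = [tf.train.StopAtStepHook(last_step=1000)]
        with tf.train.MonitoredTrainingSession(
                master=server.target,
                is_chief=(FLAGS.task_index == 0),
                hooks=hooks) as sess:
            while not sess.should_stop():
                sess.run(train_op, feed_dict={x: [[1.0]]})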
2. While there is already a between-graph replication example in the above link, could anyone provide an in-graph replication implementation (pseudocode is fine) and highlight its main differences from between-graph replication?

In in-graph replication, a single client builds one tf.Graph that contains one set of parameters plus a replica of the compute-intensive part of the model for each worker, typically using a loop that creates the same graph structure per worker and shares variables between the replicas. The place where in-graph replication is still commonly used is for multiple devices in a single process (e.g. multiple GPUs), as in the CIFAR-10 multi-GPU example model. (In my opinion, the inconsistency between how multiple workers and multiple devices within a single worker are treated is unfortunate; in-graph replication is simpler to understand than between-graph replication, because it does not rely on implicit sharing between the replicas. Higher-level libraries, such as tf.learn and TF-Slim, hide some of these issues, and offer hope that we can provide a better replication scheme in the future.)
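Since in-graph examples are scarce, here is a minimal hedged sketch of the pattern (the cluster layout, addresses, and toy model are assumptions): one client, one graph, one tower per worker.

    # In-graph sketch (TF 1.x): a single client builds a single tf.Graph
    # with one set of parameters on /job:ps and one compute tower pinned
    # to each /job:worker task.
    import tensorflow as tf

    NUM_WORKERS = 2

    with tf.device("/job:ps/task:0"):
        w = tf.Variable(tf.zeros([1, 1]))  # one shared set of parameters

    tower_losses = []
    for i in range(NUM_WORKERS):
        with tf.device("/job:worker/task:%d" % i):
            # The same structure is replicated once per worker; every tower
            # reads the same variable directly, so no implicit sharing
            # between separate graphs is needed. Each tower's placeholder
            # would be fed that worker's shard of the data.
            x = tf.placeholder(tf.float32, [None, 1], name="x_%d" % i)
            tower_losses.append(
                tf.reduce_mean(tf.square(tf.matmul(x, w) - 1.0)))

    # A single train op aggregates all towers inside this one graph.
    loss = tf.add_n(tower_losses) / NUM_WORKERS
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    # The one client drives the whole cluster through a single session,
    # e.g. sess = tf.Session("grpc://worker0.example.com:2222").

The main difference from between-graph replication is visible here: there is one client and one graph containing NUM_WORKERS towers, rather than NUM_WORKERS clients that each build a single-tower graph and share parameters only through /job:ps.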
3a. Why do we say each client builds a similar graph, but not the same graph?

Because they are not required to be identical (and there is no integrity check that enforces this). In particular, each worker might create a graph with different explicit device assignments ("/job:worker/task:0", "/job:worker/task:1", etc.). The chief worker might create additional operations that are not created on (or used by) the non-chief workers. However, in most cases the graphs are logically the same, i.e. identical modulo device assignments.

3b. Shouldn't it be multiple copies of the compute-intensive part of the model, since we have multiple workers?

Typically, each worker has a separate graph that contains a single copy of the compute-intensive part of the model; the graph for worker i does not contain the nodes for worker j (for i ≠ j). (The exception is combining between-graph replication across workers with in-graph replication over the GPUs inside each worker; then a worker's graph contains N copies of the compute-intensive part, where N is the number of local GPUs.)
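As a concrete, hedged illustration of why the graphs are only "similar": the same client code can branch on the task index. This fragment assumes the usual FLAGS/cluster setup from the earlier sketch; build_model() is a hypothetical helper.

    # Sketch: per-worker differences in otherwise identical client code.
    worker_device = "/job:worker/task:%d" % FLAGS.task_index  # differs per task
    with tf.device(tf.train.replica_device_setter(
            worker_device=worker_device, cluster=cluster)):
        loss = build_model()  # hypothetical helper; same structure everywhere

    if FLAGS.task_index == 0:
        # Ops like these may exist only in the chief worker's graph.
        summary_op = tf.summary.merge_all()
        saver = tf.train.Saver()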
4. Does the example in Replicated training support training on multiple machines, each of which has multiple GPUs?

The example code only covers training on multiple machines and says nothing about multiple GPUs per machine, but the two techniques compose. In this part of the example:

    # Build model...
    loss = ...

you could add a loop over the GPUs in the local machine, to achieve distributed training of multiple workers each with multiple GPUs.
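A hedged sketch of that composition (NUM_GPUS and the surrounding FLAGS/cluster setup are assumptions, and the model is a toy): between-graph across machines, in-graph across the local GPUs.

    # Sketch: between-graph replication across workers, plus an in-graph
    # loop over the local GPUs inside each worker.
    NUM_GPUS = 4

    with tf.device(tf.train.replica_device_setter(
            worker_device="/job:worker/task:%d" % FLAGS.task_index,
            cluster=cluster)):
        # Parameters are created once, outside the GPU loop, on /job:ps.
        w = tf.Variable(tf.zeros([1, 1]))
        global_step = tf.train.get_or_create_global_step()
        optimizer = tf.train.GradientDescentOptimizer(0.1)

        tower_grads = []
        for gpu in range(NUM_GPUS):
            with tf.device("/gpu:%d" % gpu):  # merged with the worker device
                # Build model... one tower per local GPU, every tower
                # reading the shared parameters on /job:ps.
                x = tf.placeholder(tf.float32, [None, 1], name="x_gpu%d" % gpu)
                loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - 1.0))
                grad = optimizer.compute_gradients(loss, var_list=[w])[0][0]
                tower_grads.append(grad)

        # Average the per-GPU gradients and apply a single update, as in
        # the CIFAR-10 multi-GPU example.
        avg_grad = tf.add_n(tower_grads) / NUM_GPUS
        train_op = optimizer.apply_gradients([(avg_grad, w)],
                                             global_step=global_step)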
Regarding "graph - Distributed tensorflow: the difference between In-graph replication and Between-graph replication", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38000514/