I am using GANEstimator with MirroredStrategy to work on the multiple GPUs of a single instance. input_fn in my case is a tf.data.Dataset with the following settings:

dataset = dataset.repeat()
dataset = dataset.shuffle(buffer_size=100)
dataset = dataset.batch(self.batch_size, drop_remainder=True)
dataset = dataset.prefetch(100)

Does the Estimator feed a different batch to each GPU/worker here, or do I need something like dataset.shard() to pass different data to the workers manually? I have been digging through the code of Estimator and MirroredStrategy, but it is not clear to me what is going on. Additional confusion comes from the description of distributed strategies:
MirroredStrategy: This does in-graph replication with synchronous
training on many GPUs on one machine. Essentially, we create copies of all
variables in the model's layers on each device. We then use all-reduce
to combine gradients across the devices before applying them
to the variables to keep them in sync.
CollectiveAllReduceStrategy: This is a version of MirroredStrategy
for multi-worker training.
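To be concrete, the manual alternative I had in mind is something like the sketch below. It is hypothetical: worker_index and num_workers are placeholders I would have to supply myself; only Dataset.shard() itself is the real API (and it assumes tensorflow is imported as tf, as in the rest of this post).

def create_dataset_for_worker(worker_index, num_workers):
    dataset = tf.data.Dataset.range(1000)  # stand-in for my real data
    # Keep every num_workers-th record, starting at worker_index.
    dataset = dataset.shard(num_workers, worker_index)
    dataset = dataset.shuffle(buffer_size=100)
    dataset = dataset.batch(32, drop_remainder=True)
    return dataset

My actual training code is: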
def create_dataset():
    ...
    dataset = dataset.repeat()
    dataset = dataset.shuffle(buffer_size=100)
    dataset = dataset.batch(self.batch_size, drop_remainder=True)
    dataset = dataset.prefetch(100)
    return dataset

NUM_GPUS = 4
strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=NUM_GPUS)
optimizer = tf.train.RMSPropOptimizer(learning_rate=0.01, use_locking=True)
optimizer_d = tf.train.RMSPropOptimizer(learning_rate=0.01, use_locking=True)

config = tf.estimator.RunConfig(save_checkpoints_steps=100,
                                save_summary_steps=1, keep_checkpoint_max=50,
                                train_distribute=strategy)

# I have more hooks here, just simplified to show
def get_hooks_fn(GANTrainOps):
    disjoint_train_hook_func = tfgan.get_sequential_train_hooks(
        train_steps=tfgan.GANTrainSteps(10, 1)
    )  # g steps, d steps
    disjoint_train_hooks = disjoint_train_hook_func(GANTrainOps)
    return [update_hook, summary_hook] + disjoint_train_hooks

# Create GAN estimator.
gan_estimator = tfgan.estimator.GANEstimator(
    model_dir='/data/checkpoints/estimator_model',
    generator_fn=generator_fn,
    discriminator_fn=discriminator_fn,
    generator_loss_fn=generator_loss_fn,
    discriminator_loss_fn=discriminator_loss_fn,
    generator_optimizer=optimizer,
    discriminator_optimizer=optimizer_d,
    use_loss_summaries=True,
    config=config,
    get_hooks_fn=get_hooks_fn)

gan_estimator.train(input_fn=create_dataset, steps=10000)
The multi-worker version of this class maps one replica to one device on a worker. It mirrors all model variables on all replicas. For example, if you have two workers and each worker has 4 GPUs, it will create 8 copies of the model variables on these 8 GPUs. Then like in MirroredStrategy(???), each replica performs their computation with their own copy of variables unless in cross-replica model where variable or tensor reduction happens.
auto_shard_dataset: whether to auto-shard the dataset when there are multiple workers.
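If I read that flag correctly, cross-worker sharding is opted into on the strategy itself, not inside input_fn. Assuming the 1.12 contrib signature, where auto_shard_dataset is a constructor argument of MirroredStrategy, that would look like:

strategy = tf.contrib.distribute.MirroredStrategy(
    num_gpus=NUM_GPUS,
    auto_shard_dataset=True)  # assumption: the multi-worker auto-sharding knob

But that still does not explain how batches reach the individual GPUs on one machine.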
Following tf.estimator.train() for a while eventually leads to what appears to be strategy.make_input_fn_iterator():
def _get_iterator_from_input_fn(self, input_fn, mode, distribution=None):
  if distribution is not None:
    iterator = distribution.make_input_fn_iterator(
        lambda _: self._call_input_fn(input_fn, mode))
    input_hooks = [
        estimator_util.DistributedIteratorInitializerHook(iterator)]
  else:
    result = self._call_input_fn(input_fn, mode)
    iterator = result.make_initializable_iterator()
    input_hooks = [estimator_util._DatasetInitializerHook(iterator)]
  return iterator, input_hooks
However, make_input_fn_iterator does not appear anywhere in the tensorflow 1.12.0 release that I grepped; it seems to be completely absent from that code.
Best answer
OK, after spending some time investigating on GitHub, I found that the code there already differs from my tf 1.12.0. So, digging into the local 1.12.0 files gave me the following. GANEstimator inherits tf.python.estimator.Estimator, and Estimator.__init__() contains:
# The distribute field contains an instance of DistributionStrategy.
self._train_distribution = self._config.train_distribute
tf.contrib.gan.GANEstimator --> tf.python.estimator.Estimator.train() -->
tf.python.estimator.Estimator._train_model(input_fn, hooks, saving_listeners) -->
._train_model_distributed(input_fn, hooks, saving_listeners) -->
._get_iterator_from_input_fn(input_fn, model_fn_lib.ModeKeys.TRAIN, self._train_distribution) -->
distribution.distribute_dataset(lambda: self._call_input_fn(input_fn, mode))
And MirroredStrategy.distribute_dataset() is:
def distribute_dataset(self, dataset_fn):
  if self._cluster_spec:
    return values.MultiWorkerDataset(
        partial(self._call_dataset_fn, dataset_fn), self._worker_device_map,
        self._prefetch_on_device, self._auto_shard_dataset)
  else:
    return values.PerDeviceDataset(
        self._call_dataset_fn(dataset_fn), self._devices,
        self._prefetch_on_device)
In tensorflow/python/training/distribute.py:
def _call_dataset_fn(self, dataset_fn):
  result = dataset_fn()
  if not isinstance(result, dataset_ops.Dataset):
    raise ValueError(
        "dataset_fn() must return a tf.data.Dataset when using a "
        "DistributionStrategy.")
  return result
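This check, incidentally, is why input_fn has to return the Dataset itself under a DistributionStrategy. A hypothetical illustration of my own (bad_input_fn is not from the codebase; it assumes import tensorflow as tf):

def bad_input_fn():
    dataset = tf.data.Dataset.range(10).batch(2)
    # Returning tensors instead of the Dataset trips the isinstance check:
    # ValueError: dataset_fn() must return a tf.data.Dataset ...
    return dataset.make_one_shot_iterator().get_next()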
Back in distribute_dataset(): since I run on a single machine, self._cluster_spec is empty and the PerDeviceDataset branch is taken. So finally I found these two classes in values.py:
class PerDeviceDataset(object):
  """Like `tf.data.Dataset` split devices, producing `PerDevice` data."""

  def __init__(self, dataset, devices, prefetch_on_device=None):
    self._devices = devices

    # Default to using prefetching in graph mode, unless specified.
    # TODO(priyag): Enable prefetching in eager mode.
    self._prefetch_on_device = prefetch_on_device
    if self._prefetch_on_device is None:
      self._prefetch_on_device = not context.executing_eagerly()
    assert not (self._prefetch_on_device and context.executing_eagerly()), (
        "Prefetching is only supported in graph mode currently")

    if self._prefetch_on_device:
      self._dataset = dataset.apply(
          prefetching_ops_v2.prefetch_to_devices(self._devices))
    else:
      # TODO(priyag): If dropping remainder is not appropriate, find another
      # approach to distributing the dataset when not possible to divide evenly.
      # Possibly not an issue when we start using PartitionedDataset.
      self._dataset = dataset.batch(len(devices), drop_remainder=True)

  def make_one_shot_iterator(self):
    """Get a one time use iterator for the distributed PerDeviceDataset."""
    dataset_iterator = self._dataset.make_one_shot_iterator()
    return PerDeviceDataIterator(dataset_iterator, self._devices,
                                 self._prefetch_on_device)

  def make_initializable_iterator(self):
    """Get an initializable iterator for the distributed PerDeviceDataset."""
    dataset_iterator = self._dataset.make_initializable_iterator()
    return PerDeviceDataIterator(dataset_iterator, self._devices,
                                 self._prefetch_on_device)


class PerDeviceDataIterator(object):
  """An iterator (like `tf.data.Iterator`) into a `PerDeviceDataset`."""

  def __init__(self, iterator, devices, prefetch_on_device=None):
    self._iterator = iterator
    self._devices = devices
    self._prefetch_on_device = prefetch_on_device

  @property
  def initializer(self):
    return self._iterator.initializer

  def get_next(self, name=None):
    """Scatter the input across devices."""
    if self._prefetch_on_device:
      data_list = self._iterator.get_next(name=name)
      index = dict(zip(self._devices, data_list))
    else:
      batch = self._iterator.get_next(name=name)
      index = {}
      def get_ith(i):
        return lambda x: x[i]

      for i, d in enumerate(self._devices):
        index[d] = nest.map_structure(get_ith(i), batch)
        if context.executing_eagerly():
          with ops.device(d):
            index[d] = nest.map_structure(array_ops.identity, index[d])

    return regroup(index)
So, essentially, dataset_fn() is just called once to obtain the dataset object, and then a batch of size equal to the number of GPUs is applied on top of it. The elements of that outer batch, which must be the actual batches defined in my dataset initialization, are what get distributed to the different devices, as the sketch below illustrates.
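To convince myself, here is a minimal runnable sketch (my own illustration, not code from TensorFlow) of what the non-prefetching branch effectively does to the input pipeline:

import tensorflow as tf

NUM_GPUS = 4
BATCH_SIZE = 32

# My "real" batches, as defined in create_dataset().
dataset = tf.data.Dataset.range(1000)
dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
# What PerDeviceDataset adds on top: one outer batch per step,
# holding one real batch per device.
dataset = dataset.batch(NUM_GPUS, drop_remainder=True)

iterator = dataset.make_one_shot_iterator()
super_batch = iterator.get_next()

with tf.Session() as sess:
  print(sess.run(super_batch).shape)  # (4, 32): get_ith(i) hands row i to device i

So, if I read this correctly, each device does receive a different batch on every step (the effective global batch size is NUM_GPUS * batch_size), and no manual dataset.shard() is needed in the single-machine case.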
Source: tensorflow - Does tensorflow Estimator take different batches for workers when MirroredStrategy is used? https://stackoverflow.com/questions/54327610/