tensorflow - How to get global_step in a MonitoredTrainingSession?

Reposted by: 行者123 Last updated: 2023-12-05 07:38:18


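I am running a distributed MNIST model in distributed TensorFlow. I want to monitor the evolution of global_step "manually" for debugging. What is the best, cleanest way to get the global step in a distributed TensorFlow setup?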

Here is my code:

 ...

with tf.device(device):
  images = tf.placeholder(tf.float32, [None, 784], name='image_input')
  labels = tf.placeholder(tf.float32, [None], name='label_input')
  data = read_data_sets(FLAGS.data_dir,
          one_hot=False,
          fake_data=False)
  logits = mnist.inference(images, FLAGS.hidden1, FLAGS.hidden2)
  loss = mnist.loss(logits, labels)
  loss = tf.Print(loss, [loss], message="Loss = ")
  train_op = mnist.training(loss, FLAGS.learning_rate)

hooks=[tf.train.StopAtStepHook(last_step=FLAGS.nb_steps)]

with tf.train.MonitoredTrainingSession(
    master=target,
    is_chief=(FLAGS.task_index == 0),
    checkpoint_dir=FLAGS.log_dir,
    hooks=hooks) as sess:

  while not sess.should_stop():
    xs, ys = data.train.next_batch(FLAGS.batch_size, fake_data=False)
    sess.run([train_op], feed_dict={images: xs, labels: ys})

    global_step_value = ...  # what is the clean way to get this variable?

Best answer

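It is generally good practice to create the global step variable while defining the graph, e.g. global_step = tf.Variable(0, trainable=False, name='global_step'). You can then easily retrieve the global step tensor with graph.get_tensor_by_name("global_step:0"). Note that this returns the tensor rather than its value; to read the current step, evaluate the tensor with sess.run.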
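As a rough sketch of how this might look inside a training loop, the example below fetches the global step in the same sess.run call as the training op. It assumes the TF 1.x API exposed through tensorflow.compat.v1, and it uses tf.train.get_or_create_global_step() instead of looking the tensor up by name. The toy variable w, its loss, and the function name run_toy_training are hypothetical stand-ins for the MNIST model in the question, not part of the original code.

```python
import tensorflow.compat.v1 as tf  # assumption: TF 1.14+ or 2.x with the v1 compat API


def run_toy_training(last_step=5):
    """Train a toy model; return the global_step value fetched on each iteration."""
    fetched_steps = []
    with tf.Graph().as_default():
        # Create the global step while defining the graph.
        global_step = tf.train.get_or_create_global_step()

        # Hypothetical stand-in for mnist.inference / mnist.loss / mnist.training.
        w = tf.get_variable("w", shape=[], initializer=tf.zeros_initializer())
        loss = tf.square(w - 1.0)
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
        # Passing global_step makes the optimizer increment it on every step.
        train_op = optimizer.minimize(loss, global_step=global_step)

        hooks = [tf.train.StopAtStepHook(last_step=last_step)]
        with tf.train.MonitoredTrainingSession(hooks=hooks) as sess:
            while not sess.should_stop():
                # Fetch the step in the same call as train_op. The value may be
                # read before or after this step's increment.
                _, step_value = sess.run([train_op, global_step])
                fetched_steps.append(int(step_value))
    return fetched_steps


if __name__ == "__main__":
    print(run_toy_training())
```

Fetching the step together with train_op avoids an extra sess.run call, since every run on a monitored session also invokes its hooks. TensorFlow does not guarantee whether the fetched value is read before or after that step's increment, so it can lag the true step by one, which is usually acceptable for debugging output.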

Regarding "tensorflow - How to get global_step in a MonitoredTrainingSession?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48022577/



