
python - Global step does not start from 0

Reposted. Author: 行者123. Updated: 2023-11-30 09:33:46

I have an RNN that uses MonitoredTrainingSession for distributed computation. I use global_step to determine which batch of input data each worker should take.
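To make the dependence on global_step concrete, here is a minimal sketch of one way a step could be mapped to a batch. The question does not show the actual scheme, so the modulo wrap-around and the names batch_for_step and batch_slice are assumptions made purely for illustration.

```python
def batch_for_step(global_step, num_batches):
    """Map a global step to a batch index, wrapping around after each epoch.

    Hypothetical helper: the original post does not say how the mapping is done.
    """
    if num_batches <= 0:
        raise ValueError("num_batches must be positive")
    return global_step % num_batches


def batch_slice(global_step, batch_size, num_examples):
    """Return the [start, end) example range for the batch chosen by global_step."""
    num_batches = num_examples // batch_size  # a trailing partial batch is dropped
    index = batch_for_step(global_step, num_batches)
    start = index * batch_size
    return start, start + batch_size
```

Under this assumed scheme, a session whose first observed step is 366 with 100 batches per epoch would begin at batch 66 instead of batch 0, which is why the starting value matters here.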

I defined the tensor before creating the session:

import tensorflow as tf

global_step_tensor = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step')
...
# Pass the variable itself (not its name) so the optimiser increments it on each update.
minimise = optimiser.minimize(loss, name='adam_opt', global_step=global_step_tensor)
with tf.train.MonitoredTrainingSession(...) as sess:
    graph = tf.get_default_graph()
    curr_step = sess.run(global_step_tensor)
    print(curr_step)  # gives 366

I thought the variable is only incremented when the optimiser is evaluated? Why does it start at 366?

Edit: My cluster is defined as one ps and two workers. For now, while I am testing, all three run on the same host on different ports.
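For readers unfamiliar with this layout, a cluster like the one described might be declared roughly as follows. This is only a sketch: the port numbers are invented, and the question does not show the real addresses or job names.

```python
import tensorflow as tf

# Hypothetical addresses: one parameter server and two workers,
# all on the same host but listening on different ports.
cluster = tf.train.ClusterSpec({
    "ps": ["localhost:2222"],
    "worker": ["localhost:2223", "localhost:2224"],
})


def describe(cluster_spec):
    """Return a plain dict of job name -> list of task addresses."""
    return {job: cluster_spec.job_tasks(job) for job in cluster_spec.jobs}
```

In setups of this kind, variables such as global_step are typically placed on the ps task and shared by every worker, so a step taken by one worker is visible to the other.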

Best answer

According to the documentation, MonitoredTrainingSession has several parameters whose defaults automatically save checkpoints and summaries:

save_checkpoint_secs: The frequency, in seconds, that a checkpoint is saved using a default checkpoint saver. If save_checkpoint_secs is set to None, then the default checkpoint saver isn't used.

save_summaries_steps: The frequency, in number of global steps, that the summaries are written to disk using a default summary saver. If both save_summaries_steps and save_summaries_secs are set to None, then the default summary saver isn't used. Default 100.

save_summaries_secs: The frequency, in secs, that the summaries are written to disk using a default summary saver. If both save_summaries_steps and save_summaries_secs are set to None, then the default summary saver isn't used. Default not enabled.

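This may be why your current step is no longer 0. If a checkpoint_dir is set, the session restores variables, including global_step, from the latest checkpoint found there, so a restarted run resumes at the saved step rather than at 0.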
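A small reproduction can help confirm whether a restored checkpoint is the cause. The sketch below is not the asker's code: it assumes a TensorFlow build that exposes tf.compat.v1, runs in a single process rather than on a cluster, and uses an invented toy loss, variable w, and helper run_steps. It trains twice against the same checkpoint directory and reports the first and last global step of each run.

```python
import tensorflow as tf

tf1 = tf.compat.v1


def run_steps(checkpoint_dir, num_steps):
    """Train a toy model for num_steps and return (first_step, last_step).

    Hypothetical helper for illustration; the model is a single scalar.
    """
    with tf.Graph().as_default():
        global_step = tf1.train.get_or_create_global_step()
        w = tf1.get_variable("w", shape=[], initializer=tf1.zeros_initializer())
        loss = tf.square(w - 1.0)
        # minimize() increments global_step once per executed update.
        train_op = tf1.train.GradientDescentOptimizer(0.1).minimize(
            loss, global_step=global_step)
        # With checkpoint_dir set, the default checkpoint saver is active and
        # any existing checkpoint in that directory is restored on start-up.
        with tf1.train.MonitoredTrainingSession(
                checkpoint_dir=checkpoint_dir,
                save_summaries_steps=None,
                save_summaries_secs=None) as sess:
            first = sess.run(global_step)
            for _ in range(num_steps):
                sess.run(train_op)
            last = sess.run(global_step)
    return int(first), int(last)
```

Calling run_steps twice with the same directory should show the second run starting where the first one ended, while a fresh directory should start at 0. To see which step an existing directory holds, tf.train.latest_checkpoint(checkpoint_dir) and tf.train.load_variable(checkpoint_dir, "global_step") can be used, assuming the variable is named global_step as in the question.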

Regarding "python - Global step does not start from 0", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49730000/
