
python - How to configure the model.ckpt output interval of tensorflow legacy/train.py


I am trying to fix a problem caused by my model overfitting. Unfortunately, I do not know how to change the interval at which legacy/train.py writes model.ckpt files during training. Is there a way to shorten the time between each model.ckpt save and to disable their deletion? I am training small models and can afford the increased storage requirements.

Best answer

For the save interval and the number of checkpoints to keep, look here: https://www.tensorflow.org/api_docs/python/tf/train/Saver

From the link above:
-> max_to_keep
-> keep_checkpoint_every_n_hours

Additionally, optional arguments to the Saver() constructor let you control the proliferation of checkpoint files on disk:

max_to_keep indicates the maximum number of recent checkpoint files to keep. As new files are created, older files are deleted. If None or 0, no checkpoints are deleted from the filesystem but only the last one is kept in the checkpoint file. Defaults to 5 (that is, the 5 most recent checkpoint files are kept.)

keep_checkpoint_every_n_hours: In addition to keeping the most recent max_to_keep checkpoint files, you might want to keep one checkpoint file for every N hours of training. This can be useful if you want to later analyze how a model progressed during a long training session. For example, passing keep_checkpoint_every_n_hours=2 ensures that you keep one checkpoint file for every 2 hours of training. The default value of 10,000 hours effectively disables the feature.
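To make those two arguments concrete, here is a minimal, self-contained sketch of a Saver configured so that no checkpoints are ever deleted. It assumes the TF 1.x API that legacy/train.py uses; the variable, path, step count, and save cadence are purely illustrative, not taken from the original script:

import tensorflow as tf  # TF 1.x API, as used by legacy/train.py

# A dummy variable so the Saver has something to checkpoint.
w = tf.Variable(0.0, name="w")

# max_to_keep=None (or 0): never delete old checkpoint files.
# keep_checkpoint_every_n_hours=0.5: additionally keep a permanent
# copy every 30 minutes (value chosen here only for illustration).
saver = tf.train.Saver(max_to_keep=None,
                       keep_checkpoint_every_n_hours=0.5)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1, 1001):
        # ... run your training op here ...
        if step % 100 == 0:
            # How often you call save() is what sets the on-disk interval.
            saver.save(sess, "/tmp/model.ckpt", global_step=step)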

I believe you can reference it in your training config if you are using one. Check out the trainer.py file in that same legacy directory. Around line 375 it references keep_checkpoint_every_n_hours:

# Save checkpoints regularly.
keep_checkpoint_every_n_hours = train_config.keep_checkpoint_every_n_hours
saver = tf.train.Saver(keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours)

What it does not reference is max_to_keep, a line you may need to add to that script yourself (see the sketch below). That said, while it is hard to be certain without all the details, I can't help but think you are approaching this the wrong way. Collecting and reviewing every checkpoint does not seem like the right way to deal with overfitting. Run tensorboard and inspect your training results there. Additionally, evaluating the model against evaluation data will give you insight into what it is actually doing.
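For completeness, a hedged sketch of what that trainer.py edit might look like; the max_to_keep=None argument is my addition, not part of the original file:

# Save checkpoints regularly, and never delete old ones.
keep_checkpoint_every_n_hours = train_config.keep_checkpoint_every_n_hours
saver = tf.train.Saver(
    keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours,
    max_to_keep=None)  # added: keep every checkpoint on disk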

Good luck with your training!

Regarding "python - How to configure the model.ckpt output interval of tensorflow legacy/train.py", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54212645/
