
tensorflow - Why is gradient clipping not supported with a distribution strategy in TensorFlow?


Gradient clipping does not appear to be supported when using a distribution strategy:
https://github.com/tensorflow/tensorflow/blob/f9f6b4cec2a1bdc5781e4896d80cee1336a2fbab/tensorflow/python/keras/optimizer_v2/optimizer_v2.py#L383

("Gradient clipping in the optimizer ""(by setting clipnorm or clipvalue) is currently ""unsupported when using a distribution strategy.")


Is there a reason for this? I would love to define a custom def _minimize(strategy, tape, optimizer, loss, trainable_variables): that simply clips the gradients directly.
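For context, the snippet below is a minimal sketch of that idea, written against the public Model.train_step override point rather than the private _minimize helper. The model, loss, and clip norm of 1.0 are illustrative assumptions, not from the original question. Note that under a distribution strategy this clips the per-replica gradients, i.e. before aggregation, which is exactly the distinction the answer below turns on.

```python
import tensorflow as tf

# Hedged sketch (TF 2.2+): clip gradients inside a custom train_step.
class ClippingModel(tf.keras.Model):
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred)
        grads = tape.gradient(loss, self.trainable_variables)
        # Illustrative clip norm; under a distribution strategy this
        # clips PER-REPLICA gradients, i.e. before aggregation.
        grads, _ = tf.clip_by_global_norm(grads, clip_norm=1.0)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}
```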

Best Answer

GitHub user tomerk wrote:

There's two possible places to clip when you have distribution strategies enabled:

  • before gradients get aggregated (usually wrong)
  • after gradients get aggregated (usually right & what people expect)

We want it working w/ the second case (clipping after gradients are aggregated). The issue is the optimizers are written with clipping happening in the code before aggregation does.

We looked into changing this, but it would have required either:

  • api changes that break existing users of optimizer apply_gradients/other non-minimize methods
  • changing the signatures of methods optimizer implementers need to implement, breaking existing custom optimizers

So rather than:

  • quietly doing clipping in the wrong place
  • increasing churn & breaking existing users or existing custom optimizers just for this individual feature

We instead decided to leave this disabled for now. We'll roll support for this into a larger optimizer refactoring that solves a larger set of issues.
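To make the second bullet concrete, here is a hedged sketch of a custom distributed training step that aggregates gradients across replicas itself, clips the aggregated result, and then skips the optimizer's own aggregation. The MirroredStrategy setup, model, and clip norm are illustrative assumptions; experimental_aggregate_gradients requires TF 2.2 or newer.

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
    loss_fn = tf.keras.losses.MeanSquaredError()

@tf.function
def train_step(x, y):
    def step_fn(x, y):
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        # Sum gradients across replicas first ...
        grads = tf.distribute.get_replica_context().all_reduce(
            tf.distribute.ReduceOp.SUM, grads)
        # ... then clip, so clipping happens AFTER aggregation.
        grads, _ = tf.clip_by_global_norm(grads, clip_norm=1.0)
        # Gradients are already aggregated, so tell the optimizer
        # not to aggregate them again.
        optimizer.apply_gradients(
            zip(grads, model.trainable_variables),
            experimental_aggregate_gradients=False)
        return loss
    return strategy.run(step_fn, args=(x, y))
```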


This is now implemented.
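In recent TF 2.x releases (to the best of my knowledge, TF 2.4 and later) the optimizer-level clipnorm / clipvalue arguments work under a distribution strategy and are applied after gradients are aggregated across replicas, so plain Keras usage is enough. A minimal sketch under that assumption:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    # Assumes a TF release where optimizer-level clipping is supported
    # under a strategy; clipnorm is applied after cross-replica aggregation.
    model.compile(optimizer=tf.keras.optimizers.SGD(clipnorm=1.0),
                  loss="mse")
```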

Regarding "tensorflow - Why is gradient clipping not supported with a distribution strategy in TensorFlow?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62619216/
