
pytorch - Weight decay in AdamW and Adam


Is there any difference between torch.optim.Adam(weight_decay=0.01) and torch.optim.AdamW(weight_decay=0.01)?
Link to the docs: torch.optim

Best Answer

Yes, weight decay is handled differently in Adam and AdamW.

Loshchilov and Hutter pointed out in their paper (Decoupled Weight Decay Regularization) that the way weight decay is implemented in Adam in every library seems to be wrong, and proposed a simple way (which they call AdamW) to fix it.


In Adam, weight decay is usually implemented by adding wd * w (where wd is the weight decay factor) to the gradient (first case), rather than actually subtracting it from the weights (second case).
# 1st: Adam's weight decay implementation (L2 regularization added to the loss)
final_loss = loss + wd * all_weights.pow(2).sum() / 2
# 2nd: equivalent to this update in plain SGD
w = w - lr * w.grad - lr * wd * w

These methods are the same for vanilla SGD, but as soon as we add momentum, or use a more sophisticated optimizer like Adam, L2 regularization (first equation) and weight decay (second equation) become different.
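To see why they diverge, here is a minimal sketch of a single update step for one parameter tensor, assuming simplified update rules with bias correction omitted (the names m, v, beta1, beta2, eps, lr, wd are illustrative, not PyTorch's internal implementation):

import torch

def adam_l2_step(w, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, wd=0.01):
    # L2 regularization: the decay term wd * w is folded into the gradient,
    # so it also passes through the adaptive scaling by sqrt(v) + eps.
    grad = grad + wd * w
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    w = w - lr * m / (v.sqrt() + eps)
    return w, m, v

def adamw_step(w, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, wd=0.01):
    # Decoupled weight decay: the moments are built from the raw gradient only,
    # and the decay is subtracted directly from the weights, as in the second equation.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    w = w - lr * m / (v.sqrt() + eps) - lr * wd * w
    return w, m, v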


AdamW follows the second equation for weight decay.
In Adam:

weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)


In AdamW:

weight_decay (float, optional) – weight decay coefficient (default: 1e-2)
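As a quick usage sketch (both constructors are real PyTorch APIs; the model and hyperparameters below are just placeholders): passing the same weight_decay=0.01 to each gives different behavior, an L2 penalty added to the gradient for Adam versus decoupled decay for AdamW. Also note the different defaults quoted above (0 vs 1e-2).

import torch

model = torch.nn.Linear(10, 1)

# L2 penalty: wd * w is added to the gradient before the adaptive step
adam = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.01)

# Decoupled weight decay: weights are decayed directly, separately from the gradient
adamw = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)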


Read more on the fastai blog.

On pytorch - weight decay in AdamW and Adam, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64621585/
