
machine-learning - What does the copy_initial_weights documentation mean in PyTorch's higher library?

Reposted · Author: 行者123 · Updated: 2023-12-04 11:16:43

I am trying to use the higher library for meta-learning, but I am having trouble understanding what copy_initial_weights means. The documentation says:

copy_initial_weights – if true, the weights of the patched module are copied to form the initial weights of the patched module, and thus are not part of the gradient tape when unrolling the patched module. If this is set to False, the actual module weights will be the initial weights of the patched module. This is useful when doing MAML, for example.



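But this does not make much sense to me, for the following reasons:

For example, "the weights of the patched module are copied to form the initial weights of the patched module" makes no sense to me, because the patched module does not exist yet when the context manager is entered. So it is unclear what is being copied from where (and why copying is something we would want).

Also, "unrolling the patched module" makes no sense to me. We usually unroll a computation graph produced by a for loop. A patched module is just a neural network that has been modified by this library, so "unrolling" it is ambiguous.

Also, "gradient tape" has no technical definition.

Also, when describing what False does, saying that it is useful for MAML is not actually helpful, because it does not even hint at why it is useful for MAML.

Overall, this makes the context manager impossible to use.

Any explanation and examples of what this flag does, in more precise terms, would be really valuable.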

Related:
  • GitHub issue: https://github.com/facebookresearch/higher/issues/30
  • New GitHub issue: https://github.com/facebookresearch/higher/issues/54
  • PyTorch forum: https://discuss.pytorch.org/t/why-does-maml-need-copy-initial-weights-false/70387
  • PyTorch forum: https://discuss.pytorch.org/t/what-does-copy-initial-weights-do-in-the-higher-library/70384
  • An important related question is how the fmodel parameters are copied so that the optimizer works (and why a deep copy is used): Why does higher need to deep copy the parameters of the base model to create a functional model?

Best answer

    Short version

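    Calling higher.innerloop_ctx with model as an argument creates a temporary patched model and an unrolled optimizer for that model: (fmodel, diffopt). The expectation is that in the inner loop fmodel will iteratively receive some input, compute an output and a loss, and then diffopt.step(loss) will be called. Each time diffopt.step is called, fmodel creates the next version of its parameters, fmodel.parameters(time=T), which is a new tensor computed from the previous ones (the full graph is kept, so gradients can be computed through the whole process). If at any point the user calls backward on any tensor, regular PyTorch gradient computation/accumulation starts, in a way that lets gradients propagate to, e.g., the optimizer's parameters (such as lr and momentum, if they were passed to higher.innerloop_ctx via override as tensors requiring gradients).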
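
    To make "gradients propagate to the optimizer's parameters" concrete, here is a minimal pure-Python sketch. It assumes a one-weight model out = w * x with a mean-squared-error loss (as in the toy example of the long version) and the torch.optim.SGD momentum rule with dampening=0 and nesterov=False. It uses forward-mode dual numbers rather than PyTorch's reverse-mode autograd, so it only illustrates the math that higher makes differentiable; Dual and unrolled_sgd_momentum are hypothetical helpers, not part of higher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value plus its partial derivatives w.r.t. (lr, momentum): forward-mode AD."""
    val: float
    d: tuple  # (d/d lr, d/d momentum)

    @staticmethod
    def lift(x):
        # Plain numbers are constants: zero derivative w.r.t. lr and momentum.
        return x if isinstance(x, Dual) else Dual(float(x), (0.0, 0.0))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, tuple(a + b for a, b in zip(self.d, o.d)))

    __radd__ = __add__

    def __mul__(self, other):
        o = Dual.lift(other)  # product rule for each partial derivative
        return Dual(self.val * o.val,
                    tuple(a * o.val + self.val * b for a, b in zip(self.d, o.d)))

    __rmul__ = __mul__

    def __sub__(self, other):
        return self + (-1.0) * Dual.lift(other)


def unrolled_sgd_momentum(w0, lr, momentum, xs, target_mult, steps):
    """Run `steps` SGD-with-momentum updates on loss = mean((w*x - target_mult*x)^2)."""
    m = sum(x * x for x in xs) / len(xs)  # mean(x^2); the loss gradient is 2*m*(w - target_mult)
    w, buf = Dual.lift(w0), None
    for _ in range(steps):
        g = 2.0 * m * (w - target_mult)
        buf = g if buf is None else momentum * buf + g  # first step: buf = g
        w = w - lr * buf  # a new value every step; earlier versions are untouched
    return w


lr = Dual(0.3, (1.0, 0.0))        # seed: d lr / d lr = 1
momentum = Dual(0.5, (0.0, 1.0))  # seed: d momentum / d momentum = 1
w_final = unrolled_sgd_momentum(-1.0, lr, momentum, [0.5, 1.0, 1.5], 3.5, steps=5)
meta_loss = (w_final - 3.5) * (w_final - 3.5)
print("meta-loss:", meta_loss.val)
print("d meta-loss / d lr, d meta-loss / d momentum:", meta_loss.d)
```

    The two entries of meta_loss.d play the same role as lr_tensor.grad and momentum_tensor.grad in the PyTorch example of the long version.
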
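    The creation-time version of fmodel's parameters, fmodel.parameters(time=0), is a copy of the original model's parameters. If copy_initial_weights=True is passed (the default), then fmodel.parameters(time=0) will be a clone + detach'ed version of model's parameters (i.e. it keeps the values but severs all connections to the original model). If copy_initial_weights=False is passed, then fmodel.parameters(time=0) will be a clone'd version of model's parameters, and will therefore allow gradients to propagate back to the original model's parameters (see the PyTorch docs on clone).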
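
    A minimal pure-Python sketch of the difference, assuming plain SGD (no momentum) on a one-weight model out = w * x. Instead of real tensors it tracks by hand the sensitivity of the fast weight with respect to the original model weight: a clone keeps that link (sensitivity 1 at time 0), while clone + detach cuts it, so no meta-gradient can reach the original weight. inner_loop is a hypothetical helper, not higher's API.

```python
def inner_loop(w_model, xs, target_mult, lr, steps, copy_initial_weights):
    """Return (meta_loss, d meta_loss / d w_model) for a one-weight model out = w * x."""
    m = sum(x * x for x in xs) / len(xs)  # inner loss = m * (w - target_mult)^2
    w = w_model       # the time=0 value is the same either way: clone keeps the value
    dw_dmodel = 1.0   # d w / d w_model at time=0 for a connected (non-detached) clone
    for _ in range(steps):
        grad = 2.0 * m * (w - target_mult)   # d loss / d w
        w = w - lr * grad                    # plain SGD step creates the next version
        dw_dmodel *= 1.0 - lr * 2.0 * m      # chain rule: d w_new / d w_old = 1 - lr * d2loss/dw2
    meta_loss = (w - target_mult) ** 2
    if copy_initial_weights:
        # Detached copy: the original weight is not on the tape at all, so it gets
        # no gradient (in PyTorch its .grad simply stays None).
        return meta_loss, None
    return meta_loss, 2.0 * (w - target_mult) * dw_dmodel


xs = [0.5, 1.0, 1.5]
for flag in (True, False):
    loss, grad = inner_loop(-1.0, xs, 3.5, lr=0.3, steps=5, copy_initial_weights=flag)
    print(f"copy_initial_weights={flag}: meta-loss={loss:.6g}, grad wrt model weight={grad}")
```

    The meta-loss is identical in both runs; only whether a gradient reaches the original weight changes, which mirrors the None versus tensor([[-0.0053]]) contrast in the output of the long version.
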

    Notes on terminology

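  • "Gradient tape" here refers to the graph PyTorch uses to go back through the computations and propagate gradients to all leaf tensors that require gradients. If at some point you cut the link to a leaf tensor that requires gradients (e.g. as is done to fmodel.parameters() when copy_initial_weights=True), then the original model.parameters() will no longer be "on the gradient tape" for your meta_loss.backward() computation.
  • "Unrolling the patched module" here refers to the part of the meta_loss.backward() computation in which PyTorch walks through all fmodel.parameters(time=T), starting from the latest and ending with the earliest. (higher does not control this process; it is just regular PyTorch gradient computation. higher is only responsible for how the new time=T parameters are created from the previous ones each time diffopt.step is called, and for making fmodel always use the latest ones in the forward computation.)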
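
    Under the simplifying assumption of plain SGD with learning rate α (higher also supports momentum and other optimizers, which add extra terms), the "unrolling" done by meta_loss.backward() is the chain rule applied backwards through every inner step, from θ_T down to θ_0. Here θ_t stands for fmodel.parameters(time=t) and w for the original model.parameters():

```latex
\theta_{t+1} = \theta_t - \alpha \, \nabla_\theta \mathcal{L}_{\text{inner}}(\theta_t), \qquad t = 0, \dots, T-1

\frac{\partial \mathcal{L}_{\text{meta}}(\theta_T)}{\partial \theta_0}
  = \nabla_\theta \mathcal{L}_{\text{meta}}(\theta_T)^{\top}
    \left(I - \alpha H_{T-1}\right)\left(I - \alpha H_{T-2}\right)\cdots\left(I - \alpha H_0\right),
  \qquad H_t = \nabla^2_\theta \mathcal{L}_{\text{inner}}(\theta_t)

\theta_0 = \operatorname{clone}(w) \;\Rightarrow\; \frac{\partial \theta_0}{\partial w} = I,
\qquad
\theta_0 = \operatorname{detach}(\operatorname{clone}(w)) \;\Rightarrow\; \text{no path from } \theta_0 \text{ back to } w
```

    Every factor depends on the parameter version θ_t at that step, which is one way to see why earlier versions must stay alive on the graph instead of being overwritten. For the one-weight toy model, H_t = 2·mean(x²) at every step, so the product collapses to (1 − 2α·mean(x²))^T.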

    Long version

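    Let's start from the beginning. The main functionality (really the only functionality) of the higher library is unrolling a model's parameter optimization in a differentiable manner. This can come either in the form of using a differentiable optimizer directly, e.g. via higher.get_diff_optim as in this example, or in the form of higher.innerloop_ctx as in this example.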

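    The higher.innerloop_ctx option wraps the creation of a "stateless" model, fmodel, from an existing model, and gives you an "optimizer", diffopt, for this fmodel. So, as summarized in higher's README.md, it lets you switch from: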
    model = MyModel()
    opt = torch.optim.Adam(model.parameters())

    for xs, ys in data:
        opt.zero_grad()
        logits = model(xs)
        loss = loss_function(logits, ys)
        loss.backward()
        opt.step()

    to:

    model = MyModel()
    opt = torch.optim.Adam(model.parameters())

    with higher.innerloop_ctx(model, opt) as (fmodel, diffopt):
        for xs, ys in data:
            logits = fmodel(xs)  # modified `params` can also be passed as a kwarg
            loss = loss_function(logits, ys)  # no need to call loss.backward()
            diffopt.step(loss)  # note that `step` must take `loss` as an argument!

        # At the end of your inner loop you can obtain these e.g. ...
        grad_of_grads = torch.autograd.grad(
            meta_loss_fn(fmodel.parameters()), fmodel.parameters(time=0))

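    The difference between training model and doing diffopt.step to update fmodel is that fmodel does not update its parameters in place, as opt.step() does in the original version. Instead, each time diffopt.step is called, new versions of the parameters are created in such a way that fmodel uses the new ones in the next step, while all previous ones are still preserved.

    That is, fmodel starts with only fmodel.parameters(time=0) available, but after you have called diffopt.step N times, you can ask fmodel for fmodel.parameters(time=i) for any i from 0 up to and including N. Note that fmodel.parameters(time=0) does not change at all during this process; each time fmodel is applied to some input, it simply uses the latest version of the parameters it currently has.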
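
    A toy, pure-Python sketch of this versioning behaviour, assuming a one-weight model and plain SGD. VersionedFastWeights is a hypothetical stand-in that only mimics the parameters(time=...) bookkeeping; it does no autograd and is not how higher is implemented.

```python
class VersionedFastWeights:
    """Hypothetical stand-in for fmodel's bookkeeping: each step appends a new version."""

    def __init__(self, initial):
        self._history = [tuple(initial)]  # the version at time=0

    def parameters(self, time=None):
        # time=None means "latest", like fmodel.parameters() without arguments
        return self._history[-1] if time is None else self._history[time]

    def step(self, grads, lr):
        # Build a NEW version from the latest one; never modify an old version in place.
        latest = self._history[-1]
        self._history.append(tuple(p - lr * g for p, g in zip(latest, grads)))

    def __call__(self, x):
        (w,) = self.parameters()  # the forward pass always uses the latest version
        return w * x


xs, target_mult = [0.5, 1.0, 1.5], 3.5
fast = VersionedFastWeights([-1.0])
for _ in range(5):
    (w,) = fast.parameters()
    # d/dw of mean((w*x - target_mult*x)^2)
    grad_w = sum(2.0 * (w * x - target_mult * x) * x for x in xs) / len(xs)
    fast.step([grad_w], lr=0.3)

for t in range(6):
    print(f"time={t}: {fast.parameters(time=t)}")
```

    After five steps all six versions (time=0 through time=5) are still available, and time=0 still holds the starting value, just as described for fmodel.
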

    Now, what exactly is fmodel.parameters(time=0)? It is created here and depends on copy_initial_weights. If copy_initial_weights==True, then fmodel.parameters(time=0) are clone'd and detach'ed parameters of model. Otherwise they are only clone'd, but not detach'ed!

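    This means that when we do the meta-optimization step, the original model's parameters will actually accumulate gradients if and only if copy_initial_weights==False. In MAML we want to optimize model's starting weights, so we do actually need the gradients from the meta-optimization step.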
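
    To see why MAML needs this gradient, here is a minimal pure-Python MAML sketch on toy tasks. It assumes one-weight models out = w * x where each task has its own target multiplier, one plain-SGD inner step per task, and a meta-loss computed on the same data after adaptation (real MAML usually evaluates on held-out query data). The meta-gradient with respect to the shared starting weight is exactly the quantity that copy_initial_weights=False lets reach model's parameters; here it is computed by hand with the chain rule. maml_meta_grad and the task values are hypothetical.

```python
def maml_meta_grad(w0, xs, task_mults, inner_lr, inner_steps):
    """Average post-adaptation loss over tasks and its derivative w.r.t. the shared start w0."""
    m = sum(x * x for x in xs) / len(xs)  # each task's loss is m * (w - a)^2
    total_loss, total_grad = 0.0, 0.0
    for a in task_mults:
        w, dw_dw0 = w0, 1.0  # fast weight starts as a non-detached copy of w0
        for _ in range(inner_steps):
            w = w - inner_lr * 2.0 * m * (w - a)  # inner SGD step on this task
            dw_dw0 *= 1.0 - inner_lr * 2.0 * m    # chain rule through that step
        total_loss += m * (w - a) ** 2
        total_grad += 2.0 * m * (w - a) * dw_dw0  # d(post-adaptation loss) / d w0
    n = len(task_mults)
    return total_loss / n, total_grad / n


xs, task_mults = [0.5, 1.0, 1.5], [1.0, 3.5, 6.0]
w0, meta_lr = -1.0, 1.0
for outer_step in range(200):
    meta_loss, meta_grad = maml_meta_grad(w0, xs, task_mults, inner_lr=0.3, inner_steps=1)
    # With a detached copy there would be no gradient here, so w0 would never move.
    w0 -= meta_lr * meta_grad
print(f"learned starting weight: {w0:.4f}, meta-loss: {meta_loss:.4g}")
```

    For these quadratic tasks the starting weight that works best on average after adaptation is the mean of the task multipliers, and the outer loop converges there only because the meta-gradient flows back into w0.
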

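    I think one of the problems here is that higher lacks simpler toy examples demonstrating what is going on, and instead rushes to show more serious things as examples. So let me try to fill that gap and demonstrate what is going on with the simplest toy example I could come up with (a model with 1 weight that multiplies the input by that weight):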
    import torch
    import torch.nn as nn
    import torch.optim as optim
    import higher
    import numpy as np

    np.random.seed(1)
    torch.manual_seed(3)
    N = 100
    actual_multiplier = 3.5
    meta_lr = 0.00001
    loops = 5  # how many iterations in the inner loop we want to do

    x = torch.tensor(np.random.random((N, 1)), dtype=torch.float64)  # features for inner training loop
    y = x * actual_multiplier  # target for inner training loop
    model = nn.Linear(1, 1, bias=False).double()  # simplest possible model - multiply input x by weight w without bias
    meta_opt = optim.SGD(model.parameters(), lr=meta_lr, momentum=0.)


    def run_inner_loop_once(model, verbose, copy_initial_weights):
        lr_tensor = torch.tensor([0.3], requires_grad=True)
        momentum_tensor = torch.tensor([0.5], requires_grad=True)
        opt = optim.SGD(model.parameters(), lr=0.3, momentum=0.5)
        with higher.innerloop_ctx(model, opt, copy_initial_weights=copy_initial_weights, override={'lr': lr_tensor, 'momentum': momentum_tensor}) as (fmodel, diffopt):
            for j in range(loops):
                if verbose:
                    print('Starting inner loop step j=={0}'.format(j))
                    print(' Representation of fmodel.parameters(time={0}): {1}'.format(j, str(list(fmodel.parameters(time=j)))))
                    print(' Notice that fmodel.parameters() is same as fmodel.parameters(time={0}): {1}'.format(j, (list(fmodel.parameters())[0] is list(fmodel.parameters(time=j))[0])))
                out = fmodel(x)
                if verbose:
                    print(' Notice how `out` is `x` multiplied by the latest version of weight: {0:.4} * {1:.4} == {2:.4}'.format(x[0, 0].item(), list(fmodel.parameters())[0].item(), out[0].item()))
                loss = ((out - y)**2).mean()
                diffopt.step(loss)

            if verbose:
                # after all inner training let's see all steps' parameter tensors
                print()
                print("Let's print all intermediate parameters versions after inner loop is done:")
                for j in range(loops + 1):
                    print(' For j=={0} parameter is: {1}'.format(j, str(list(fmodel.parameters(time=j)))))
                print()

            # let's imagine now that our meta-learning optimization is trying to check how far we got in the end from the actual_multiplier
            weight_learned_after_full_inner_loop = list(fmodel.parameters())[0]
            meta_loss = (weight_learned_after_full_inner_loop - actual_multiplier)**2
            print(' Final meta-loss: {0}'.format(meta_loss.item()))
            meta_loss.backward()  # will only propagate gradient to original model parameter's `grad` if copy_initial_weights=False
            if verbose:
                print(' Gradient of final loss we got for lr and momentum: {0} and {1}'.format(lr_tensor.grad, momentum_tensor.grad))
                print(' If you change number of iterations "loops" to much larger number final loss will be stable and the values above will be smaller')
            return meta_loss.item()

    print('=================== Run Inner Loop First Time (copy_initial_weights=True) =================\n')
    meta_loss_val1 = run_inner_loop_once(model, verbose=True, copy_initial_weights=True)
    print("\nLet's see if we got any gradient for initial model parameters: {0}\n".format(list(model.parameters())[0].grad))

    print('=================== Run Inner Loop Second Time (copy_initial_weights=False) =================\n')
    meta_loss_val2 = run_inner_loop_once(model, verbose=False, copy_initial_weights=False)
    print("\nLet's see if we got any gradient for initial model parameters: {0}\n".format(list(model.parameters())[0].grad))

    print('=================== Run Inner Loop Third Time (copy_initial_weights=False) =================\n')
    final_meta_gradient = list(model.parameters())[0].grad.item()
    # Now let's double-check `higher` library is actually doing what it promised to do, not just giving us
    # a bunch of hand-wavy statements and difficult to read code.
    # We will do a simple SGD step using meta_opt changing initial weight for the training and see how meta loss changed
    meta_opt.step()
    meta_opt.zero_grad()
    meta_step = - meta_lr * final_meta_gradient  # how much meta_opt actually shifted initial weight value
    meta_loss_val3 = run_inner_loop_once(model, verbose=False, copy_initial_weights=False)

    meta_loss_gradient_approximation = (meta_loss_val3 - meta_loss_val2) / meta_step

    print()
    print('Side-by-side meta_loss_gradient_approximation and gradient computed by `higher` lib: {0:.4} VS {1:.4}'.format(meta_loss_gradient_approximation, final_meta_gradient))

    which produces this output:
    =================== Run Inner Loop First Time (copy_initial_weights=True) =================

    Starting inner loop step j==0
    Representation of fmodel.parameters(time=0): [tensor([[-0.9915]], dtype=torch.float64, requires_grad=True)]
    Notice that fmodel.parameters() is same as fmodel.parameters(time=0): True
    Notice how `out` is `x` multiplied by the latest version of weight: 0.417 * -0.9915 == -0.4135
    Starting inner loop step j==1
    Representation of fmodel.parameters(time=1): [tensor([[-0.1217]], dtype=torch.float64, grad_fn=<AddBackward0>)]
    Notice that fmodel.parameters() is same as fmodel.parameters(time=1): True
    Notice how `out` is `x` multiplied by the latest version of weight: 0.417 * -0.1217 == -0.05075
    Starting inner loop step j==2
    Representation of fmodel.parameters(time=2): [tensor([[1.0145]], dtype=torch.float64, grad_fn=<AddBackward0>)]
    Notice that fmodel.parameters() is same as fmodel.parameters(time=2): True
    Notice how `out` is `x` multiplied by the latest version of weight: 0.417 * 1.015 == 0.4231
    Starting inner loop step j==3
    Representation of fmodel.parameters(time=3): [tensor([[2.0640]], dtype=torch.float64, grad_fn=<AddBackward0>)]
    Notice that fmodel.parameters() is same as fmodel.parameters(time=3): True
    Notice how `out` is `x` multiplied by the latest version of weight: 0.417 * 2.064 == 0.8607
    Starting inner loop step j==4
    Representation of fmodel.parameters(time=4): [tensor([[2.8668]], dtype=torch.float64, grad_fn=<AddBackward0>)]
    Notice that fmodel.parameters() is same as fmodel.parameters(time=4): True
    Notice how `out` is `x` multiplied by the latest version of weight: 0.417 * 2.867 == 1.196

    Let's print all intermediate parameters versions after inner loop is done:
    For j==0 parameter is: [tensor([[-0.9915]], dtype=torch.float64, requires_grad=True)]
    For j==1 parameter is: [tensor([[-0.1217]], dtype=torch.float64, grad_fn=<AddBackward0>)]
    For j==2 parameter is: [tensor([[1.0145]], dtype=torch.float64, grad_fn=<AddBackward0>)]
    For j==3 parameter is: [tensor([[2.0640]], dtype=torch.float64, grad_fn=<AddBackward0>)]
    For j==4 parameter is: [tensor([[2.8668]], dtype=torch.float64, grad_fn=<AddBackward0>)]
    For j==5 parameter is: [tensor([[3.3908]], dtype=torch.float64, grad_fn=<AddBackward0>)]

    Final meta-loss: 0.011927987982895929
    Gradient of final loss we got for lr and momentum: tensor([-1.6295]) and tensor([-0.9496])
    If you change number of iterations "loops" to much larger number final loss will be stable and the values above will be smaller

    Let's see if we got any gradient for initial model parameters: None

    =================== Run Inner Loop Second Time (copy_initial_weights=False) =================

    Final meta-loss: 0.011927987982895929

    Let's see if we got any gradient for initial model parameters: tensor([[-0.0053]], dtype=torch.float64)

    =================== Run Inner Loop Third Time (copy_initial_weights=False) =================

    Final meta-loss: 0.01192798770078706

    Side-by-side meta_loss_gradient_approximation and gradient computed by `higher` lib: -0.005311 VS -0.005311

    Regarding "machine-learning - What does the copy_initial_weights documentation mean in PyTorch's higher library?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60311183/
