
python - How to use PyTorch's autograd efficiently with tensors?


In my previous question I found out how to differentiate with PyTorch's autograd. It worked:

#autograd
import torch
from torch.autograd import grad
import torch.nn as nn
import torch.optim as optim

class net_x(nn.Module):
    def __init__(self):
        super(net_x, self).__init__()
        self.fc1=nn.Linear(1, 20)
        self.fc2=nn.Linear(20, 20)
        self.out=nn.Linear(20, 4)

    def forward(self, x):
        x=torch.tanh(self.fc1(x))
        x=torch.tanh(self.fc2(x))
        x=self.out(x)
        return x

nx = net_x()
r = torch.tensor([1.0], requires_grad=True)
print('r', r)
y = nx(r)
print('y', y)
print('')
for i in range(y.shape[0]):
    # prints the vector (dy_i/dr_0, dy_i/dr_1, ... dy_i/dr_n)
    print(grad(y[i], r, retain_graph=True))

>>>
r tensor([1.], requires_grad=True)
y tensor([ 0.1698, -0.1871, -0.1313, -0.2747], grad_fn=<AddBackward0>)

(tensor([-0.0124]),)
(tensor([-0.0952]),)
(tensor([-0.0433]),)
(tensor([-0.0099]),)
The problem I'm running into now is that I have to differentiate a very large tensor, and iterating over it the way I currently do (for i in range(y.shape[0])) takes forever.
The reason I iterate is that, as I understand it, grad only knows how to propagate gradients from a scalar tensor, which y is not. So I need to compute the gradient with respect to each coordinate of y separately.
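For context, here is a minimal sketch (my own illustration, reusing nx and r from the snippet above) of what that restriction means in practice: grad refuses a non-scalar output unless grad_outputs is supplied, and supplying a vector of ones only yields the sum of the per-output gradients (a vector-Jacobian product), not each dy_i/dr separately, which is why the loop is needed:

y = nx(r)                                              # shape (4,), non-scalar
# grad(y, r)                                           # RuntimeError: grad can be implicitly created only for scalar outputs
summed, = grad(y, r, grad_outputs=torch.ones_like(y))  # d(y_0 + y_1 + y_2 + y_3)/dr, a single summed value
print(summed)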
I know that TensorFlow is able to differentiate tensors, from here:
tf.gradients(
ys, xs, grad_ys=None, name='gradients', gate_gradients=False,
aggregation_method=None, stop_gradients=None,
unconnected_gradients=tf.UnconnectedGradients.NONE
)
"ys and xs are each a Tensor or a list of tensors. grad_ys is a list of Tensor, holding the gradients received by the ys. The list must be the same length as ys.

gradients() adds ops to the graph to output the derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys and for x in xs."
And I was hoping there is a more efficient way to differentiate tensors in PyTorch.
For example:
a = range(100)
b = range(100)
c = range(100)
d = range(100)
my_tensor = torch.tensor([a,b,c,d])

t = range(100)

#derivative = grad(my_tensor, t) --> not working

#Instead what I'm currently doing:
for i in range(len(t)):
    a_grad = grad(a[i],t[i], retain_graph=True)
    b_grad = grad(b[i],t[i], retain_graph=True)
    #etc.
I was told it might work if I could run autograd on the forward pass rather than the backward pass, but from here it seems that this is not currently a feature PyTorch supports.
Update 1:
@jodag mentioned that what I'm looking for might just be the diagonal of the Jacobian. I'm following the link he attached and trying the faster approach. However, it doesn't seem to work and gives me an error: RuntimeError: grad can be implicitly created only for scalar outputs.
Code:
nx = net_x()
x = torch.rand(10, requires_grad=True)
x = torch.reshape(x, (10,1))
x = x.unsqueeze(1).repeat(1, 4, 1)
y = nx(x)
dx = torch.diagonal(torch.autograd.grad(torch.diagonal(y, 0, -2, -1), x), 0, -2, -1)
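As an aside, a sketch under my own assumptions about what the repeated input is meant to achieve (not the accepted approach below): the error comes from calling torch.autograd.grad on a non-scalar output. Passing an explicit grad_outputs makes the call legal, and because each repeated copy of the input only feeds its own output row, the summed backward pass recovers exactly the per-output derivatives:

nx = net_x()
x = torch.rand(10, requires_grad=True)
x = torch.reshape(x, (10, 1))
x = x.unsqueeze(1).repeat(1, 4, 1)                  # (10, 4, 1): one copy of each input per output
y = nx(x)                                           # (10, 4, 4)
diag_y = torch.diagonal(y, 0, -2, -1)               # (10, 4): output i evaluated on its own copy i
dx, = torch.autograd.grad(diag_y, x, grad_outputs=torch.ones_like(diag_y))
# dx has shape (10, 4, 1); dx[:, i, 0] holds dy_i/dx for each of the 10 inputs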

Best Answer

I believe I solved it using @jodag's suggestion: simply compute the Jacobian and take the diagonal.
Consider the following network:

import torch
from torch.autograd import grad
import torch.nn as nn
import torch.optim as optim

class net_x(nn.Module):
    def __init__(self):
        super(net_x, self).__init__()
        self.fc1=nn.Linear(1, 20)
        self.fc2=nn.Linear(20, 20)
        self.out=nn.Linear(20, 4) #a,b,c,d

    def forward(self, x):
        x=torch.tanh(self.fc1(x))
        x=torch.tanh(self.fc2(x))
        x=self.out(x)
        return x

nx = net_x()

#input
t = torch.tensor([1.0, 2.0, 3.2], requires_grad = True) #input vector
t = torch.reshape(t, (3,1)) #reshape for batch
My approach so far was to iterate over the input, since grad wants a scalar value as mentioned above:
#method 1
for timestep in t:
    y = nx(timestep)
    print(grad(y[0],timestep, retain_graph=True)) #0 for the first vector (i.e "a"), 1 for the 2nd vector (i.e "b")

>>>
(tensor([-0.0142]),)
(tensor([-0.0517]),)
(tensor([-0.0634]),)
Using the diagonal of the Jacobian seems more efficient and gives the same results:
#method 2
dx = torch.autograd.functional.jacobian(lambda t_: nx(t_), t)
dx = torch.diagonal(torch.diagonal(dx, 0, -1), 0)[0] #first vector
#dx = torch.diagonal(torch.diagonal(dx, 1, -1), 0)[0] #2nd vector
#dx = torch.diagonal(torch.diagonal(dx, 2, -1), 0)[0] #3rd vector
#dx = torch.diagonal(torch.diagonal(dx, 3, -1), 0)[0] #4th vector
dx

>>>
tensor([-0.0142, -0.0517, -0.0634])
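As a follow-up sketch (my own addition, reusing nx and t from the answer above, not part of the original answer): all four per-output derivative vectors can also be pulled out of the same Jacobian in one shot by taking the diagonal over the two batch dimensions. Newer PyTorch versions additionally accept vectorize=True on jacobian, which may speed the computation up further:

dx = torch.autograd.functional.jacobian(nx, t)       # shape (3, 4, 3, 1)
all_vecs = torch.diagonal(dx.squeeze(-1), 0, 0, 2)   # shape (4, 3)
# all_vecs[0] is the derivative of output "a" at each of the 3 inputs (matches the tensor above),
# all_vecs[1] is output "b", and so on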

Regarding python - How to use PyTorch's autograd efficiently with tensors?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/67320792/
