python - Should I transpose the features or the weights in a neural network?

Reposted by: 行者123 Updated: 2023-12-04 15:22:29

I am learning about neural networks.

The full code is here: https://github.com/udacity/deep-learning-v2-pytorch/blob/master/intro-to-pytorch/Part%201%20-%20Tensors%20in%20PyTorch%20(Exercises).ipynb

When I transpose the features, I get the following output:

import torch
def activation(x):
    return 1/(1+torch.exp(-x))

### Generate some data
torch.manual_seed(7) # Set the random seed so things are predictable

# Features are 5 random normal variables
features = torch.randn((1, 5))
# True weights for our data, random normal variables again
weights = torch.randn_like(features)
# and a true bias term
bias = torch.randn((1, 1))

product = features.t() * weights + bias
output = activation(product.sum())

tensor(0.9897)

However, if I transpose the weights, I get a different output:

weights_prime = weights.view(5,1)
prod = torch.mm(features, weights_prime) + bias
y_hat = activation(prod.sum())

tensor(0.1595)

Why is that?

Update

I looked at the solution: https://github.com/udacity/deep-learning-v2-pytorch/blob/master/intro-to-pytorch/Part%201%20-%20Tensors%20in%20PyTorch%20(Solution).ipynb

I saw this:

y = activation((features * weights).sum() + bias)

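Why can the matrix features (1,5) be multiplied by another matrix weights (1,5) without first transposing the weights?

Update 2

After reading a few posts, I realized that

matrixA * matrixB is different from torch.mm(matrixA, matrixB) and torch.matmul(matrixA, matrixB).

Can anyone confirm my three understandings?

So * means element-wise multiplication, while torch.mm() and torch.matmul() are matrix multiplication.
The difference between torch.mm() and torch.matmul(): mm() is specifically for 2-D matrices, while matmul() can be used for more complex cases.
In the neural network of the Udacity coding exercise linked above, element-wise multiplication is what is needed.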

Update 3

Just to share a video screenshot for anyone with the same confusion:

Here is the video link: https://www.youtube.com/watch?time_continue=98&v=6Z7WntXays8&feature=emb_logo

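Best answer

See https://pytorch.org/docs/master/generated/torch.nn.Linear.html

A typical linear (fully connected) layer in torch takes input features of shape (N, *, in_features) and weights of shape (out_features, in_features) to produce an output of shape (N, *, out_features). Here N is the batch size and * is any number of other dimensions (possibly none).

The implementation, leaving the bias aside, is: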

output = input.matmul(weight.t())

So the answer is that, by convention, neither of your formulas is correct; the standard formula is the one above.
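The convention above can be checked with a minimal sketch. The sizes and variable names here are illustrative assumptions, not taken from the question, and the bias term is added to match what torch.nn.functional.linear computes.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(7)

# Illustrative sizes (assumptions, not from the post).
batch_size, in_features, out_features = 4, 5, 3

x = torch.randn(batch_size, in_features)         # input:  (N, in_features)
weight = torch.randn(out_features, in_features)  # weight: (out_features, in_features)
bias = torch.randn(out_features)                 # bias:   (out_features,)

# Features come first, the weight matrix is transposed.
manual = x.matmul(weight.t()) + bias             # (N, out_features)
reference = F.linear(x, weight, bias)            # PyTorch's own linear map

print(manual.shape)
print(torch.allclose(manual, reference, atol=1e-6))
```

Storing the weights as (out_features, in_features) is what makes the transpose necessary; a weight tensor already stored as (in_features, out_features) can be passed to matmul directly.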

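You may use non-standard shapes since you are implementing this from scratch; as long as you are consistent it can work, but I don't recommend it for learning. It is not clear what the 1 and 5 in your code are, but presumably you want 5 input features and one output feature, with a batch size of 1 as well. In that case the standard shapes should be input = torch.randn((1, 5)) for batch size=1 and in_features=5, and weights = torch.randn((5, 1)) for in_features=5 and out_features=1.

There is no reason for the weights to have the same shape as the features; so weights = torch.randn_like(features) does not make sense.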

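Finally, for your actual questions:

"Should I transpose the features or the weights in a neural network?" - In the torch convention, you should transpose the weights, but call matmul with the features as the first operand. Other frameworks may have different conventions; as long as the in_features dimension of the weights is multiplied against the num_features dimension of the input, it will work.

"Why is that?" - These are two completely different computations; there is no reason to expect them to produce the same result.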
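To make the difference concrete, here is a minimal sketch that reuses the tensors from the question and only inspects shapes. The names outer, inner, first_total and second_total are introduced here for illustration.

```python
import torch

torch.manual_seed(7)

features = torch.randn((1, 5))
weights = torch.randn_like(features)
bias = torch.randn((1, 1))

# First version from the question: (5, 1) * (1, 5) broadcasts to (5, 5),
# so every feature is multiplied by every weight (an outer product).
outer = features.t() * weights
# The (1, 1) bias is broadcast onto all 25 entries before the sum.
first_total = (outer + bias).sum()

# Second version: a (1, 5) x (5, 1) matrix product, i.e. one dot product.
inner = torch.mm(features, weights.view(5, 1))
second_total = (inner + bias).sum()

print(outer.shape, inner.shape)
print(first_total.item(), second_total.item())
```

The first total sums 25 pairwise products and counts the bias 25 times, while the second sums only the 5 matching feature-weight products and counts the bias once, which is why the two activations differ.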

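"So * means element-wise multiplication, while torch.mm() and torch.matmul() are matrix multiplication." - Yes; mm is matrix-matrix only, while matmul handles vector-matrix or matrix-matrix products, including batched versions of the same. Check the docs for everything matmul can do (it is quite a lot).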
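A small sketch with hand-checkable numbers shows the three operations side by side. The tensors a, b and v are made up for illustration.

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[5., 6.], [7., 8.]])
v = torch.tensor([1., 2.])

elementwise = a * b               # multiplies matching entries
matrix = torch.mm(a, b)           # rows of a times columns of b
same_matrix = torch.matmul(a, b)  # identical to mm for 2-D inputs
vec_mat = torch.matmul(v, b)      # 1-D vector times matrix, result is 1-D

print(elementwise)  # [[5, 12], [21, 32]]
print(matrix)       # [[19, 22], [43, 50]]
print(vec_mat)      # [19, 22]
```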

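"The difference between torch.mm() and torch.matmul(): mm() is specifically for 2-D matrices, while matmul() can be used for more complex cases." - Yes; the biggest difference is that matmul can broadcast. Use it when you specifically intend that; use mm to prevent accidental broadcasting.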
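As a rough illustration of that difference, the sketch below passes a 3-D batch to both functions. The shapes and the mm_failed flag are assumptions chosen for the example.

```python
import torch

torch.manual_seed(0)

batch = torch.randn(3, 2, 5)  # a batch of three (2, 5) matrices
w = torch.randn(5, 4)         # a single (5, 4) matrix

# matmul broadcasts w across the leading batch dimension.
out = torch.matmul(batch, w)  # shape (3, 2, 4)

# mm accepts only 2-D inputs, so the same call is rejected.
try:
    torch.mm(batch, w)
    mm_failed = False
except RuntimeError:
    mm_failed = True

print(out.shape, mm_failed)
```

Getting an error from mm is the point: a tensor with an unexpected extra dimension is caught immediately instead of being silently broadcast.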

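"In the neural network of the Udacity coding exercise linked above, element-wise multiplication is what is needed." - I doubt it; this may be a mistake in the Udacity code. The line weights = torch.randn_like(features) looks wrong in any case; the dimensions of the weights have a different meaning from the dimensions of the features.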
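Whether or not the exercise's choice of shapes is advisable, the following sketch checks one narrow fact: for a (1, 5) features tensor and a (1, 5) weights tensor, the element-wise product followed by a sum is the same dot product that torch.mm computes once the weights are reshaped to (5, 1). The names y_elementwise and y_mm are introduced here for illustration.

```python
import torch

def activation(x):
    """Sigmoid, as defined in the question."""
    return 1 / (1 + torch.exp(-x))

torch.manual_seed(7)

features = torch.randn((1, 5))
weights = torch.randn_like(features)
bias = torch.randn((1, 1))

# Element-wise product, summed to a scalar, then the bias is added once.
y_elementwise = activation((features * weights).sum() + bias)

# Matrix product of (1, 5) and (5, 1), then the bias is added once.
y_mm = activation(torch.mm(features, weights.view(5, 1)) + bias)

print(torch.allclose(y_elementwise, y_mm))
```

This equivalence holds only because there is a single sample and a single output; with more than one output feature the weights need their own (out_features, in_features) shape and a real matrix product.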

Regarding "python - Should I transpose the features or the weights in a neural network?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63006388/