
machine-learning - Unexpected result from a convolution operation


Here is the code I wrote to perform a single convolution and print the shape of its output.

The output size is computed using the formula from http://cs231n.github.io/convolutional-networks/:

You can convince yourself that the correct formula for calculating how many neurons “fit” is given by (W−F+2P)/S+1

The formula for computing the output size is implemented as follows:

def output_size(w, f, stride, padding):
    return (((w - f) + (2 * padding)) / stride) + 1

The problem is that output_size computes a size of 2690.5, which does not match the 1350 elements actually produced by the convolution:

%reset -f

import torch
import torch.nn.functional as F
import numpy as np
from PIL import Image
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from pylab import plt
plt.style.use('seaborn')
%matplotlib inline

width = 60
height = 30
kernel_size_param = 5
stride_param = 2
padding_param = 2

img = Image.new('RGB', (width, height), color = 'red')

in_channels = 3
out_channels = 3

class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels,
                      out_channels,
                      kernel_size=kernel_size_param,
                      stride=stride_param,
                      padding=padding_param))

    def forward(self, x):
        out = self.layer1(x)

        return out

# w : input volume size
# f : receptive field size of the Conv Layer neurons
# output_size computes spatial size of output volume - spatial dimensions are (width, height)
def output_size(w, f, stride, padding):
    return (((w - f) + (2 * padding)) / stride) + 1

w = width * height * in_channels
f = kernel_size_param * kernel_size_param

print('output size :' , output_size(w , f , stride_param , padding_param))

model = ConvNet()

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=.001)

img_a = np.array(img)
img_pt = torch.tensor(img_a).float()
result = model(img_pt.view(3, width , height).unsqueeze_(0))
an = result.view(30 , 15 , out_channels).data.numpy()

# print(result.shape)
# print(an.shape)

# print(np.amin(an.flatten('F')))

print(30 * 15 * out_channels)

Did I implement output_size correctly? How should I modify this model so that the result of Conv2d has the same shape as the result of output_size?

Best Answer

The problem is that your input image is not square, so you should apply the formula to the width and to the height of the input separately. You also should not fold the number of channels into w, because the number of output channels is defined explicitly by the layer. Finally, f should be kernel_size, not kernel_size * kernel_size, since the formula is applied per spatial dimension. (With the question's values, w = 60 * 30 * 3 = 5400 and f = 5 * 5 = 25, which is exactly where the stray 2690.5 comes from.)

w = width
h = height
f = kernel_size_param
output_w = int(output_size(w, f, stride_param, padding_param))
output_h = int(output_size(h, f, stride_param, padding_param))
print("Output_size", [out_channels, output_w, output_h])  # --> [3, 30, 15]

And the size of the model's output then matches:

print("Output size", result.shape)  #--> [1, 3, 30 ,15]  

Formula source: http://cs231n.github.io/convolutional-networks/

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/53807071/
